PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

i was facing same issue but solve..... #14287

Open Brutal15229 opened 5 days ago

Brutal15229 commented 5 days ago
          I was facing the same issue, but solved it:

label_file_list = ratio_list

Originally posted by @jitesh-rathod in https://github.com/PaddlePaddle/PaddleOCR/issues/1996#issuecomment-811715798

Brutal15229 commented 5 days ago

When I do this, I get the following error:

  File "/data/data/aditya_llama/Ocr_training/archive/PaddleOCR/ppocr/data/__init__.py", line 107, in build_dataloader
    dataset = eval(module_name)(config, mode, logger, seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/aditya_llama/Ocr_training/archive/PaddleOCR/ppocr/data/simple_dataset.py", line 56, in __init__
    self.data_lines = self.get_image_info_list(label_file_list, ratio_list)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/aditya_llama/Ocr_training/archive/PaddleOCR/ppocr/data/simple_dataset.py", line 73, in get_image_info_list
    lines = random.sample(lines, round(len(lines) * ratio_list[idx]))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: type str doesn't define __round__ method

Brutal15229 commented 5 days ago

Please help me, I am stuck here.

GreatV commented 4 days ago

It seems you are referencing a PaddleOCR-related issue where the user claimed to have resolved their problem with the following code:

label_file_list = ratio_list

The context of your question seems to suggest that you are encountering similar issues with PaddleOCR, particularly errors like KeyError: 'label' or other data-related discrepancies during processing.


Potential Causes and Solutions:

  1. Error: KeyError: 'label'

    • Cause: This error typically occurs when the data loader or transformation pipeline is unable to find the label key in the input data or annotation file. This might happen due to:

      • Improperly formatted dataset annotations.
      • Missing label fields in the dataset JSON or CSV files.
      • Incorrect transforms or keys being used in the configuration.
    • Solution:

      • Check your dataset annotation files to ensure that each entry contains a valid label key. For example:
        {
          "transcription": "example text",
          "points": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
          "label": "class_name"
        }

        If the label field is missing, you need to add it manually or modify your dataset generation script to include it.

      • Ensure that your KeepKeys transform in the configuration includes label if it's being used in the pipeline:
        KeepKeys:
          keep_keys: ['image', 'label', 'shape', ...]
      • Update the VQATokenLabelEncode or other encoding transforms to correctly map the label field if required.
  2. Misalignment of label_file_list and ratio_list

    • Cause: During training, label_file_list and ratio_list are used together to load and balance datasets. If these lists are not aligned in length or content, errors can occur.
    • Solution:
      • Ensure that the label_file_list in your configuration corresponds to the correct file paths of your dataset annotations.
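      • Verify that ratio_list contains the same number of entries as label_file_list and that every entry is a number, not a string or a file path. Each ratio is the fraction of lines sampled from the corresponding label file (e.g., [1.0] to use all of a single file, or [0.7, 0.3] to sample 70% of the first file and 30% of the second). In the traceback above, round(len(lines) * ratio_list[idx]) fails because ratio_list[idx] is a str.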
  3. Error in table/predict_table.py

    • Cause: If you are working with table recognition, errors can occur due to improperly structured input data or configuration mismatches.
    • Solution:
      • Ensure that the table annotation files are in the correct format required by PaddleOCR.
      • If the error occurs during prediction, check the predict_table.py script and ensure that the input image paths and preprocessing steps are configured correctly.
  4. General Debugging Approach:

    • Enable debug logs by setting show_log=True in the PaddleOCR configuration or script.
    • Run the script with a smaller subset of your dataset to isolate problematic entries.
    • Validate your data pipeline step by step (e.g., loading images, checking annotations, running transforms).
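
As a rough illustration of point 2, the sketch below checks a label_file_list / ratio_list pair before training and reproduces the sampling expression from the traceback. The helper names check_dataset_lists and sample_lines and the file names are hypothetical and not part of PaddleOCR. Treating a bare number as one ratio applied to every file is an assumption of this sketch.

```python
import random


def check_dataset_lists(label_file_list, ratio_list):
    """Return a list of problems found; an empty list means the pair looks consistent.

    Hypothetical helper, not a PaddleOCR API.
    """
    # Assumption of this sketch: a single number means "same ratio for every file".
    if isinstance(ratio_list, (int, float)) and not isinstance(ratio_list, bool):
        ratio_list = [ratio_list] * len(label_file_list)

    if not isinstance(ratio_list, (list, tuple)):
        return [f"ratio_list should be a list of numbers, got {type(ratio_list).__name__}"]

    problems = []
    if len(ratio_list) != len(label_file_list):
        problems.append(
            f"label_file_list has {len(label_file_list)} entries "
            f"but ratio_list has {len(ratio_list)}"
        )

    for idx, ratio in enumerate(ratio_list):
        if isinstance(ratio, bool) or not isinstance(ratio, (int, float)):
            problems.append(
                f"ratio_list[{idx}] = {ratio!r} is a {type(ratio).__name__}, not a number"
            )
        elif not 0.0 <= ratio <= 1.0:
            problems.append(f"ratio_list[{idx}] = {ratio!r} is outside [0, 1]")
    return problems


def sample_lines(lines, ratio):
    """Same expression as the failing line in the traceback.

    If ratio is a str, len(lines) * ratio is string repetition, so round()
    receives a str and raises TypeError.
    """
    return random.sample(lines, round(len(lines) * ratio))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(check_dataset_lists(["train_a.txt", "train_b.txt"], [0.7, 0.3]))
    # A file path placed in ratio_list is reported instead of crashing later.
    print(check_dataset_lists(["train_a.txt"], ["train_a.txt"]))
```

If the check reports a str entry, the likely fix is in the YAML config: ratio_list should hold plain numbers such as [1.0], while the paths belong in label_file_list.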


If you can provide more details about your specific issue (e.g., errors, configurations, or dataset structure), I can provide a more targeted solution!


Response generated by 🤖 feifei-bot | chatgpt-4o-latest