Open Brutal15229 opened 5 days ago
When I am doing this:
  File "/data/data/aditya_llama/Ocr_training/archive/PaddleOCR/ppocr/data/__init__.py", line 107, in build_dataloader
    dataset = eval(module_name)(config, mode, logger, seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/aditya_llama/Ocr_training/archive/PaddleOCR/ppocr/data/simple_dataset.py", line 56, in __init__
    self.data_lines = self.get_image_info_list(label_file_list, ratio_list)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/aditya_llama/Ocr_training/archive/PaddleOCR/ppocr/data/simple_dataset.py", line 73, in get_image_info_list
    lines = random.sample(lines, round(len(lines) * ratio_list[idx]))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: type str doesn't define __round__ method
Please help me, I am stuck here.
It seems you are referencing a PaddleOCR-related issue where the user claimed to have resolved their problem with the following code:
label_file_list = ratio_list
The context of your question suggests that you are encountering similar issues with PaddleOCR, particularly errors like KeyError: 'label' or other data-related discrepancies during processing.

Error: KeyError: 'label'

Cause: This error typically occurs when the data loader or transformation pipeline cannot find the label key in the input data or annotation file. This might happen due to:
- Missing label fields in the dataset JSON or CSV files.

Solution:
- Ensure that each annotation includes the label key. For example:
    {
        "transcription": "example text",
        "points": [[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
        "label": "class_name"
    }
If the label field is missing, add it manually or modify your dataset generation script to include it.
- Ensure that the KeepKeys transform in the configuration includes label if it is used in the pipeline:
    KeepKeys:
        keep_keys: ['image', 'label', 'shape', ...]
- Use VQATokenLabelEncode or other encoding transforms to correctly map the label field if required.

Misalignment of label_file_list and ratio_list
label_file_list and ratio_list are used together to load and balance datasets. If these lists are not aligned in length or content, errors can occur.
- Ensure that label_file_list in your configuration corresponds to the correct file paths of your dataset annotations.
- Ensure that ratio_list contains the same number of entries as label_file_list and that every ratio is a number rather than a string (e.g., [1.0] for a single dataset, or [0.7, 0.3] to sample 70% and 30% of the lines from two label files).

Error in table/predict_table.py
- Check the predict_table.py script and ensure that the input image paths and preprocessing steps are configured correctly.

General Debugging Approach:
- Set show_log=True in the PaddleOCR configuration or script.

Relevant Issues and Discussions:
Suggestions from the User's Post:
The snippet label_file_list = ratio_list might imply that the user manually assigned the label_file_list variable to resolve an alignment or configuration mismatch. Ensure that both variables are correctly defined in your configuration YAML file.
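
A quick way to check both variables before starting training is sketched below. This is only an illustration, not PaddleOCR code: the helper name check_ratio_list and the example file names are hypothetical, and the sketch assumes the two values have already been read from the YAML config into Python lists. The last few lines show why a string entry in ratio_list leads to the TypeError in the traceback: multiplying an int by a str repeats the string, and round() is not defined for str.

```python
from numbers import Real


def check_ratio_list(label_file_list, ratio_list):
    """Hypothetical helper: validate ratio_list against label_file_list.

    Returns the ratios as floats, or raises ValueError with a readable message.
    """
    if len(label_file_list) != len(ratio_list):
        raise ValueError(
            f"label_file_list has {len(label_file_list)} entries, "
            f"ratio_list has {len(ratio_list)}"
        )
    for idx, ratio in enumerate(ratio_list):
        # A string here (a quoted number or a file path) is what makes
        # round(len(lines) * ratio_list[idx]) fail in the traceback.
        if isinstance(ratio, bool) or not isinstance(ratio, Real):
            raise ValueError(
                f"ratio_list[{idx}] = {ratio!r} is a "
                f"{type(ratio).__name__}, expected a number"
            )
        # random.sample cannot draw more items than the list holds,
        # so each ratio has to stay within [0, 1].
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"ratio_list[{idx}] = {ratio} is outside [0, 1]")
    return [float(r) for r in ratio_list]


# Reproducing the failure mode in isolation: int * str repeats the string,
# and round() then raises TypeError because str has no __round__.
try:
    round(3 * "1.0")
except TypeError as exc:
    print(exc)
```

For example, check_ratio_list(["train_a.txt", "train_b.txt"], [0.7, 0.3]) passes, while check_ratio_list(["train_a.txt"], ["1.0"]) raises a ValueError that points at the quoted value.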
If you can provide more details about your specific issue (e.g., errors, configurations, or dataset structure), I can provide a more targeted solution!
Response generated by 🤖 feifei-bot | chatgpt-4o-latest
label_file_list= ratio_list
Originally posted by @jitesh-rathod in https://github.com/PaddlePaddle/PaddleOCR/issues/1996#issuecomment-811715798