Breta01 / handwriting-ocr

OCR software for recognition of handwritten text
MIT License
750 stars 240 forks

Data loading issues #142

Open nandita0401 opened 3 years ago

nandita0401 commented 3 years ago

(screenshot of the traceback attached)

OverflowError: Python int too large to convert to C long

I am getting this error for both the train.csv and dev.csv files. What can I do to solve it?

Breta01 commented 3 years ago

Can you try replacing the line csv.field_size_limit(sys.maxsize) in src/ocr/datahelpers.py with the following code?

max_int = sys.maxsize

while True:
    # Decrease max_int by a factor of 10 until the value
    # fits into a C long and no OverflowError is raised.
    try:
        csv.field_size_limit(max_int)
        break
    except OverflowError:
        max_int = int(max_int / 10)

It seems that sys.maxsize behaves differently across platforms and can cause this error. You can also try replacing sys.maxsize with a fixed number (e.g. csv.field_size_limit(2147483647)), but I am not sure how big the number must be; if it is too small, it will cause an error later during loading. Please try it and let me know how it goes.
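
For reference, a minimal sketch of the fixed-limit variant (my assumption here is that the overflow comes from a 32-bit C long, as on Windows, so the limit is capped at 2147483647):

import csv
import sys

# 2**31 - 1 == 2147483647 is the largest value a 32-bit C long can hold,
# so this works on Windows as well as on platforms with a 64-bit C long.
csv.field_size_limit(min(sys.maxsize, 2**31 - 1))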

nandita0401 commented 3 years ago

It is taking too long to execute; it has been running for more than 12 hours now. Is there any solution?

Breta01 commented 3 years ago

Oh, that definitely shouldn't take that long (just a few seconds, I would guess). Did you try setting a fixed number like csv.field_size_limit(2147483647)?
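
If you want to confirm that the platform's C long size is the cause, here is a small standard-library check (just a diagnostic sketch, not part of the repo):

import ctypes
import sys

# On Windows a C long is 4 bytes even in 64-bit Python, so sys.maxsize
# (2**63 - 1) does not fit and csv.field_size_limit() raises OverflowError.
print("sys.maxsize: ", sys.maxsize)
print("C long bytes:", ctypes.sizeof(ctypes.c_long))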

nandita0401 commented 3 years ago

(screenshot attached)

Where do I get the dataset from?

Breta01 commented 3 years ago

Well, the steps are a bit old and I would like to rework them once I have more time. You have to download the datasets according to the instructions in the data/ folder (not all datasets are necessary). Then go to src/data/ and run these scripts in the following order (some extra parameters might be necessary); a quick sanity check of the resulting CSV files is sketched after the list:

  1. python data_extractor.py
  2. python data_normalization.py
  3. python data_create_sets.py --csv
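
Once the sets are created, something like the following can be used to check that the generated CSVs load without the field-size error (a sketch only; train.csv and dev.csv are the file names from the error above, and I am assuming they end up in the current directory):

import csv
import sys

# Raise the CSV field limit first, since each row can carry a lot of data.
csv.field_size_limit(min(sys.maxsize, 2**31 - 1))

# Assumed output file names and locations; adjust the paths if they differ.
for name in ("train.csv", "dev.csv"):
    with open(name, newline="") as f:
        rows = sum(1 for _ in csv.reader(f))
    print(f"{name}: {rows} rows")
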
nandita0401 commented 3 years ago

(screenshot of the error attached)

How do I solve this error?

Breta01 commented 3 years ago

It should work now; just pull the latest changes from the repo.

nandita0401 commented 3 years ago

Can you please elaborate?

Breta01 commented 3 years ago

On what exactly?

nandita0401 commented 3 years ago

(screenshot of the error attached)

How do I solve this error?

user8746 commented 1 year ago

(quoting Breta01's earlier suggestion to replace csv.field_size_limit(sys.maxsize) in src/ocr/datahelpers.py with the retry loop above)

user8746 commented 1 year ago

The error hasn't been resolved for me either, even after replacing the code you provided.