Closed JamesSG2 closed 3 years ago
Looks good. Definitely an improvement over what it was before. The only issue I ran into is that I had to `pip install pandas` to get `output_type='data.frame'` on line 35 to work. I would add pandas to the `requirements.txt` file so that it is listed on the requirements page.
I also noticed that the `low_confidence_rejection` method modifies `pytesseract_list` in place rather than returning the corrected list separately. This works, but it might be better to return an updated list than to mutate the existing one. I'm not really sure though, so I'll leave it up to you whether we should change it.
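To make the suggestion concrete, here is a rough sketch of a version that returns a new list and leaves its input alone. I'm assuming each entry in `pytesseract_list` is a `(word, confidence)` pair and that there is a cutoff of 60; both are guesses for illustration, so adjust to whatever the real structure and threshold are.

```python
from typing import List, Tuple

# Assumed shape for illustration only: (word, confidence).
# The real pytesseract_list in the branch may be structured differently.
Word = Tuple[str, float]


def low_confidence_rejection(
    pytesseract_list: List[Word], threshold: float = 60.0
) -> List[Word]:
    """Return a new list without the low-confidence entries.

    The input list is not modified, so the caller still has the raw OCR output.
    The default threshold of 60 is a placeholder, not the value from the PR.
    """
    return [(word, conf) for word, conf in pytesseract_list if conf >= threshold]


words = [("Hello", 95.0), ("w0rld", 41.0), ("again", 88.0)]
kept = low_confidence_rejection(words)
```

The main benefit would be that the unfiltered list is still available afterwards, for example for debugging or for running other corrections on the full text.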
Also worth playing around with: is there a way to run the regex correction I wrote before running your function? The more original text there is, the more effective the regex will be. If that's not possible, it seems great as is; it's just something I thought of.
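One way the ordering could look, as a sketch only: run the regex over the full joined text first, then map the corrected words back to their confidences and filter. Everything here is hypothetical, including the `(word, confidence)` shape, the helper names, the threshold, and the regex itself, which is just a stand-in for the real correction. It also assumes the correction never adds or removes words.

```python
import re
from typing import List, Tuple

# Assumed shape for illustration only: (word, confidence).
Word = Tuple[str, float]


def regex_correct(text: str) -> str:
    """Stand-in for the existing regex correction (not the real rules).

    Replaces a zero sandwiched between two letters with the letter 'o'.
    """
    return re.sub(r"(?<=[A-Za-z])0(?=[A-Za-z])", "o", text)


def correct_then_reject(words: List[Word], threshold: float = 60.0) -> List[Word]:
    """Correct the full text first, then drop low-confidence words."""
    # The regex sees every word, including ones that will be rejected later.
    text = " ".join(word for word, _ in words)
    corrected = regex_correct(text).split(" ")
    # Assumption: the correction keeps the word count unchanged.
    if len(corrected) != len(words):
        raise ValueError("regex correction changed the number of words")
    return [
        (new_word, conf)
        for new_word, (_, conf) in zip(corrected, words)
        if conf >= threshold
    ]


sample = [("b0ok", 72.0), ("sh3lf", 35.0), ("title", 91.0)]
result = correct_then_reject(sample)
```

If the real regex can merge or split words, the mapping back to confidences would need a different approach, so this is only a starting point.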
I'll go ahead and merge wip_confidence_levels to dev now since it's been reviewed and I just made the suggested changes.