Open benwbrum opened 6 years ago
TODO: add sample inputs and outputs for census records.
In addition to the above, a flag (or probability) that the region of interest contains ink would be invaluable as an output.
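For reference, one rough way such an "ink present" score could be computed from a grayscale scan is sketched below; the function name, threshold, and baseline fraction are illustrative only and would need tuning against real census images.

```python
import cv2
import numpy as np

def ink_probability(page, box, dark_threshold=180, full_ink_fraction=0.02):
    """Rough 'contains ink' score for a region of interest.

    page: grayscale scan as a 2-D numpy array (0 = black, 255 = white).
    box:  (x, y, width, height) of the region of interest.
    Returns a value in [0, 1]; the fraction of dark pixels is scaled so that
    `full_ink_fraction` or more dark pixels counts as probability 1.
    """
    x, y, w, h = box
    roi = page[y:y + h, x:x + w]
    # Smooth lightly so isolated specks of noise are not counted as ink.
    roi = cv2.GaussianBlur(roi, (3, 3), 0)
    dark_fraction = float(np.mean(roi < dark_threshold))
    return min(dark_fraction / full_ink_fraction, 1.0)

# Example: flag rows that were detected but left blank.
# page = cv2.imread("census_page.png", cv2.IMREAD_GRAYSCALE)
# if ink_probability(page, (120, 400, 900, 40)) < 0.5:
#     ...  # present row to volunteers as "probably empty"
```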
Hi @benwbrum, I saw this idea on the Google Summer of Code ideas list. I am wondering whether there is a standard format for the input files? Once the inputs are standardized and enough samples are available, a model can be trained to produce those bounding box coordinates.
Hi @benwbrum, I am interested in this idea. I understand the general goal, but it would be more helpful if you could provide some sample input and output images to clarify things exactly. At the moment I am thinking of solving the problem with something like YOLO object detection, but with the record regions of a text form as the detection targets instead of objects.
Hi @benwbrum This idea seems really interesting! Is the list of formats available anywhere (for reference)? Depending on the amount and granularity of the data available, different pipelines could be constructed to solve this problem. Also, it would be great if you could provide a sample output (its format and details) to help in constructing such a model :D
Hi @benwbrum I did a similar project on bill receipt data where I used a deep learning model to identify the different fields of data and form a bounding box around each of them. I also used another model to perform OCR on the bounded text data (a rough sketch of that two-stage pipeline is below). I believe the task can be accomplished given enough training data and a fixed number of fields.
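For concreteness, here is a minimal sketch of how such a two-stage pipeline (region detection followed by OCR on each crop) could be wired together. The checkpoint path, file names, and score threshold are hypothetical, and plain Tesseract only handles printed text; handwritten census entries would need a handwriting-specific recogniser instead.

```python
import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image
import pytesseract

def detect_and_transcribe(image_path, model, score_threshold=0.7):
    """Stage 1: detect record regions; stage 2: OCR each cropped region.

    `model` is assumed to be a torchvision detection model (e.g. Faster R-CNN)
    already fine-tuned on labelled form images; training is not shown here.
    """
    image = Image.open(image_path).convert("RGB")
    model.eval()
    with torch.no_grad():
        prediction = model([F.to_tensor(image)])[0]

    records = []
    for box, score in zip(prediction["boxes"], prediction["scores"]):
        if score < score_threshold:
            continue
        x0, y0, x1, y1 = (int(v) for v in box.tolist())
        crop = image.crop((x0, y0, x1, y1))
        text = pytesseract.image_to_string(crop)  # printed text only
        records.append({"box": [x0, y0, x1, y1],
                        "score": float(score),
                        "text": text.strip()})
    return records

# model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)
# model.load_state_dict(torch.load("census_record_detector.pth"))  # hypothetical weights
# print(detect_and_transcribe("1861_census_page.jpg", model))
```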
Hi @benwbrum!
Really interesting idea! I had a few doubts and hope you could clarify them:
1. Is the project research-oriented, e.g. finding the best model? Is the data currently hosted on some platform like Kaggle?
2. Are there any baseline models for us to compare our results with?
3. Does it involve other functionality, such as integration with other software or a nice GUI?
Thanks!
See sample data and a description of the data at https://github.com/FreeUKGen/SummerOfCodeImages/issues/3
I agree this can be done very efficiently with deep learning, provided there is enough data. In the past I have also used traditional computer vision for a similar task on a bank's forms with only a couple of scanned images; a rough sketch of that approach is below. So yes, I am sure this is achievable. I wanted to know whom to contact to submit my Summer of Code proposal for this project.
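For anyone curious what the traditional-CV route looks like on ruled forms, the usual approach is to extract the printed horizontal and vertical rules with morphology and take the enclosed cells as bounding boxes. The kernel sizes and area filter below are illustrative and assume reasonably clean, well-aligned scans.

```python
import cv2

def form_cell_boxes(image_path, min_cell_area=500):
    """Find table-cell bounding boxes on a ruled form using classic CV only."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Binarise so the printed rules become white on black.
    binary = cv2.adaptiveThreshold(255 - gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 15, -2)
    # Long thin kernels keep only horizontal / vertical rule segments.
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (gray.shape[1] // 30, 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, gray.shape[0] // 30))
    grid = cv2.add(cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel),
                   cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel))
    # Each enclosed cell shows up as a white blob once the grid is inverted.
    contours, _ = cv2.findContours(255 - grid, cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]
    # Drop tiny specks and the page-sized background region.
    return [b for b in boxes
            if b[2] * b[3] > min_cell_area and b[2] < gray.shape[1] * 0.95]

# boxes = form_cell_boxes("scanned_form.png")  # list of (x, y, w, h)
```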
@benwbrum @PatReynolds We completed this as part of the last GSoC. However, I wonder whether we might want to develop a tool as a next step based on that past project? To discuss.
This is actually a separate task, one that was partially worked on and now needs review. @benwbrum to review.
Many online transcription tools require users to transcribe a single record from an image, with a direct linkage between the region of the image containing a record and the transcription form. The majority of tools accomplish this by asking humans to draw a rectangle around the record (the region of interest/ROI) on the image before transcription can start. Our volunteers would far prefer to avoid this step, as they find it a distraction from transcription, which they prefer to do in a mouse-free manner. If we had a tool that would take an image file (or a URL to a file) and a parameter describing the format of the records on the image (1861 Census Form, 1851 Census Form, etc.) and would produce a list of bounding box coordinates for the record locations on that image, we could skip the drawing step and present the records directly to users.
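To make the desired interface concrete, here is a hypothetical sketch of what calling such a tool and its output might look like; the function name, form-type identifiers, URL, and output fields are illustrative only, and the optional ink score ties in with the request above for a "contains ink" probability.

```python
def locate_records(image_url, form_type):
    """Given a page image (or a URL to one) and a known form layout identifier,
    return one bounding box (plus an optional 'contains ink' score) per record
    row. The implementation is left open: a layout template, classic CV, or a
    trained detector could sit behind this interface."""
    ...

# Hypothetical call and output the transcription UI could consume directly,
# skipping the manual rectangle-drawing step:
#
# locate_records("https://example.org/scans/page123.jpg", "1861_census")
# [
#   {"box": [112, 404, 1480, 46], "ink_probability": 0.97},
#   {"box": [112, 450, 1480, 46], "ink_probability": 0.04},
#   ...
# ]
```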