FreeUKGen / New-Projects

All tasks relating to development of an Online Transcription Tool

Scope new project: Enhanced OCR correction for CRS #35

Open PatReynolds opened 7 years ago

PatReynolds commented 7 years ago

Outcomes:

- FR DAP shows availability
- CC on FR can allocate images to transcriber [CRS availability/allocation outcomes]
- Volunteer can view image, and mark up OCR for transcription on FR and/or CRS
- Checker can correct OCR text on FR and/or CRS
- FR can ingest corrected OCR text to index
- FR can export corrected OCR text to CRS and IA
- CRS can export corrected OCR text to IA

Source code and documentation are open for others to use.

needs #64

richpomfret commented 7 years ago

Ready to now move forward with this.

PatReynolds commented 7 years ago

https://standrewsrarebooks.wordpress.com/2017/05/12/practice-makes-perfect-new-tools-for-reading-old-handwriting/ - open source which we might use.

DeniseColbert commented 7 years ago

We have the go ahead to seek funding.

Ben to begin looking at tools already out there and move forward.

@PatReynolds to contact the Internet Archive

benwbrum commented 7 years ago

As OCR correction is already pretty well served by existing tools, I launched two test projects for the CRS's Miscellanea volume IX on platforms that host OCR correction, so that we can evaluate them. Both would need programming to extract register entries into our FreeREG search engine, but at least the basic correction tools are mature.

Here is the same page ingested into Wikisource and FromThePage. Both are online platforms running open-source software, so FreeUKGenealogy could install the software itself or partner with the service to run the OCR correction platform.

I'll do a quick demo of them both on tomorrow's call.

DeniseColbert commented 7 years ago

The 2 tools above are ready for us to try out and evaluate; we will check in at the next meeting. Denise to update @PatReynolds with potential issues.

benwbrum commented 7 years ago

See also the overall project progress pages for the same work on Wikisource and FromThePage: https://fromthepage.com/display/read_work?work_id=1437 and https://en.wikisource.org/w/index.php?title=Index:Miscellanea,_Volume_IX.djvu

PatReynolds commented 6 years ago

@Clare and @DeniseColbert to test the OCR correction interface; also ask the Advisory Board. Both are already integrated with IA. Will need work to extract entries and ingest them into FreeREG. Will need to experiment with how to mark up genealogy information vs Catholic history.

DeniseColbert commented 6 years ago

Emailed @carobin589 16th Nov about testing in a hangout

PatReynolds commented 6 years ago

@benwbrum do we want two stages of microvolunteering: a mark-up stage, where personal names and other genealogical info / sections for enhancement by the CRS are identified, and a transcription stage? Could such a mark-up tool be integrated into Scribe and/or FromThePage?

benwbrum commented 6 years ago

Initial testing determined that FromThePage is preferred over Wikisource.

richpomfret commented 6 years ago

@carobin589 can you provide an update for us?

PatReynolds commented 6 years ago

How will autolink be used? Can we add in autolinking of a person with an event (e.g. death) and a date?

benwbrum commented 6 years ago

The next step here is to transcribe a single page from the CRS test document in FromThePage, using the tagging feature to mark up dates, names, and places. If we can identify an appropriate page, I can take a hand at the transcription and see what's involved in extracting/converting/ingesting into FreeREG2.
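A minimal sketch of what the extraction step might look like, assuming the tagged transcription uses wiki-link style markup of the form `[[Canonical subject|verbatim text]]` or `[[Subject]]`. The sample register entry, the `Tag` type, and the function names are hypothetical and only for illustration; the real export format and the FreeREG2 ingest format would need to be checked before relying on any of this.

```python
import re
from typing import NamedTuple


class Tag(NamedTuple):
    subject: str   # canonical subject name (left of the pipe)
    verbatim: str  # text as written on the page (right of the pipe)


# Assumed markup: [[Subject]] or [[Subject|verbatim text]]
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def extract_tags(text: str) -> list[Tag]:
    """Return every tagged subject in the order it appears."""
    tags = []
    for m in LINK.finditer(text):
        subject = m.group(1).strip()
        # With no pipe, the verbatim text is the subject itself.
        verbatim = (m.group(2) or subject).strip()
        tags.append(Tag(subject, verbatim))
    return tags


def strip_markup(text: str) -> str:
    """Return the plain transcription with the tag markup removed."""
    return LINK.sub(lambda m: m.group(2) or m.group(1), text)


# Hypothetical register entry, invented for this sketch.
sample = "[[1 May 1742|May 1]] was baptised [[Smith, John|Jno. Smith]] at [[Ugthorpe]]."

for tag in extract_tags(sample):
    print(tag.subject, "<-", tag.verbatim)
print(strip_markup(sample))
```

Note that the markup alone does not say whether a tag is a date, a name, or a place, so a real conversion would also need the category assigned to each subject before anything could be mapped to FreeREG2 fields.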