brobertson / Lace2

In-browser OCR editing program that transforms OCR results into structured, citable TEI. No XML experience required!
http://trylace.org
GNU General Public License v3.0

interlinear lines of accents? delete whole line? #149

Open lcerrato opened 1 year ago

lcerrato commented 1 year ago

Hi @brobertson, I've got a local Lace package up and running, but I'm seeing odd results: interlinear lines containing only accents and breathing marks, as if they had been split off from the words. I don't think I can delete a whole line in the current version; is that correct? Right now I have to remove these one by one.

[screenshot]
brobertson commented 1 year ago

Lisa, I think this is an OCR issue, not a Lace issue. That is, this OCR output looks bad enough that it's going to be pretty hard to edit anyhow. What did you use to produce it?

brobertson commented 1 year ago

Or is this a bad patch in an otherwise good volume? If so, then yes, delete all the dirty lines (which might be someone's scribbling or something).

lcerrato commented 1 year ago

Hi, I used lacebuilder. It does seem to be a problem throughout the file. I had already changed the contrast on the scans once to improve the result, so I was surprised by the misalignment of the accents and words. This was just an experiment to get back into package creation and see what kind of results I could get for some offline projects.

I just followed the default instructions here: https://github.com/brobertson/lacebuilder#example-including-archiveorg-files-and-tesseract-processing but perhaps there's some element I need to update.

Is there a shortcut to delete a whole line, or does it have to be done box by box?

As I said, I was just curious to see if I could get it up and running again.
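Short of a whole-line delete in the editor, one possible workaround is to find the accent-only lines in bulk before or after editing. The sketch below is a minimal illustration, assuming the OCR text can be handled as plain lines; it does not read or write Lace's own file format. It treats a line as "accent-only" when every non-space character is a Unicode combining mark (category `Mn`), a modifier symbol (category `Sk`, which covers the Greek spacing psili, dasia, tonos, and perispomeni), or one of a few apostrophe-like characters. The function names and the `APOSTROPHE_LIKE` set are hypothetical and would need tuning for a given volume.

```python
import unicodedata

# Hypothetical set of characters an OCR engine might emit for a detached
# breathing mark; this is an assumption, adjust it for the volume at hand.
APOSTROPHE_LIKE = {"'", "\u2019", "\u02bc", "`"}


def is_accent_only(line: str) -> bool:
    """Return True if a line holds nothing but accents and breathings.

    A character counts as a diacritic if its Unicode category is Mn
    (combining mark, e.g. U+0301) or Sk (modifier symbol, e.g. the Greek
    spacing psili U+1FBF), or if it is in APOSTROPHE_LIKE.
    """
    chars = [c for c in line if not c.isspace()]
    if not chars:
        return False  # blank lines are not treated as accent lines
    return all(
        unicodedata.category(c) in ("Mn", "Sk") or c in APOSTROPHE_LIKE
        for c in chars
    )


def drop_accent_lines(lines):
    """Keep only lines that contain at least one non-diacritic character."""
    return [ln for ln in lines if not is_accent_only(ln)]


if __name__ == "__main__":
    sample = [
        "\u1fbf  \u0384   \u1fc0",  # psili, tonos, perispomeni on their own
        "μῆνιν ἄειδε θεὰ",          # real text with precomposed accented letters
        "` ' \u1ffe",               # grave, apostrophe, dasia
    ]
    for ln in drop_accent_lines(sample):
        print(ln)
```

Precomposed letters such as `ῆ` are lowercase letters (category `Ll`), so lines with real words survive the filter. Because a stray closing quote (U+2019) is also in the apostrophe set, flagged lines are best reviewed rather than deleted blindly.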