Open ebeshero opened 8 years ago
@halperta Greetings Hannah! We've just been talking at the SHARP 2017 conference about your transcription tool using Ocular! @setriplette and @HelenaSabel, Hannah Alpert-Abrams at UT Austin says she can definitely help us auto-generate Montalvo transcriptions, and it helps a lot that we have some transcriptions done by hand to get started.
Hannah, here's the part of our repo with the photofacsimiles of Montalvo: https://github.com/ebeshero/Amadis-in-Translation/tree/master/book-images/Montalvo-1547
And, here are the TEI XML files containing our hand transcriptions. The ones with full <cl> markup in them with @xml:ids are the reliable files to work with... though there are several we've had students producing (mostly with very light XML markup) that have a bunch of errors we know we need to fix. Stacey can help orient! Thanks HUGELY for offering to help, and I hope our project can be useful to you!
Hello! I'm sending you some automatically produced transcriptions of the Montalvo document.
I played a little with the parameters but this is a pretty basic "dirty" transcription using Ocular (https://github.com/tberg12/ocular). The attached folder contains both xml (ALTO) and plain text transcriptions; I didn't sort them, but I'm sure you can separate them to make it easier to review. Take a look and let me know what you think! And let me know if you want to discuss further.
Hannah
Update: after sending this to you, I reviewed some sample docs for myself and was surprised by how inaccurate it is. I'm not sure if the parameters were off, or what. I'm going to look into it this week and get back to you :)
Thanks, @halperta ! We'll wait until you have another go at it...
Okay! I have some results for you. They are... inconsistent, but feel free to take a look at the sample pages on this website:
http://www.halperta.com/amadis/
I'm showing two kinds of automatically produced transcriptions. The "automatic transcriptions" show specific kinds of errors, as described in the text on the website. The "normalized" page corrects some of those errors automatically, making a more readable text... though it has its own problems, including a lack of page breaks. Anyway, I hope you'll be patient with the "dirtiness" of the OCR, and let me know if you see any way that these transcriptions could be useful to you.
Hannah
This is really interesting! The non-normalized one is actually closer to the text. It could help us transcribe, especially because it's appearing with an image beside and it's got some white space. Correcting that would probably be quicker than working from scratch.
Ooh, except there's a major problem. It's reading straight across the column break instead of top to bottom down the first column and top to bottom down the second column. Is there anything to do about that?
Hello, I'm so sorry for not replying to this, I'm new to this GitHub conversation thread and finding it very confusing! If you're still interested can we take the conversation over to email? I'm at halperta@gmail.com
Ocular can't handle columns, so we've been doing manual cropping. But don't be afraid, it's better than it sounds: we found a system that "stacks" pages so you can crop across all of the multicolumn pages at once. It's not the best, and can definitely be slow and unwieldy, but if you think it's worth trying I can show you how it's done.
Hannah
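To make the column-cropping idea concrete, here is a minimal sketch (not the tool or workflow used in this thread) of how crop rectangles for a two-column page might be computed. The function name `column_boxes` and the `gutter` and `split` parameters are hypothetical, and the sketch assumes the column break is a straight vertical strip at a known position. The boxes use the (left, upper, right, lower) pixel convention, which is what an image library such as Pillow accepts in `Image.crop`.

```python
def column_boxes(width, height, gutter=0, split=0.5):
    """Return (left_box, right_box) crop rectangles for a two-column page.

    Each box is (left, upper, right, lower) in pixels. `split` is the
    horizontal position of the column break as a fraction of the page
    width, and `gutter` is the width in pixels of the blank strip to
    discard around that break. Both are assumptions to tune per page set.
    """
    if not 0 < split < 1:
        raise ValueError("split must be between 0 and 1")
    middle = round(width * split)
    half = gutter // 2
    left_box = (0, 0, middle - half, height)
    right_box = (middle + half, 0, width, height)
    return left_box, right_box


# Example: a 2000 x 3000 pixel page scan with a 100-pixel gutter.
left, right = column_boxes(2000, 3000, gutter=100)
print(left)   # (0, 0, 950, 3000)
print(right)  # (1050, 0, 2000, 3000)
```

Each column image produced this way could then be transcribed on its own, so the text runs top to bottom within one column instead of straight across the break.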
@halperta Good to know about Ocular, and yes, we are definitely interested in working with it. Now, if tagging you on the Issues board works, GitHub should send you an email message. I really need to keep all the project management discussion tidily on the Issues boards of our GitHub repo because my email is a wild jungle, but if you receive this in your inbox, perhaps we just need to remember to tag you as I did here. Please let me know if/when you receive this, Hannah, and thanks for the great input!
Okay, that should be fine. Can you confirm that replying from my inbox is working and you received this message in return?
Thrilled you want to work with Ocular! Let me try to find the cropping instructions that Taylor Berg-Kirkpatrick painstakingly wrote up, and I'll send them to you. We used them on the first folio and it worked well.
On Jul 22, 2017 6:57 PM, "Elisa Beshero-Bondar" notifications@github.com wrote:
@halperta https://github.com/halperta Good to know about Ocular, and yes, we are definitely interested in working with it. Now, if tagging you on the Issues board works, GitHub should send you an email message. I really need to keep all the project management discussion tidily on the Issues boards of our GitHub repo because my email is a wild jungle, but if you receive this in your inbox, perhaps we just need to remember to tag you as I did here. Please let me know if/when you receive this, Hannah, and thanks for the great input!
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ebeshero/Amadis-in-Translation/issues/62#issuecomment-317215781, or mute the thread https://github.com/notifications/unsubscribe-auth/ADxgN2bLMGmSkFLXwobUWp_Y_XGNajVmks5sQn5VgaJpZM4JDQ8X .
Okay, found the info!
We were using a program called ImageJ to do the cropping, which is a WYSIWYG interface for image editing that you can get here: https://imagej.nih.gov/ij/. As I recall, we followed these instructions:
https://tinyapps.org/blog/misc/201305030700_crop_images_like_briss.html
We used a program to help us sort our files. You can download the program here: manual_crop_demo.zip https://drive.google.com/file/d/0B8eDwPGQfEV9M3M0Q3NZbWRNQ1E/view?usp=drive_web
Here is how we ran the program and cropped the files. I recommend breaking the book up into smaller subsets of pages, as I think you have already done. The program will sort the pages into verso and recto images and create folders for left and right columns, all of which helps with cropping. Then these steps walk you through the process:
Good luck! Hannah
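The sorting program itself is not reproduced in this thread. Purely to illustrate the kind of sorting described above, here is a hypothetical sketch that copies page images into recto and verso folders and creates empty left and right column folders inside each. The function name `sort_pages`, the folder names, and the assumption that pages simply alternate recto/verso in filename order are all mine, not taken from manual_crop_demo.

```python
from pathlib import Path
import shutil


def sort_pages(image_paths, out_dir, first_is_recto=True):
    """Copy page images into recto/ and verso/ folders under out_dir.

    Assumes (hypothetically) that the images alternate recto, verso,
    recto, ... when sorted by filename. Empty left/ and right/
    subfolders are created in each folder to hold column crops later.
    Returns a dict mapping each filename to "recto" or "verso".
    """
    out_dir = Path(out_dir)
    placed = {}
    for index, path in enumerate(sorted(Path(p) for p in image_paths)):
        is_recto = (index % 2 == 0) == first_is_recto
        side = "recto" if is_recto else "verso"
        for column in ("left", "right"):
            (out_dir / side / column).mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, out_dir / side / path.name)
        placed[path.name] = side
    return placed
```

Calling `sort_pages(Path("pages").glob("*.png"), "sorted")` would leave the originals untouched and build the folder layout under `sorted/`; the `pages` folder name here is only an example.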
Thanks for these instructions, @halperta ! It sounds like we should be able to fine-tune the program to improve the results, and we can set to work on that in the coming months. Also, yes indeed, I am reading you loud and clear! I saw your replies in my email as well as here on our GitHub Issues thread.
Let's take a look at http://emop.tamu.edu/outcomes/Franken-Plus and let's collect other such potential leads here. @setriplette @HelenaSabel