Conal-Tuohy / VMCP-upconversion

Ferdinand von Mueller's correspondence upconversion from MS Word to TEI XML
Apache License 2.0

Images #25

Open LucasHorseshoeBend opened 7 years ago

LucasHorseshoeBend commented 7 years ago

I have created a sample set of files with images.

Can you test the approach before I go through all the letters and extract image files? The test set has a range of features: embedded small images; drawings on separate sheets; pure line drawings; and some that have graduated tones, including photographs.

The sample letters, and the images, are in the folder dropbox/Conal working files/copies of M letters with images. There are two folders inside that one: letter copies and images. The images have been saved with file names of the form yy-mm-dd_image0x.jpg. The letters have been saved with a file name yy-mm-dd-_status_IMAGETEST.doc.

The point in the transcription where an image appears has a footnote, in various forms to suit the situation, but including in all cases the name of the relevant image file, for example, 'For sketch, see yy-mm-dd_image01.jpg.'

There is one file with 10 images, hence the form of the file name; although I don't think it necessary to have them all with a leading zero, it is a useful way for me to keep track. If it causes you problems I haven't foreseen, I can change the format by removing the leading zeros where fewer than 10 image files are associated with a text file.
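As a rough illustration of why the leading zero need not matter to a script, here is a minimal Python sketch that reads the image number as an integer, so that image01, image2 and image10 sort in numeric order either way. The regular expression, the function name `image_sort_key` and the sample file names are all assumptions made for the example; they are not taken from the project's code or from the actual test set.

```python
import re

# Assumed shape of an image file name: yy-mm-dd_imageN.jpg, where N may or
# may not carry a leading zero. This pattern is illustrative only.
IMAGE_NAME = re.compile(r"^(\d{2}-\d{2}-\d{2})_image(\d+)\.jpg$")


def image_sort_key(file_name):
    """Return (letter date, image number) so names sort in numeric order."""
    match = IMAGE_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not an image file name: {file_name!r}")
    # int() discards any leading zero, so "01" and "1" compare equal.
    return match.group(1), int(match.group(2))


# Hypothetical file names mixing both numbering styles.
names = [
    "83-12-00_image10.jpg",
    "83-12-00_image2.jpg",
    "83-12-00_image01.jpg",
]
ordered = sorted(names, key=image_sort_key)
print(ordered)
```

Sorting the raw strings instead would place image10 before image2 when the zero is missing, which is the kind of problem the leading zero avoids for anyone browsing the folder by hand.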

I hope that this is sufficient to test your processes on a set small enough to be manageable, but large enough to be representative.

Conal-Tuohy commented 3 years ago

Just to confirm my understanding: these footnotes would be just placeholders for the images, so if any footnote contains text which appears to be an image file name (i.e. matching the pattern yy-mm-dd_image0x.jpg), then I should produce TEI output in which the footnote has been replaced with a TEI element which references the image file.

I have a suggested tweak, which would be to include a description of the image in the footnote, along with the image file name. E.g. rather than 'For sketch, see yy-mm-dd_image01.jpg.' in the footnote text, I would suggest excluding the filler words "For" and "see" and saying simply 'Sketch of plant yy-mm-dd_image01.jpg' (assuming for the sake of the example that it was a sketch of a plant). That would make it easy and reliable to detect the image file name (by pattern-matching), and the image description would be everything else in the footnote.

This would allow me to produce the following TEI:

<figure>
   <graphic url="yy-mm-dd_image01.jpg"/>
   <figDesc>Sketch of plant</figDesc>
</figure>

... and this could be converted to HTML which would allow vision-impaired readers who couldn't see the image itself to get access to the image description.
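To make the proposal concrete, the pattern-matching step could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the code used in this repository: the regular expression is one reading of the yy-mm-dd_image0x.jpg pattern, `footnote_to_figure` is a hypothetical helper name, the sample file name is invented, and the TEI namespace is left out for brevity.

```python
import re
import xml.etree.ElementTree as ET

# Assumed reading of the pattern yy-mm-dd_image0x.jpg: three two-digit
# groups, "_image", one or more digits, then ".jpg".
IMAGE_FILE = re.compile(r"\d{2}-\d{2}-\d{2}_image\d+\.jpg")


def footnote_to_figure(footnote_text):
    """Build a <figure> element from a footnote that names an image file.

    Returns None when the footnote contains no image file name, in which
    case it would be left as an ordinary footnote.
    """
    match = IMAGE_FILE.search(footnote_text)
    if match is None:
        return None
    # Everything in the footnote other than the file name is the description.
    description = footnote_text[:match.start()] + footnote_text[match.end():]
    description = " ".join(description.split()).strip(" .")
    figure = ET.Element("figure")
    ET.SubElement(figure, "graphic", {"url": match.group(0)})
    if description:
        ET.SubElement(figure, "figDesc").text = description
    return figure


# Hypothetical footnote text in the suggested form.
figure = footnote_to_figure("Sketch of plant 83-12-00_image01.jpg")
print(ET.tostring(figure, encoding="unicode"))
```

With footnotes in the older form ('For sketch, see ...'), the same approach would still find the file name, but the filler words would end up in the description, which is the reason for suggesting the simpler wording.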

LucasHorseshoeBend commented 3 years ago

Thanks Conal. I have put a test copy of a file in the Quarantine folder: 83-12-00-test image note.doc

Best wishes Arthur


LucasHorseshoeBend commented 1 year ago

I think this has now been solved. Close?