YaleDHLab / ensemble-at-yale

Crowdsourcing the transcription of Yale playbills - http://bit.ly/ensemble-at-yale
http://ensemble.yale.edu
MIT License

Ingest re-scans #130

Closed pleonard212 closed 7 years ago

pleonard212 commented 7 years ago

Hi all! We now have updated files and CSVs for each era that reflect the re-scans. There are two parts to this:

1) CSVs are here:

https://github.com/YaleDHLab/ensemble-at-yale/tree/master/project/ensemble-at-yale/subjects

(The most recent file, bundy, includes both the PDF pages and the scans. No changes have been made to the PDF rows, but they're included for completeness of the era.)

2) Full pages and thumbnails are gzipped here:

smb://storage.yale.edu/home/FC_DigitalHumanities-807001-YUL/Ensemble/2017-04DataLoad-Rescans

(On a PC: \\storage.yale.edu\home\FC_DigitalHumanities-807001-YUL\Ensemble\2017-04DataLoad-Rescans)

Finally, a note: 99% of the scans and rows are the same, but we should just dump all the old data and put in completely new stuff because there's no deterministic way of telling which images have been re-ordered, renamed, etc. This data load should completely replace all older subjects and subject sets on the production server.

duhaime commented 7 years ago

@pleonard212 I uploaded all image assets to S3, dropped tables, and reran the ingestion task last night. A quick look suggests that we're only missing the images from one playbill within the Wojewodski era: dra037-s01-b012-f136-i03-p0001.jpg

Do you know if we might be able to access those images?
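For anyone repeating this kind of load, a gap like this could be caught before ingestion by comparing the filenames listed in a subjects CSV against the files actually present in the image directory. The following is only a minimal sketch, not the project's ingestion code: the function name `find_missing_images` and the `file_name` column are assumptions, and the real CSVs may name or structure that column differently.

```python
import csv
from pathlib import Path


def find_missing_images(csv_path, image_dir, column="file_name"):
    """Return filenames listed in the CSV that are absent from image_dir.

    `column` is a hypothetical column name; adjust it to match the
    header actually used in the subjects CSV.
    """
    # Collect the names of all regular files currently in the directory.
    on_disk = {p.name for p in Path(image_dir).iterdir() if p.is_file()}

    missing = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            name = (row.get(column) or "").strip()
            # Skip blank cells; report anything listed but not found.
            if name and name not in on_disk:
                missing.append(name)
    return missing
```

Calling something like `find_missing_images("subjects.csv", "page-images")` (both paths are placeholders) would return the listed images that still need to be uploaded, in CSV order.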

pleonard212 commented 7 years ago

Great! I've uploaded the missing file and thumbnail to:

smb://storage.yale.edu/home/FC_DigitalHumanities-807001-YUL/Ensemble/2017-04DataLoad-Rescans/page-thumbs/dra037-s01-b012-f136-i03-p001.jpg

and

smb://storage.yale.edu/home/FC_DigitalHumanities-807001-YUL/Ensemble/2017-04DataLoad-Rescans/page-images/dra037-s01-b012-f136-i03-p001.jpg

pleonard212 commented 7 years ago

Oh, actually, is it all pages in that play that are missing? Will update with the full set.

pleonard212 commented 7 years ago

OK, that entire play is now on the server, ready to be added to S3.

duhaime commented 7 years ago

Thank you @pleonard212! We should be all set, so I'll close this one. Thanks again!

pleonard212 commented 7 years ago

Great! Not sure if the system is still processing stuff, but looks like http://ensemble.yale.edu/ may need to be poked a bit...

duhaime commented 7 years ago

It looks like the rake task completed without error. If you clear your cache and re-request the page, does everything look good?

pleonard212 commented 7 years ago

Ah, it was indeed a cache issue.

duhaime commented 7 years ago

Excellent!