bundesverfassung-oesterreich / bv-transkribus-import

workflow repo to fetch mets files from goobi and ingest them to transkribus
MIT License

failed imports #12

Closed cfhaak closed 11 months ago

cfhaak commented 1 year ago

In some cases the recent reimport (#10) failed. The problem seems to be that Goobi does not provide uniform METS files, for reasons not yet known. If images are referenced like this, the import fails:

<mets:div ID="PHYS_0020" ORDER="20" ORDERLABEL="19" TYPE="page">
    <mets:fptr FILEID="FILE_0020_PRESENTATION"/>
    <mets:fptr FILEID="FILE_0020_DEFAULT"/>
</mets:div>

With references like this, the import works:

<mets:div ID="PHYS_0004" ORDER="4" ORDERLABEL=" - " TYPE="page">
    <mets:fptr FILEID="FILE_0004"/>
    <mets:fptr FILEID="FILE_0004_PRESENTATION"/>
    <mets:fptr FILEID="FILE_0004_DEFAULT"/>
</mets:div>

The cause is the missing standard file pointer (mets:fptr) in the METS. I could probably fix this by simply adding the pointers to the file, since the corresponding files seem to exist. In some cases a single file even contains both kinds of references, faulty and working ones. However, I would rather try to republish them in Goobi.
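To illustrate the difference between the two snippets above, here is a minimal sketch that lists the page divs lacking a standard pointer. It assumes that the standard pointer is the one whose FILEID carries no `_PRESENTATION` or `_DEFAULT` suffix, as in the working example. The function name and the sample document are hypothetical and not taken from this repo.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
# Suffixes taken from the snippets above; a pointer without one of them is
# treated as the "standard" pointer (assumption).
DERIVATIVE_SUFFIXES = ("_PRESENTATION", "_DEFAULT")


def pages_missing_standard_fptr(mets_xml: str) -> list[str]:
    """Return the IDs of page divs that have no un-suffixed mets:fptr."""
    root = ET.fromstring(mets_xml)
    missing = []
    for div in root.iter(f"{{{METS_NS}}}div"):
        if div.get("TYPE") != "page":
            continue
        file_ids = [p.get("FILEID", "") for p in div.findall(f"{{{METS_NS}}}fptr")]
        has_standard = any(
            fid and not fid.endswith(DERIVATIVE_SUFFIXES) for fid in file_ids
        )
        if not has_standard:
            missing.append(div.get("ID"))
    return missing


# Hypothetical sample built from the two page entries quoted above.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="PHYSICAL">
    <mets:div ID="PHYS_0000" TYPE="physSequence">
      <mets:div ID="PHYS_0004" ORDER="4" ORDERLABEL=" - " TYPE="page">
        <mets:fptr FILEID="FILE_0004"/>
        <mets:fptr FILEID="FILE_0004_PRESENTATION"/>
        <mets:fptr FILEID="FILE_0004_DEFAULT"/>
      </mets:div>
      <mets:div ID="PHYS_0020" ORDER="20" ORDERLABEL="19" TYPE="page">
        <mets:fptr FILEID="FILE_0020_PRESENTATION"/>
        <mets:fptr FILEID="FILE_0020_DEFAULT"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""

print(pages_missing_standard_fptr(SAMPLE))  # ['PHYS_0020']
```

Because the check runs per page, it also reports files that mix faulty and working entries.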

cfhaak commented 1 year ago

On second thought: fixing the METS on the fly may not be a bad idea, if it works. Because a METS file can be partially corrupted, it is hard to tell whether an import succeeded without opening the document in the collection.
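A rough sketch of what such an on-the-fly fix could look like, under some assumptions: the standard FILEID is derived by stripping the `_PRESENTATION` or `_DEFAULT` suffix, and a pointer is only added when a mets:file with that ID actually exists in the document. The function name is hypothetical, and this is not the code used in this repo.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DERIVATIVE_SUFFIXES = ("_PRESENTATION", "_DEFAULT")  # as seen in the issue
ET.register_namespace("mets", METS_NS)  # keep the mets: prefix on output


def add_missing_standard_fptrs(mets_xml: str) -> tuple[str, list[str]]:
    """Add an un-suffixed mets:fptr to pages lacking one.

    Returns the patched XML and the IDs of the pages that were changed.
    """
    root = ET.fromstring(mets_xml)
    # Only point at files that are really declared (assumption: the
    # un-suffixed file entry exists, as observed in the issue).
    known_files = {f.get("ID") for f in root.iter(f"{{{METS_NS}}}file")}
    patched = []
    # Materialize the list first so inserting children is safe.
    for div in list(root.iter(f"{{{METS_NS}}}div")):
        if div.get("TYPE") != "page":
            continue
        file_ids = [p.get("FILEID", "") for p in div.findall(f"{{{METS_NS}}}fptr")]
        if any(fid and not fid.endswith(DERIVATIVE_SUFFIXES) for fid in file_ids):
            continue  # page already has a standard pointer
        base_id = None
        for fid in file_ids:
            for suffix in DERIVATIVE_SUFFIXES:
                if fid.endswith(suffix):
                    base_id = fid[: -len(suffix)]
                    break
            if base_id:
                break
        if base_id and base_id in known_files:
            div.insert(0, ET.Element(f"{{{METS_NS}}}fptr", {"FILEID": base_id}))
            patched.append(div.get("ID"))
    return ET.tostring(root, encoding="unicode"), patched


# Hypothetical input: PHYS_0020 can be repaired, PHYS_0021 cannot because
# no FILE_0021 entry is declared.
BROKEN = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="LOCAL"><mets:file ID="FILE_0020"/></mets:fileGrp>
    <mets:fileGrp USE="PRESENTATION"><mets:file ID="FILE_0020_PRESENTATION"/></mets:fileGrp>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="FILE_0020_DEFAULT"/>
      <mets:file ID="FILE_0021_DEFAULT"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div ID="PHYS_0020" ORDER="20" ORDERLABEL="19" TYPE="page">
      <mets:fptr FILEID="FILE_0020_PRESENTATION"/>
      <mets:fptr FILEID="FILE_0020_DEFAULT"/>
    </mets:div>
    <mets:div ID="PHYS_0021" ORDER="21" ORDERLABEL="20" TYPE="page">
      <mets:fptr FILEID="FILE_0021_DEFAULT"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""

fixed_xml, patched_pages = add_missing_standard_fptrs(BROKEN)
print(patched_pages)
```

Returning the list of patched pages gives a cheap way to log which documents were touched, which may help with the partially corrupted files that are otherwise only noticed by opening them in the collection.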

cfhaak commented 1 year ago

Set A: B-VG saubere Variante (clean version)

Set B:
Protokoll 7. Laenderkonferenz: bv_doc_id61
Protokoll 6. Laenderkonferenz: bv_doc_id60
Protokoll 4. Laenderkonferenz: bv_doc_id59
Protokoll 3. Laenderkonferenz: bv_doc_id58
Protokoll der 4. Sitzung des Subkomitees des Verfassungsausschusses vom 22. Juli 1920: bv_doc_id39
Protokoll 2. Laenderkonferenz 4. und 5. Jänner 1919 (partial): bv_doc_id57
Protokoll 1. Laenderkonferenz 23. November 1918 (partial): bv_doc_id__56

There may be more partially failed imports, but this is enough to test whether I can fix this on the fly by adding the links to the METS file.

cfhaak commented 1 year ago

I republished these images to the viewer in Goobi as suggested and reimported them, but this didn't solve the issue.

cfhaak commented 1 year ago

One of the documents that was already imported and transcribed used the default upload rather than the cropped images (one image was even rotated by 180 degrees). @csae8092, are we supposed to use the defaults or the cropped ones for transcribing?

csae8092 commented 1 year ago

Ideally, the images that will be archived in ARCHE.

cfhaak commented 1 year ago

We should import the cropped images. I need to investigate how the import currently works with the given packages.

cfhaak commented 1 year ago

This is the case; however, some images are not cropped in Goobi.

cfhaak commented 1 year ago

It seems that all issues are caused by the Goobi workflows and need to be investigated further there.

cfhaak commented 1 year ago

Set A:

Set B:

cfhaak commented 1 year ago

Some of them still need to be solved. The only way to do this is in Goobi, since the processes there were messed up or failed due to the rather fragile systems. I will fix them myself as long as Jörg is on vacation, but in the long run this should be handled by someone else.