cern-sis / issues-scoap3


Find all missing files and fix them #294

Closed ErnestaP closed 4 months ago

ErnestaP commented 4 months ago

All missing files: missing_files_glnzd.txt

ErnestaP commented 4 months ago

All missing files are from Elsevier, 1245 in total.

ErnestaP commented 4 months ago

Paths of the missing files: final_missing_paths_dkpvp.txt

ErnestaP commented 4 months ago

Paths of the missing files mapped to their DOIs: mapping_missing_files_ozlvh.json

Where a record had more than one candidate file, the newest was selected. For example, if there were 2 XMLs, the newer one was taken.
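The "newest wins" selection can be sketched roughly as below. This is only an illustration, not the script that was actually run: the function name `pick_newest` and the `(doi, path, mtime)` input shape are assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple


def pick_newest(candidates: Iterable[Tuple[str, str, float]]) -> Dict[str, str]:
    """Return {doi: path}, keeping the most recently modified path per DOI.

    `candidates` is a hypothetical input shape: (doi, path, mtime) tuples,
    where mtime is a modification timestamp (larger means newer).
    """
    newest: Dict[str, Tuple[str, float]] = {}
    for doi, path, mtime in candidates:
        # Keep this candidate if the DOI is new or this file is newer.
        if doi not in newest or mtime > newest[doi][1]:
            newest[doi] = (path, mtime)
    return {doi: path for doi, (path, _) in newest.items()}
```

With two XMLs for the same DOI, only the one with the larger timestamp ends up in the mapping.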

ErnestaP commented 4 months ago

I attached almost all of the missing XMLs, PDFs and PDF/As in prod. We are still missing 194 PDF/A files. I will check QA; some of them may be found there.

ErnestaP commented 4 months ago

I was not able to find PDF/As for the following records, either in prod or in QA. There are 188 in total: missing_pdfas_dois.txt

ErnestaP commented 4 months ago

The missing PDF/As were in unextracted zips in the download folder. It should not happen that a record is registered and present only in the /data/harvesting/Elsevier/download folder, without having been extracted to the data/harvesting/Elsevier/unpacked folder.
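A check for this situation could look something like the sketch below. It rests on an assumption that is not confirmed anywhere in this issue: that each archive `<name>.zip` in the download folder is extracted into a directory called `<name>` in the unpacked folder. The function name `find_unextracted` is also made up for the example; adjust the matching rule to the real layout.

```python
from pathlib import Path
from typing import List, Union


def find_unextracted(download_dir: Union[str, Path],
                     unpacked_dir: Union[str, Path]) -> List[str]:
    """List zip archives in download_dir with no matching unpacked directory.

    Assumption (hypothetical layout): "<name>.zip" unpacks to
    "<unpacked_dir>/<name>/".
    """
    download = Path(download_dir)
    unpacked = Path(unpacked_dir)
    # Names of directories that already exist in the unpacked folder.
    extracted = set()
    if unpacked.is_dir():
        extracted = {p.name for p in unpacked.iterdir() if p.is_dir()}
    # Any archive whose stem has no matching directory was never extracted.
    return sorted(z.name for z in download.glob("*.zip")
                  if z.stem not in extracted)
```

Running it against the two Elsevier folders would give the list of archives that still need extracting; an empty list means every downloaded zip has a counterpart.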

pamfilos commented 4 months ago

closing as fixed