denshoproject / ddr-local

Web UI used for interacting with DDR collections and entities on a local machine.

File Import CSV Validation broken #328

Closed sarabeckman closed 10 months ago

sarabeckman commented 11 months ago

Dina tried to import a mezzanine file CSV twice to ddr-densho-466. Each time she received a Successful Validation. I volunteered to import the files via cmdln and found 2 errors in the CSV that the validation step should have caught (a missing file and a typo in the basename_orig field for one file).

I created a file import CSV for ddr-densho-477 and misspelled a file name in basename_orig on purpose. I tried the file import feature in the Editor UI and my CSV also passed the validation step.

The Validation step should fail if the validator can't match every file name listed in basename_orig to a file in the folder the CSV is located in. The validator should also check that every object id listed in the CSV matches an object id in the repository.
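The two checks requested above could be sketched roughly as follows. This is a minimal illustration, not the ddr-cmdln or ddr-local implementation: the function name `validate_file_import_csv` and the `known_object_ids` argument are hypothetical, and it assumes the CSV has `id` and `basename_orig` columns and that the set of valid object ids has already been gathered from the repository by some other means.

```python
import csv
from pathlib import Path


def validate_file_import_csv(csv_path, known_object_ids):
    """Return a list of error strings; an empty list means the CSV passed.

    Hypothetical helper: checks that each basename_orig names a file in the
    CSV's own folder and that each id is in known_object_ids.
    """
    csv_path = Path(csv_path)
    csv_dir = csv_path.parent
    errors = []
    with csv_path.open(newline="") as f:
        # Row 1 is the header, so data rows are numbered from 2.
        for rownum, row in enumerate(csv.DictReader(f), start=2):
            basename = (row.get("basename_orig") or "").strip()
            if not basename:
                errors.append(f"row {rownum}: basename_orig is empty")
            elif not (csv_dir / basename).is_file():
                errors.append(f"row {rownum}: file not found: {basename}")

            object_id = (row.get("id") or "").strip()
            if object_id not in known_object_ids:
                errors.append(f"row {rownum}: unknown object id: {object_id}")
    return errors
```

The important part is that the function returns every error to its caller, so the validation step can fail when the list is not empty instead of reporting success.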

gjost commented 11 months ago

The original validation code was really convoluted so I simplified it where I could. There were some parts where validation errors were not passed back up to higher-level functions, which may explain why problems were not reported as errors. The file checking should now catch both missing files (a typo in basename_orig should trigger this) and unreadable files. I clarified some of the error messages.

As part of simplifying the validation code I had to touch the ddr-local Celery task code. While I was there I fixed a problem, so error tracebacks during CSV imports will now be written to the upload logs.

We don't have validation of the contents of file fields. We implemented that for entities but for some reason not for files. Implementation should be in csvvalidate_FIELD functions in ddr-defs/repo_models/files.py.

gjost commented 11 months ago

Tested on ishigura:

sara.beckman: Validation worked as intended when a file listed in the import CSV was not present, and when basename_orig didn't match the file present. However, for an update import the id field in the CSV holds the id plus the hash for the file, and the Validation flags all the files as missing. Since it is only a metadata update, the files don't need to be present.

The Validation step also stops cmdln from importing file CSV updates:

ddrimport file /media/qnfs/kinkura/working/ddrimporttest/ddr-densho-10-files_updated.csv /var/www/media/ddr/ddr-densho-10

The CSV is in /media/qnfs/kinkura/working/ddrimporttest
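One way to picture the distinction described above is to skip the file-presence check for rows that look like metadata-only updates. The sketch below is hypothetical, not how ddr-cmdln decides this: it assumes an update row's id ends in a role segment followed by a hex hash (for example `ddr-densho-10-1-mezzanine-c85f8d0f91`), and the names `HASHED_FILE_ID`, `is_metadata_update` and `missing_files` are invented for illustration.

```python
import re
from pathlib import Path

# Assumption: an existing file's id ends with "-<role>-<hex hash>".
HASHED_FILE_ID = re.compile(r"-[a-z]+-[0-9a-f]{6,}$")


def is_metadata_update(row_id):
    """True if the id looks like an existing file id (id plus hash)."""
    return bool(HASHED_FILE_ID.search(row_id))


def missing_files(rows, csv_dir):
    """Return basenames that must be present in csv_dir but are not.

    rows is an iterable of dicts with "id" and "basename_orig" keys.
    Rows that look like metadata-only updates are not checked.
    """
    csv_dir = Path(csv_dir)
    missing = []
    for row in rows:
        if is_metadata_update(row["id"]):
            continue  # updating metadata only; the binary need not be present
        if not (csv_dir / row["basename_orig"]).is_file():
            missing.append(row["basename_orig"])
    return missing
```

With a rule like this, an update CSV whose ids all carry hashes would produce no missing-file errors, while a new-file import would still be checked.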

GeoffFroh commented 10 months ago

See #329 for issues regarding metadata-only update.

Still need to test (@sarabeckman):

GeoffFroh commented 10 months ago

Packages released!