digital-preservation / csv-validator

CSV Validation Tool and API (CSV Schema RI)
http://digital-preservation.github.io/csv-validator
Mozilla Public License 2.0

IntegrityCheck error when folder called 'content' not in top folder, and additional file or folder at same level. #266

Open paulyoung84 opened 3 years ago

paulyoung84 commented 3 years ago

While running the validator over some collections, I found that it reported some erroneous IntegrityCheck errors. After manually verifying that there weren’t actually any errors in the downloads, I think I have traced it to some residual issues with having a folder called ‘content’ somewhere in the filepath that isn’t the top-level folder.

This was a problem before and was fixed in https://github.com/digital-preservation/csv-validator/commit/300ef07d20646b8638399b8cd41cac6608383005 . It seems there are still problems when you have a folder called content and a file or another folder at the same level. The path substitution in the fix fails, and this seems to cause all directly related files above and below the content folder to fail integrityCheck. I’ve attached a small sample set that replicates this: csvvaltest.zip
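For anyone triaging their own collections, here is a rough Python sketch that lists folders matching the layout described above. It makes no claim about how the validator performs its substitution. It assumes "top folder" means the root directory being checked, the function names are hypothetical, and the sample tree is made up for illustration (it is not the contents of csvvaltest.zip).

```python
import tempfile
from pathlib import Path


def nested_content_with_siblings(root: Path, name: str = "content") -> list[Path]:
    """List folders called `name` that are not directly under `root`
    and that share their parent with at least one other file or folder.

    Assumption: this mirrors the layout described in this issue, not the
    validator's own logic.
    """
    hits = []
    for folder in sorted(p for p in root.rglob(name) if p.is_dir()):
        if folder.parent == root:
            continue  # top-level 'content' folder: not the reported case
        siblings = [s for s in folder.parent.iterdir() if s != folder]
        if siblings:
            hits.append(folder.relative_to(root))
    return hits


def build_sample(root: Path) -> None:
    """Create a tiny hypothetical layout for demonstration purposes."""
    (root / "series1" / "content").mkdir(parents=True)
    (root / "series1" / "content" / "file1.txt").write_text("a")
    (root / "series1" / "notes.txt").write_text("b")  # sibling of 'content'
    (root / "series2" / "content").mkdir(parents=True)  # no sibling here
    (root / "series2" / "content" / "file2.txt").write_text("c")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample_root = Path(tmp)
        build_sample(sample_root)
        for hit in nested_content_with_siblings(sample_root):
            print(hit)
```

In the sample tree only the `content` folder under `series1` is reported, because it sits next to `notes.txt`; the one under `series2` has no sibling and is skipped.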

adamretter commented 3 years ago

@paulyoung84 The Integrity Check was always a mixed-concern :-/ It shouldn't really have been put into the CSV Validator, rather it should have been a separate tool entirely.

I think the first thing would be to define in writing somewhere what an "Integrity Check" really means; this might mean working backwards from the code. From there we could then figure out whether that's what TNA needs today. Ultimately, I would still suggest moving it into a separate tool though.