OCR-D / core

Collection of OCR-related python tools and wrappers from @OCR-D
https://ocr-d.de/core/
Apache License 2.0

workspace prune-files: also removes remote FLocats #1234

Open bertsky opened 1 month ago

bertsky commented 1 month ago

The documentation for the prune-files command says:

Removes mets:files that point to non-existing local files

However, that's not what is implemented: https://github.com/OCR-D/core/blob/26a3f787cd7746b8e197427e74c532edbecefd8b/src/ocrd/cli/workspace.py#L560-L577

So either the documentation should read…

Removes mets:files that do not point to existing local files

…or the implementation should not remove files that also have a .url.

The former makes more sense to me. We already have workspace find ... --undo-download, which removes .local_filename in METS and on the filesystem for those entries which also have a .url (if implemented properly, i.e. once #1150 is fixed), so the latter would be redundant behaviour.
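To make the difference between the two options concrete, here is a minimal sketch in Python. It is not the code in src/ocrd/cli/workspace.py: `FileEntry` is a hypothetical stand-in for a mets:file entry, and both function names are made up for illustration. Only the attribute names `.local_filename` and `.url` are taken from the discussion above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class FileEntry:
    """Hypothetical stand-in for a mets:file entry (not the OCR-D class)."""
    ID: str
    local_filename: Optional[str] = None
    url: Optional[str] = None


def has_existing_local_file(entry: FileEntry, base: Path) -> bool:
    """True if the entry names a local file that exists below `base`."""
    return bool(entry.local_filename) and (base / entry.local_filename).exists()


def prune_all_without_local(entries: List[FileEntry], base: Path) -> List[FileEntry]:
    """First option: keep only entries that point to an existing local file.

    Remote-only entries (with a .url but no usable local file) are dropped too.
    """
    return [e for e in entries if has_existing_local_file(e, base)]


def prune_sparing_remote(entries: List[FileEntry], base: Path) -> List[FileEntry]:
    """Second option: additionally keep every entry that has a .url."""
    return [e for e in entries if has_existing_local_file(e, base) or e.url]
```

Under these assumptions, an entry with only a `.url` is removed by the first function and kept by the second. That is exactly the case where the current documentation and behaviour disagree.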