bertsky closed this 9 months ago
Oh, and I should mention that it would make perfect sense to coordinate this with the upcoming OCR-D feature allowing multiple FLocat refs per file. This will make it possible to keep the original remote presentation links in addition to downloaded local paths (with sane file names), so that after processing, the temporary local refs can be converted back to public ones and removed.
So if LAREX supports URL refs, it should also already support ignoring such remote refs if local refs are additionally present.
Ok, judging by the code, this should currently work already (i.e. http FLocats will be ignored).
I have also tested this successfully.
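To illustrate the behaviour described above (remote FLocats being ignored when a local one is present), here is a minimal Python sketch. It is not LAREX's actual code, which is written in Java; the function names `is_remote` and `local_hrefs`, the sample METS and the `example.org` URLs are made up for illustration, and the `LOCTYPE="OTHER" OTHERLOCTYPE="FILE"` attributes on the local ref are an assumption about how such refs are typically marked.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical sample: one file with a remote and a local ref, one remote-only file.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="FILE_0001">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/images/0001.jpg"/>
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="FILE" xlink:href="DEFAULT/FILE_0001.jpg"/>
      </mets:file>
      <mets:file ID="FILE_0002">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/images/0002.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def is_remote(href):
    """Treat http(s) references as remote; everything else counts as local."""
    return href.startswith(("http://", "https://"))


def local_hrefs(mets_xml):
    """Map each mets:file ID to its first local FLocat href.

    Files that only carry remote refs are left out of the result.
    """
    root = ET.fromstring(mets_xml)
    result = {}
    for file_el in root.iterfind(".//mets:file", NS):
        for flocat in file_el.iterfind("mets:FLocat", NS):
            href = flocat.get(HREF, "")
            if href and not is_remote(href):
                result[file_el.get("ID")] = href
                break
    return result


print(local_hrefs(SAMPLE))
```

With this sample, only `FILE_0001` ends up in the result; `FILE_0002` has no local ref and is skipped, which matches the empty listing users see when everything is a remote URL.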
So nothing special needs to be done in LAREX after all: an external program could simply download all files of the required fileGrps and change them to local refs with sane file names (as `ocrd workspace find --download` does, but keeping the remote refs, as with mm-update).
What remains to be done is to instruct users how to do so. (Currently, they'll simply be surprised to get an empty fileGrp list if everything is remote URLs.)
Should we leave this open as a documentation issue?
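As a starting point for such documentation, the external step described above could be sketched roughly as follows. This is only an illustration under assumptions, not an existing tool: the function name `add_local_refs`, its parameters `file_grps`, `workdir` and `fetch`, the `<fileGrp>/<file ID><extension>` naming scheme for "sane file names", and the `LOCTYPE="OTHER" OTHERLOCTYPE="FILE"` attributes on the new ref are all hypothetical choices.

```python
import os
import urllib.request
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
NS = {"mets": METS_NS}
HREF = f"{{{XLINK_NS}}}href"

# Keep the usual prefixes when the tree is serialized again.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def add_local_refs(root, file_grps, workdir=".", fetch=urllib.request.urlretrieve):
    """Download remote-only files of the given fileGrps and add a local FLocat.

    The remote FLocat is kept, so it can be restored as the only ref later.
    Returns the number of files that received a local ref.
    """
    added = 0
    for grp in root.iterfind(".//mets:fileGrp", NS):
        use = grp.get("USE")
        if use not in file_grps:
            continue
        for file_el in grp.iterfind("mets:file", NS):
            hrefs = [loc.get(HREF, "") for loc in file_el.iterfind("mets:FLocat", NS)]
            remote = [h for h in hrefs if h.startswith(("http://", "https://"))]
            if not remote or len(remote) < len(hrefs):
                continue  # nothing to download, or a non-remote ref already exists
            # Assumed naming scheme: <fileGrp>/<file ID><extension of the URL>
            ext = os.path.splitext(remote[0].split("?")[0])[1]
            rel_path = f"{use}/{file_el.get('ID')}{ext}"
            target = os.path.join(workdir, rel_path)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            fetch(remote[0], target)
            ET.SubElement(
                file_el,
                f"{{{METS_NS}}}FLocat",
                {"LOCTYPE": "OTHER", "OTHERLOCTYPE": "FILE", HREF: rel_path},
            )
            added += 1
    return added
```

A caller would parse the METS with `ET.parse`, pass `tree.getroot()` together with the fileGrp names to be edited, and write the tree back afterwards. The `fetch` parameter exists only so the download can be swapped out, for example in a dry run.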
Alas, it does not work on `dev` anymore!
If a file has a secondary remote FLocat, then it will not show up as a page in the editor (despite the fact that it was activated in the library dialog). So if all files are formatted this way, then no pages are shown.
My guess is that this change is responsible.
Argh, that's annoying and shouldn't have happened. We'll try to find some time to look into this issue (and some of the others, like #240) in the following days / weeks.
Any news on this? I'd really like to switch to the newest `dev` version because of the other fixes, but this breaking change is a show-stopper for me.
Still on our backlog (I promise), but sadly we haven't gotten to it yet. Will update as soon as we find some time (hopefully sooner rather than later).
@maxnth Have you considered using a dedicated component for METS handling, like mets-model?
> My guess is that this change is responsible.
Finally got to looking into it; this indeed messed with loading annotations in METS projects. Starting from 70be72 this works for me again (and I cautiously hope for others as well, otherwise I'll look into it again), while also allowing annotations to be loaded from files with certain special characters in the file name (which the "fix" above was intended to solve).
I'm gonna mark this as fixed; in case I missed something, feel free to reopen this issue.
Since https://github.com/OCR4all/LAREX/commit/453ff15be0af23b3eb78823cd3e14efb438f5135, if the METS contains file references which are true URLs (not local file paths), then the library will not crash, but the respective book will be empty (and I cannot seem to leave such an empty book afterwards, I'll have to reload the page entirely).
It would be really helpful if LAREX was able to manage such remote files in a semi-transparent manner: