LibraryOfCongress / bagit-python

Work with BagIt packages from Python.
http://libraryofcongress.github.io/bagit-python

fetch.txt validation fails with `file://<PATH>` lines #153

Closed avivace closed 5 months ago

avivace commented 2 years ago

BagIt validation fails with "Malformed URLs" when fetch.txt contains lines such as "file:///etc/hosts". The BagIt specification, however, mentions that fetch.txt should accept any URI according to RFC3986, under which a file path with no hostname is written as, e.g., file:///etc/hosts.

See also: https://en.wikipedia.org/wiki/File_URI_scheme

The problem is at https://github.com/LibraryOfCongress/bagit-python/blob/master/bagit.py#L775, since a parsing error is thrown whenever a URL yields no 'netloc', which is obviously set to an empty string when the referenced resource is on the local file system.

Regardless, I do think that throwing a parsing error irrespective of what the parsing actually returned is wrong and should be changed.
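
For illustration, the behavior described above can be reproduced with the standard library alone. This is a minimal sketch, not the code in bagit.py: `has_scheme_and_netloc` is a hypothetical helper standing in for any check that requires both a scheme and a netloc.

```python
from urllib.parse import urlparse


def has_scheme_and_netloc(url):
    """Hypothetical stand-in for a check that requires both parts."""
    parsed = urlparse(url)
    return all((parsed.scheme, parsed.netloc))


# A local file URI parses cleanly, but its netloc is the empty string.
local = urlparse("file:///etc/hosts")
print(repr(local.scheme))  # 'file'
print(repr(local.netloc))  # ''
print(local.path)          # /etc/hosts

# A scheme-and-netloc check therefore rejects the file URI.
print(has_scheme_and_netloc("file:///etc/hosts"))         # False
print(has_scheme_and_netloc("https://example.org/data"))  # True
```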

acdha commented 2 years ago

Good catch! If you can send a pull request, it should be as simple as adding an exception for the file scheme. Clearly we also need a test case for this, probably also in the conformance suite repository linked below, since that's explicitly allowed by the spec.

https://github.com/LibraryOfCongress/bagit-conformance-suite/
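
One possible shape for such an exception is sketched below. It is only an assumption about how the check could look, not the change that was eventually made: the names `SCHEMES_WITHOUT_NETLOC` and `is_minimally_well_formed` are hypothetical and do not exist in bagit.py.

```python
from urllib.parse import urlparse

# Assumption: schemes for which an empty netloc is legitimate.
SCHEMES_WITHOUT_NETLOC = {"file"}


def is_minimally_well_formed(url):
    """Return True if the URL has a scheme and a usable location."""
    parsed = urlparse(url)
    if not parsed.scheme:
        return False
    if parsed.scheme in SCHEMES_WITHOUT_NETLOC:
        # file:///etc/hosts has an empty netloc but a non-empty path.
        return bool(parsed.path)
    # Every other scheme is still required to name a host.
    return bool(parsed.netloc)
```

With this sketch, `is_minimally_well_formed("file:///etc/hosts")` is accepted, while a URL such as `http:///missing-host` is still rejected because it has no netloc.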

avivace commented 2 years ago

Yes! I can work on a PR next week :D

acdha commented 2 years ago

Thanks, much appreciated!

avivace commented 2 years ago

@acdha Did you have the chance to take a look at the PR?

avivace commented 1 year ago

@acdha do we have any news here?