bio-guoda / preston

a biodiversity dataset tracker
MIT License

failed query for composite content iri ```[tar:gz:hash://sha256/bf18509ad6a2a97143d4f74e72dc4177ec31a4c50b3d7052f9a9cf6735f65e43!/50418.1.1.tar!/0050418/1.1/data/0-data/NODC_TaxonomicCode_V8_CD-ROM/TAXBRIEF.DAT]``` #267

Closed jhpoelen closed 8 months ago

jhpoelen commented 8 months ago

Preston uses Apache Commons Virtual File System (VFS) file path notation to point into (compressed) archives. For example,

tar:https://example.org/file.tar!/somefile.txt

references a file, somefile.txt, inside a tarball at https://example.org/file.tar.
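The layering in this notation can be taken apart mechanically. The following is a minimal sketch, not code from Preston or Apache Commons VFS: the class `LayeredPath`, its fields, and the short `WRAPPERS` list are all hypothetical names chosen for illustration. It peels the leading wrapper schemes off the front and treats every `!/` as a step into an archive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Preston or Commons VFS.
final class LayeredPath {
    // Assumed wrapper schemes; the set actually supported by VFS may differ.
    static final Set<String> WRAPPERS = Set.of("tar", "gz", "zip", "bz2", "jar", "tgz");

    final List<String> wrappers;   // outermost first, e.g. [tar]
    final String baseIri;          // e.g. https://example.org/file.tar
    final List<String> innerPaths; // e.g. [/somefile.txt]

    LayeredPath(List<String> wrappers, String baseIri, List<String> innerPaths) {
        this.wrappers = wrappers;
        this.baseIri = baseIri;
        this.innerPaths = innerPaths;
    }

    static LayeredPath parse(String iri) {
        // Everything before the first "!/" is the wrapped base IRI;
        // each later segment is a path inside the enclosing archive.
        String[] segments = iri.split("!/");
        String head = segments[0];

        // Peel one wrapper scheme at a time, stopping at the first
        // scheme that is not a known wrapper (e.g. https or hash).
        List<String> wrappers = new ArrayList<>();
        while (true) {
            int colon = head.indexOf(':');
            if (colon < 0 || !WRAPPERS.contains(head.substring(0, colon))) {
                break;
            }
            wrappers.add(head.substring(0, colon));
            head = head.substring(colon + 1);
        }

        List<String> innerPaths = new ArrayList<>();
        for (int i = 1; i < segments.length; i++) {
            innerPaths.add("/" + segments[i]);
        }
        return new LayeredPath(wrappers, head, innerPaths);
    }

    public static void main(String[] args) {
        LayeredPath p = parse("tar:https://example.org/file.tar!/somefile.txt");
        // prints: [tar] https://example.org/file.tar [/somefile.txt]
        System.out.println(p.wrappers + " " + p.baseIri + " " + p.innerPaths);
    }
}
```

Applied to the IRI in the issue title, the same sketch would yield the wrappers `[tar, gz]`, a base IRI starting with `hash://sha256/`, and two inner paths: the tar file inside the gzip layer, then the path to TAXBRIEF.DAT inside that tar.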

For gzipped archives, VFS allows both

tar:gz:https://example.org/file.tar.gz!/file.tar!/somefile.txt

and

tar:gz:https://example.org/file.tar.gz!/somefile.txt

But when used in Preston, for some reason,

tar:gz:hash://sha256/bf18509ad6a2a97143d4f74e72dc4177ec31a4c50b3d7052f9a9cf6735f65e43!/50418.1.1.tar!/0050418/1.1/data/0-data/NODC_TaxonomicCode_V8_CD-ROM/TAXBRIEF.DAT

does not resolve, even though the content id and the file path both exist.

observed using:

Caused by: java.io.IOException: cannot find content identified by [<tar:gz:hash://sha256/bf18509ad6a2a97143d4f74e72dc4177ec31a4c50b3d7052f9a9cf6735f65e43!/50418.1.1.tar!/0050418/1.1/data/0-data/NODC_TaxonomicCode_V8_CD-ROM/TAXBRIEF.DAT>]
    at bio.guoda.preston.stream.ContentStreamFactory.create(ContentStreamFactory.java:74)
    at bio.guoda.preston.store.ContentHashDereferencer.get(ContentHashDereferencer.java:23)
    ... 34 more

jhpoelen commented 8 months ago

Root cause was an invalid use of a split method.
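The thread does not say which split call was at fault, so the following is not a reconstruction of the actual bug. It is only a generic illustration, with a hypothetical class `SplitPitfall`, of how `String.split` can mangle a layered IRI: splitting on every `:` also cuts through the `://` of the wrapped IRI, whereas a split with a limit peels off one wrapper and leaves the rest intact.

```java
import java.util.Arrays;

// Illustration only; not the code that was changed in Preston.
final class SplitPitfall {
    // Naive: splits on every ':' and so also breaks the wrapped "https://..." IRI.
    static String[] naive(String iri) {
        return iri.split(":");
    }

    // Limit of 2: separates only the outermost scheme from the remainder.
    static String[] peelOne(String iri) {
        return iri.split(":", 2);
    }

    public static void main(String[] args) {
        String iri = "tar:gz:https://example.org/file.tar.gz";
        // prints: [tar, gz, https, //example.org/file.tar.gz]
        System.out.println(Arrays.toString(naive(iri)));
        // prints: [tar, gz:https://example.org/file.tar.gz]
        System.out.println(Arrays.toString(peelOne(iri)));
    }
}
```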

jhpoelen commented 8 months ago

Resolved in Preston v0.7.7.