Thanks a lot for the excellent report that made it easy to spot the issue. There are two things that can be done here. The problem is indeed the `.tgz` extension not being used to detect the archive type.

You can declare the archive type in the URL. The adjusted `addurls` call that does this is:
datalad -f json ls-file-collection tarfile project.tgz --hash md5 | jq '. | select(.type == "file")' | jq --slurp . | datalad addurls --key 'et:MD5-s{size}--{hash-md5}' - "dl+archive:${archivekey}#path={item}&size={size}&atype=tar" '{item}'
(look for `atype=`). The docs on this are at https://docs.datalad.org/projects/next/en/latest/generated/generated/datalad_next.types.archivist.html#syntax-of-dl-archives-locators
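To make the role of `atype=` a bit more concrete, here is a minimal Python sketch that splits a locator of the shape used in the command above into its archive key and fragment properties. It is only an illustration and not the parser that `datalad-next` uses; the function name `parse_locator` and the example key are made up.

```python
def parse_locator(url: str) -> dict:
    """Split a dl+archive locator into its key and fragment properties.

    Hypothetical helper for illustration only; it does no validation
    beyond checking the scheme prefix.
    """
    scheme = "dl+archive:"
    if not url.startswith(scheme):
        raise ValueError(f"not a dl+archive locator: {url!r}")
    # Everything before '#' is the archive key, everything after it
    # is a '&'-separated list of name=value properties.
    key, _, fragment = url[len(scheme):].partition("#")
    props = {"key": key}
    for pair in fragment.split("&"):
        name, _, value = pair.partition("=")
        if name:
            props[name] = value
    return props


# The key below is invented for this example.
loc = parse_locator(
    "dl+archive:MD5E-s1024--0123456789abcdef0123456789abcdef.tgz"
    "#path=sub/file.txt&size=42&atype=tar"
)
print(loc.get("atype"))
```

When `atype` is present there is nothing to guess from the key name, which is why the adjusted call works for `.tgz`.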
The following patch would make this unnecessary, and I think it is sensible to recognize `.tgz` as a TAR archive.
diff --git a/datalad_next/types/archivist.py b/datalad_next/types/archivist.py
index 12e9b2b..3c1ab49 100644
--- a/datalad_next/types/archivist.py
+++ b/datalad_next/types/archivist.py
@@ -134,6 +134,8 @@ class ArchivistLocator:
                 atype = ArchiveType.zip
             elif '.tar' in suf:
                 atype = ArchiveType.tar
+            elif '.tgz' in suf:
+                atype = ArchiveType.tar
         return cls(
             akey=akey,
I will propose a PR.
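For anyone who wants to see the effect of the extra branch in isolation, the suffix check can be sketched as a standalone Python function. This is a simplified stand-in and not the actual `ArchivistLocator` code: the `ArchiveType` enum here is a local stub and `guess_archive_type` is a hypothetical name.

```python
from enum import Enum
from pathlib import PurePosixPath
from typing import Optional


class ArchiveType(Enum):
    # Local stub standing in for the enum referenced in the patch
    tar = "tar"
    zip = "zip"


def guess_archive_type(name: str) -> Optional[ArchiveType]:
    """Guess the archive type from the suffixes of a file or key name."""
    suf = PurePosixPath(name).suffixes
    if not suf:
        return None
    if suf[-1] == ".zip":
        return ArchiveType.zip
    # '.tar' covers names like 'x.tar' and 'x.tar.gz';
    # '.tgz' is the additional case from the patch above.
    if ".tar" in suf or ".tgz" in suf:
        return ArchiveType.tar
    return None


for name in ("project.tar.gz", "project.tgz", "project.zip", "project.bin"):
    print(name, guess_archive_type(name))
```

Without the `.tgz` check, `project.tgz` would fall through to `None` in this sketch, which matches the behavior described in the report.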
I'm working on building a dataset from `.tgz` archives using the replacement for `add-archive-content` demonstrated here in combination with the archivist special remote. The demo below works if the archive has a `.tar.gz` extension but not with `.tgz`. With `.tgz`, I need to configure the `archivist.legacy-mode` for a successful `datalad get`. Here's a quick demo:

datalad wtf
``` # WTF ## configuration