Closed hannes-ucsc closed 5 years ago
This may be the underlying cause of #979.
I think it is currently an open question whether the temporary directory name matters or not to Terra. I just now downloaded and listed one of the BDBags from the UCSC Commons data browser, for which automated import into Terra is working, and its internal structure is as follows:
$ unzip -l 9c8a7496-a0c5-49f5-92a5-3fd748afaa80.zip
Archive: 9c8a7496-a0c5-49f5-92a5-3fd748afaa80.zip
Length Date Time Name
--------- ---------- ----- ----
204 05-14-2019 19:05 manifest/bag-info.txt
323 05-14-2019 19:05 manifest/tagmanifest-sha256.txt
579 05-14-2019 19:05 manifest/tagmanifest-sha512.txt
299 05-14-2019 19:05 manifest/manifest-sha512.txt
55 05-14-2019 19:05 manifest/bagit.txt
171 05-14-2019 19:05 manifest/manifest-sha256.txt
68 05-14-2019 19:05 manifest/data/participants.tsv
104236 05-14-2019 19:05 manifest/data/samples.tsv
--------- -------
105935 8 files
Yet, looking at the Broad's code, I am not sure the name of the top-level directory matters: https://github.com/broadinstitute/firecloud-orchestration/blob/6eb6c12c34f6bfc556161eb9d712f645f1d55e22/src/main/scala/org/broadinstitute/dsde/firecloud/EntityClient.scala#L65-L91
it looks to me like it just iterates over all the entries in the Zip file looking for one that's not a directory and the pathname contains /samples.tsv
.
I don't see any specific occurrences of manifests
, nor do I see why having a temp directory name instead would matter.
I am checking with @mjkrause to see if he has any additional information regarding this.
https://tools.ietf.org/html/rfc8493#section-2
The base directory can have any name, as illustrated by the figure below.
That of course doesn't mean that Terra doesn't depend on manifest
being the directory name.
I don't see any specific occurrences of
manifests
, nor do I see why having a temp directory name instead would matter.
Don't you want to search for singular, manifest
?
I still think we should have a deterministic name, even if the spec allows for any name.
Yes, I did search manifest
singular in the code, the plural manifests
was a typo in the comment above.
Reposing a message from @mjkrause:
I'm not aware of any convention as for the BDBag name with the Broad or Terra/FireCloud. It is true that we had named the folder explicitly
manifest
for the commons, and I think at that point I simple found it to be less confusing than the little-intuitive name of a temporary folder. In addition, as far as understand from glancing at the code manifest was inside of a temporary folder, so the commons temporary folder structure is one layer deeper than the HCA bag. But the file(s) in the payload not be affected by that.
I think a deterministic name for the top-level directory makes sense, and I think continuing use of the name manifest
is a good choice.
Demoed 5/28/19
┆Issue is synchronized with this Jira Story ┆Project Name: azul ┆Issue Number: AZUL-618