Closed by theathorn 4 years ago
Here is the script. It needs to be run with Python 3.6 or later. I've tested it and it seems to work fine.
#! /usr/bin/python3
import csv
import os
import sys
from pathlib import Path
from argparse import ArgumentParser


def compose_zarrs(file_path):
    manifest = Path(file_path)
    # Paths in the manifest are resolved relative to the manifest's directory.
    download_dir = manifest.parent.absolute()
    with open(manifest, newline='') as f:
        reader = csv.DictReader(f, dialect='excel-tab')
        rows = list(reader)
    # Characters that may stand in for directory separators in a file name.
    dir_separators = ['/', '!']
    for row in rows:
        file_name = row['file_name']
        source_path = download_dir / row['file_path']
        seps = [sep for sep in dir_separators if sep in file_name]
        if len(seps) == 0:
            # No separator in the name, so there is nothing to nest.
            continue
        elif len(seps) == 1:
            sep = seps[0]
            sub_path = Path(*file_name.split(sep))
            full_path = download_dir / sub_path
            full_path.parent.mkdir(parents=True, exist_ok=True)
            try:
                # Hard-link instead of copying, so no data is duplicated.
                os.link(source_path, full_path)
            except FileExistsError:
                # The link already exists, e.g. from an earlier run.
                pass
        else:
            raise ValueError(f'File {file_name} has multiple separators: {seps}.')


def main(argv):
    parser = ArgumentParser(
        description="Parse the manifest that is rewritten by the CLI download to get "
                    "the downloaded files' paths. Use this to compose zarr stores "
                    "into their expected, nested directory format."
    )
    parser.add_argument('file_path', help='path to manifest file')
    options = parser.parse_args(argv)
    compose_zarrs(options.file_path)


if __name__ == '__main__':
    main(sys.argv[1:])
@achave11 Could you review this? Basically just test the program and see if it works as expected. LMK if you need more context for this.
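For a quick review without a real download, something like the following could build a throwaway fixture. This is only a sketch: the `make_fake_download` helper, the blob names, and the example file names are made up for illustration. Only the `file_name` and `file_path` columns and the tab-separated layout come from the script above.

```python
import csv
import tempfile
from pathlib import Path


def make_fake_download(root):
    """Create a tiny flattened download plus a tab-separated manifest.

    Hypothetical fixture: the file names and contents are invented; only
    the 'file_name' and 'file_path' column names come from the script.
    """
    root = Path(root)
    rows = [
        # A store member whose name uses '!' in place of a directory separator.
        {'file_name': 'cells.zarr!expression!0.0', 'file_path': 'blob_a'},
        # A plain file with no separator, which the script should skip.
        {'file_name': 'readme.txt', 'file_path': 'blob_b'},
    ]
    for row in rows:
        # Write a small placeholder file at the flattened location.
        (root / row['file_path']).write_text(row['file_name'])
    manifest = root / 'manifest.tsv'
    with open(manifest, 'w', newline='') as f:
        writer = csv.DictWriter(
            f, fieldnames=['file_name', 'file_path'], delimiter='\t'
        )
        writer.writeheader()
        writer.writerows(rows)
    return manifest


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        print(make_fake_download(tmp).read_text())
```

Passing the returned manifest path to `compose_zarrs` should then leave a hard link at `cells.zarr/expression/0.0` next to the manifest, while `readme.txt` gets no link.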
Sent to customer 3/4/20.
Customer says OK to close this ticket.
See https://humancellatlas.zendesk.com/agent/tickets/174.
After selecting matrix files, generating a manifest, and using the HCA CLI to download the zarr files for a project, the zarr files are stored in a flattened directory structure instead of a hierarchical one.
A script needs to be provided to correct this.
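To make the flattened-versus-hierarchical distinction concrete, here is a minimal sketch of the name mapping. It assumes, as the script does, that `/` or `!` stands in for the directory separator inside a file name. The `nested_path` helper and the example name are hypothetical.

```python
from pathlib import PurePosixPath

# Separators the script treats as directory boundaries inside a file name.
DIR_SEPARATORS = ('/', '!')


def nested_path(file_name):
    """Return the nested relative path for a flattened name.

    Returns None when the name contains no separator, and raises
    ValueError when it mixes more than one kind of separator.
    """
    seps = [sep for sep in DIR_SEPARATORS if sep in file_name]
    if not seps:
        return None
    if len(seps) > 1:
        raise ValueError(f'File {file_name} has multiple separators: {seps}.')
    return PurePosixPath(*file_name.split(seps[0]))


print(nested_path('cells.zarr!expression!0.0'))  # cells.zarr/expression/0.0
print(nested_path('readme.txt'))                 # None
```

`PurePosixPath` is used here only so the printed result is the same on every platform; it does not touch the file system.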