NDAR / nda-tools

Python package for interacting with NDA web services. Used to validate, submit, and download data to and from NDA.
MIT License

having full local paths in .csv for data files still requires some location to be entered #14

Open yarikoptic opened 5 years ago

yarikoptic commented 5 years ago

which makes little sense. I think the check should be done in code: if the paths are absolute, no question should be asked. Instead, I had to enter the / directory as the location:

/data/R01/bids/Brain/wpc-7642/sub-719531/ses-01/func/sub-719531_ses-01_task-recognitionFood_acq-MB8_run-02_bold.nii.gz
/data/R01/bids/Brain/wpc-7642/sub-719531/ses-01/func/sub-719531_ses-01_task-rest_acq-MB8_run-01_bold.nii.gz
Press the "Enter" key to specify directory/directories OR an s3 location by entering -s3 <bucket name> to locate your associated files:/
Building Package

which later on failed anyways with

  File "/home/XXX/proj/nda-tools/NDATools/Submission.py", line 352, in batch_update_status
    list_data = self.generate_data_for_request(status)
  File "/home/XXX/proj/nda-tools/NDATools/Submission.py", line 132, in generate_data_for_request
    size = self.full_file_path[file][1]
KeyError: 'data/NDA/output_v2/sub-719218_ses-01_task-rest_acq-MB8_run-01_bold.metadata.zip'

obenshaindw commented 5 years ago

@yarikoptic thanks for the issue. We can consider first trying to locate the files using the paths as provided (assuming they are absolute), and only then prompting the user for help in locating the files.

We definitely want to preserve the feature that allows paths to be relative and not start at the root, as NDA will preserve these paths in the creation of S3 Objects and in future downloads of the data. The tool's inability to match the strings in the CSV file to file locations on the user's system (or in S3 Object storage) is one of the more frequent HelpDesk tickets, and we definitely need to look at how to make this easier and/or better documented. @ericearl had similar frustrations in getting the tool to locate manifest files and the associated files therein.

Any suggestions for how to make this better are welcome. I think we will try to address all currently open issues in our next sprint. Depending on what that looks like, we may try to tackle some improvements to bids2nda as well.

yarikoptic commented 5 years ago

Well, in the simplest case it is just a matter of using os.path.isabs on the paths and making sure that leading / is not stripped away for them.
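A minimal sketch of that check, assuming a hypothetical helper named resolve_associated_files (it is not part of nda-tools, and the real lookup code may be organized differently). Absolute paths are tested as-is with their leading / intact, and only relative paths are joined onto user-supplied directories:

```python
import os


def resolve_associated_files(paths, search_dirs=()):
    """Split CSV paths into those found on disk and those still missing.

    Hypothetical helper for illustration only; not the nda-tools API.
    Returns (found, missing), where found maps each CSV path to the
    location on disk that matched it.
    """
    found, missing = {}, []
    for path in paths:
        if os.path.isabs(path):
            # Absolute path: use it unchanged, keeping the leading separator.
            candidates = [path]
        else:
            # Relative path: try each user-supplied directory in turn.
            candidates = [os.path.join(d, path) for d in search_dirs]
        match = next((c for c in candidates if os.path.isfile(c)), None)
        if match is None:
            missing.append(path)
        else:
            found[path] = match
    return found, missing
```

With something like this, the prompt for a directory would only be needed when missing is non-empty after the first pass.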

But a related issue is that, ATM and AFAIK, the entirety of the full path is used to establish the "paths into the creation of S3 Objects", so for a file with the full path /home/login/sensitiveinfo/blah/datasetroot/subdir/file.nii.gz all of the components (home/login/sensitiveinfo/blah/datasetroot/subdir/file.nii.gz) will be reflected in the path on S3, correct? Ideally, home/login/sensitiveinfo/blah/datasetroot/ should not be part of it, since it is not really pertinent to the internal dataset hierarchy. A few ideas on possible ways to handle that:

obenshaindw commented 4 years ago

@yarikoptic can you provide a copy of the file you started with and the arguments you provided? We are looking to make some improvements to how the script handles file paths and are having a little difficulty reproducing the exact issue. Thanks.

yarikoptic commented 4 years ago

> @yarikoptic can you provide a copy of the file you started with and the arguments you provided?

eh heh, quite a bit of time has passed. We are still looking around, but I am afraid it might be gone. So far we have found only the one with relative paths.

> We are looking to make some improvements to how the script handles file paths and are having a little difficulty reproducing the exact issue.

I am surprised now that I did not post the version (commit) of nda-tools we used, but judging from the date, if anything fixed it, the fix should be in the e9ba5b6ed1ca13a7fceee6f7ff2675e81b45f0a8..66c48a048152b7faab4dca1696d93d91f6181e72 range. But I do not spot anything there which could be relevant. I guess for the next upload we will try to document the situation better.