NDAR / nda-tools

Python package for interacting with NDA web services. Used to validate, submit, and download data to and from NDA.
MIT License

Trouble downloading file #55

Closed coreyjr2 closed 11 months ago

coreyjr2 commented 1 year ago

Hello,

I am trying to download files from the NDA using a fresh conda Python environment with only nda-tools and secretstorage installed, along with their dependencies. When I attempt to use downloadcmd with an S3 link, it gives the following error message:

downloadcmd s3://NDAR_Central_1/submission_XXXXX/NDARXXXXXX_baselineYear1Arm1_ABCD-SST-fMRI_XXXXXXXXXX.tgz
Running NDATools Version 0.2.21
Warning: Detected non-empty value for "password" in settings.cfg. Support for this setting has been deprecated and will no longer be used by this tool. Password storage is not recommended for security considerations
-u/--username argument not provided. Using default value of 'XXXXX' which was saved in /home/XXXXX/.NDATools/settings.cfg
Enter your NIMH Data Archives password:

No value specified for --workerThreads. Using the default option of 71
Important - You can configure the thread count setting using the --workerThreads argument to maximize your download speed.

Getting Package Information...

An unexpected error was encountered and the program could not continue. Error message from service was: 
Failed to convert value of type 'java.lang.String' to required type 'java.lang.Long'; nested exception is java.lang.NumberFormatException: For input string: "None"

Exit signal received, shutting down...

Any assistance you may be able to provide in this matter would be greatly appreciated. Thank you!

gregmagdits commented 1 year ago

Hi Corey, the -dp argument is missing from your command and is required for the program to work. The value of the -dp argument should be the package-id of the package that contains the file you want to download. For example:

downloadcmd -dp 1234567 s3://NDAR_Central_1/submission_XXXXX/NDARXXXXXX_baselineYear1Arm1_ABCD-SST-fMRI_XXXXXXXXXX.tgz

You can find the package-id on the package's dashboard on the website.
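
For anyone who hits the same NumberFormatException: the "None" in the error message is consistent with an omitted package-id being turned into the literal string "None" before it reaches the service. The following is only a sketch of that mechanism, not the actual nda-tools code; the parser, the `package_url` helper, and the example.invalid URL are all hypothetical.

```python
import argparse


def build_parser():
    # Hypothetical parser that only mirrors the -dp flag discussed above.
    parser = argparse.ArgumentParser(prog="downloadcmd-sketch")
    parser.add_argument("paths", nargs="*")
    parser.add_argument("-dp", "--package", default=None)
    return parser


def package_url(package_id):
    # Hypothetical URL template, not the real NDA endpoint.
    return "https://example.invalid/package/{}/files".format(package_id)


# Without -dp, the default None is formatted into the URL as the text "None".
args = build_parser().parse_args(["s3://bucket/file.tgz"])
url_without_dp = package_url(args.package)

# With -dp, the package-id is passed through as expected.
args = build_parser().parse_args(["-dp", "1234567", "s3://bucket/file.tgz"])
url_with_dp = package_url(args.package)

# Python analogue of the server-side failure: "None" is not a number.
try:
    int("None")
    parse_error = None
except ValueError as exc:
    parse_error = str(exc)
```

A service that expects a numeric id in that position would reject "None" in much the same way `int("None")` does here.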

coreyjr2 commented 1 year ago

Thank you for the reply! I tried again using the -dp flag, and while the command seems to run without errors, the file does not appear to have been downloaded to the directory that I am specifying.

downloadcmd -dp XXXXXXX -d '/path/to/file/' --file-regex 's3://NDAR_Central_1/submission_13124.0/NDARINVXXXXXXX_subject_image.tgz'
Running NDATools Version 0.2.21
-u/--username argument not provided. Using default value of 'XXXXXX' which was saved in /home/username/.NDATools/settings.cfg

No value specified for --workerThreads. Using the default option of 71
Important - You can configure the thread count setting using the --workerThreads argument to maximize your download speed.

Getting Package Information...

Package-id: XXXXXXX
Name: name_of_package
Has associated files?: No
Number of files in package: 63
Total Package Size: 797.26MB

Downloading files from package XXXXXXX matching regex s3://NDAR_Central_1/submission_13124.0/NDARINV_subject_image.tgz

S3 links for files that failed to download will be written out to /home/username/NDA/nda-tools/downloadcmd/logs/failed_s3_links_file_20221216T132727.txt. You can attempt to download these files later by running: 
    downloadcmd -dp XXXXXX --file-regex s3://NDAR_Central_1/submission_13124.0/NDARINVXXXXXXX_subject_image.tgz -u username -d /path/to/file -wt 71 -t "/home/username/NDA/nda-tools/downloadcmd/logs/failed_s3_links_file_20221216T132727.txt"

Beginning download of files from package matching s3://NDAR_Central_1/submission_13124.0/NDARINVXXXXXXX_subject_image.tgz using 71 threads
No failures detected. Removing file /home/username/NDA/nda-tools/downloadcmd/logs/failed_s3_links_file_20221216T132727.txt

Finished processing all download requests @ 2022-12-16 13:27:35.090241.
     Total download requests 0
     Total errors encountered: 0

 Exiting Program...

If you have any insight into errors I may have made in the command, it would be greatly appreciated. Thank you kindly for your assistance!

gregmagdits commented 1 year ago

The path of the file looks incorrect. Specifically, s3://NDAR_Central_1/submission_13124.0/ should probably be s3://NDAR_Central_1/submission_13124/

If you use the --file-regex argument, you shouldn't need to provide the absolute path to the file. Are you able to get the file you are looking for by running

downloadcmd -dp XXXXXXX -d '/path/to/file/' --file-regex NDARINVXXXXXXX_subject_image

?
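
To illustrate why the shorter pattern is more forgiving, here is a minimal sketch of regex filtering over a file listing. It assumes substring-style matching (`re.search`); the thread does not say how downloadcmd applies --file-regex internally, and the file list and `matching_files` helper below are hypothetical.

```python
import re

# Hypothetical package contents; real listings come from the NDA service.
package_files = [
    "s3://NDAR_Central_1/submission_13124/NDARINVXXXXXXX_subject_image.tgz",
    "s3://NDAR_Central_1/submission_13124/example_structure.txt",
]


def matching_files(pattern, files):
    # Assumes substring-style matching; the real tool may behave differently.
    compiled = re.compile(pattern)
    return [name for name in files if compiled.search(name)]


# The full path with the stray ".0" in the submission id matches nothing.
full_path_with_float_id = (
    "s3://NDAR_Central_1/submission_13124.0/NDARINVXXXXXXX_subject_image.tgz"
)
no_hits = matching_files(full_path_with_float_id, package_files)

# A short fragment of the file name is enough to select the file.
hits = matching_files("NDARINVXXXXXXX_subject_image", package_files)
```

Under these assumptions `no_hits` is empty and `hits` contains only the .tgz entry, which is why a short, distinctive fragment is usually the safer choice for the pattern.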

coreyjr2 commented 1 year ago

Thank you! It did seem to be an issue with the S3 path. However, when I run the command as suggested, it still doesn't seem to download anything:

downloadcmd -dp packageid -d 'path/to/file' --file-regex 'NDARINVXXXXXX_subject_image'
Running NDATools Version 0.2.21
-u/--username argument not provided. Using default value of 'username' which was saved in /home/username/.NDATools/settings.cfg

No value specified for --workerThreads. Using the default option of 71
Important - You can configure the thread count setting using the --workerThreads argument to maximize your download speed.

Getting Package Information...

Package-id: packageid
Name: package
Has associated files?: No
Number of files in package: 63
Total Package Size: 797.26MB

Downloading files from package packageid matching regex NDARINVXXXXXX_subject_image

S3 links for files that failed to download will be written out to /home/username/NDA/nda-tools/downloadcmd/logs/failed_s3_links_file_20221220T122745.txt. You can attempt to download these files later by running: 
    downloadcmd -dp 1207804 --file-regex NDARINVXXXXXX_subject_image -u username-d /path/to/file -wt 71 -t "/home/username/NDA/nda-tools/downloadcmd/logs/failed_s3_links_file_20221220T122745.txt"

Beginning download of files from package matching NDARINVXXXXXX_subject_image using 71 threads
No failures detected. Removing file /home/usernamej/NDA/nda-tools/downloadcmd/logs/failed_s3_links_file_20221220T122745.txt

Finished processing all download requests @ 2022-12-20 12:27:52.671870.
     Total download requests 0
     Total errors encountered: 0

 Exiting Program...

Thank you for your continued support and consideration! Sorry for the continued trouble.

gregmagdits commented 1 year ago

Hi Corey, I don't see any files in that package that match the regex '_subject_image'. It looks like the option to 'include associated files' was not selected at the time of package creation, so the only files in your package are the data-structure files (the .txt files) and some metadata files. If you want to download the imaging data, you will need to re-create the package and make sure you select the option to 'include associated files'. This option shows up on the same popup where you enter the name of the package.

Thanks
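
For readers debugging a similar run, the two telltale lines in the output above are "Has associated files?: No" and "Total download requests 0". A rough sketch of checking captured output for them follows; the `diagnose` helper is hypothetical and not part of nda-tools, and the line formats are taken only from the logs pasted in this thread.

```python
import re


def diagnose(output):
    """Return hints when captured downloadcmd output shows nothing was requested."""
    has_associated = re.search(r"Has associated files\?:\s*(\w+)", output)
    requests = re.search(r"Total download requests\s+(\d+)", output)
    hints = []
    if has_associated and has_associated.group(1).lower() == "no":
        hints.append("package was created without associated files")
    if requests and int(requests.group(1)) == 0:
        hints.append("no files matched, so nothing was downloaded")
    return hints


# Excerpt of the output format shown earlier in this thread.
sample = """Has associated files?: No
Number of files in package: 63
Total download requests 0
Total errors encountered: 0
"""
hints = diagnose(sample)
```

Zero errors alongside zero download requests means the run did nothing, so it should not be read as a successful download.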