milicmil closed this issue 2 years ago
@milicmil Can you share the exact error and the exact command line call?
Hi,
For more context, this is the command I ran (in the folder where download.py was located):
python download.py -iderivatives_func_motion_task-MID2.csv -dmilos_subset.txt -o/external/rprshnas01/external_data/abcd/ABCD_BIDS/functional_task
milos_subset.txt contained only the line "derivatives.func.motion_task-MID", taken from data_subsets.txt in the GitHub repo for the downloader. derivatives_func_motion_task-MID2.csv contains the following columns: submission_id | dataset_id | submission_id.1 | manifest_name | manifest_file_name | associated_file
These columns come from datastructure_manifest.txt, which was downloaded from NDA for study 3165 as described in the readme.md in nda-abcd-s3-downloader.
derivatives_func_motion_task-MID2.csv kept only the rows that had "derivatives.func.motion_task-MID" in the "manifest_file_name" column of datastructure_manifest.txt.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "download.py", line 386, in <module>
    _cli()
  File "download.py", line 119, in _cli
    subject_list = get_subject_list(manifest_df, args.subject_list_file)
  File "download.py", line 153, in get_subject_list
    for manifest_name in manifest_df['manifest_name'].values:
  File "/opt/scc/conda/software/Python/3.8.5-Anaconda3-2021.03/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/scc/conda/software/Python/3.8.5-Anaconda3-2021.03/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'manifest_name'
@milicmil That repo uses a Python argument parser, which expects a space after each argument flag. So the call should look more like this:
python download.py -i derivatives_func_motion_task-MID2.csv -d milos_subset.txt -o /external/rprshnas01/external_data/abcd/ABCD_BIDS/functional_task
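A side note on the spacing suggestion: as far as I can tell, Python's standard argparse module accepts a short option's value either attached to the flag or separated by a space, so the spacing alone may not change what the script receives. The sketch below assumes download.py uses argparse (the thread only says "a python argument parser"). The parser is hypothetical: only the -i/--input-s3 flag and the s3_file name appear in this thread, and the dest names for -d and -o are made up for illustration.

```python
import argparse

# Hypothetical parser mirroring only the three flags used in this thread.
# The dest names for -d and -o are illustrative guesses, not taken from download.py.
parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input-s3", dest="s3_file")
parser.add_argument("-d", dest="data_subsets_file")
parser.add_argument("-o", dest="output_dir")

# Same values, once attached to the flag and once separated by a space.
attached = parser.parse_args(["-imanifest.csv", "-dsubset.txt", "-o/tmp/out"])
spaced = parser.parse_args(["-i", "manifest.csv", "-d", "subset.txt", "-o", "/tmp/out"])

print(attached.s3_file, spaced.s3_file)
print(attached == spaced)
```

If both calls parse to the same namespace, the KeyError has to come from somewhere other than the command-line spacing.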
Thank you so much for the feedback. I just tried it, and the error is below. I can check with folks at my center about what could be wrong and whether I am loading the wrong Python version.
I have 2 follow-up questions just to make sure I am using the correct arguments:
Would you be able to write a super simple example of a command that will download something? Is my logic in terms of composing derivatives_func_motion_task-MID2.csv correct? Do I need to keep all of the columns from datastructure_manifest.txt when I am subsetting the columns for the data I am looking for?
Thank you very much for your time and help on this,
Milos Milic
Log folder: /nethome/kcni/mmilic
S3 Spreadsheet: derivatives_func_motion_task-MID2.csv
Subjects: All subjects
Traceback (most recent call last):
  File "/opt/scc/conda/software/Python/3.8.5-Anaconda3-2021.03/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'manifest_name'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "download.py", line 386, in <module>
    _cli()
  File "download.py", line 119, in _cli
    subject_list = get_subject_list(manifest_df, args.subject_list_file)
  File "download.py", line 153, in get_subject_list
    for manifest_name in manifest_df['manifest_name'].values:
  File "/opt/scc/conda/software/Python/3.8.5-Anaconda3-2021.03/lib/python3.8/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/scc/conda/software/Python/3.8.5-Anaconda3-2021.03/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'manifest_name'
@milicmil If I am understanding correctly, you are setting up to download by making a subset of the datastructure_manifest.txt file. You do not want to do that. You should instead take a look at the https://github.com/ABCD-STUDY/nda-abcd-s3-downloader/blob/master/data_subsets.txt file and adjust it accordingly in your downloaded copy of the repository.
For instance, if you want just the MID task motion files, I would suggest you keep the following line in the data_subsets.txt file and discard the other lines as you see fit:
https://github.com/ABCD-STUDY/nda-abcd-s3-downloader/blob/master/data_subsets.txt#L54
Does that help/make sense?
Hi,
It is currently set up like that: milos_subset.txt contains only the "derivatives.func.motion_task-MID" line. That is literally it.
Hi @milicmil - thanks for reaching out, and also thanks for your patience! Our lab is slammed right now. I've added some folks to this issue and we will hopefully be able to respond to it in the next few weeks after the holiday.
Hey everyone!
We met today to troubleshoot. There were some formatting issues, which we've resolved :) The documentation on the page could better specify how the datastructure_manifest.txt file should be formatted. We should make some small changes to the nda-s3-downloader documentation to clarify :)
Dr. Feczko helped me out to resolve the issue.
In the end, the problem was that derivatives_func_motion_task-MID2.csv was a comma-delimited .csv file, not a tab-delimited .txt file. download.py expects a tab-delimited file for the -i argument, even though the help blurb states it wants a .csv file:
help=("Path to the .csv file downloaded from the NDA containing s3 links "
"for all subjects and their derivatives.")
manifest_df in download.py is read as a tab-delimited file, as shown on line 118:
manifest_df = read_csv(args.s3_file, sep='\t')
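To illustrate why this produces the KeyError, here is a minimal sketch using pandas on in-memory text. The rows are made up (only the column names come from this thread); the point is just that reading comma-delimited text with sep='\t' yields a single column named after the whole header line, so 'manifest_name' is not found.

```python
import io

import pandas as pd

# Made-up rows; only the column names are taken from this thread.
comma_text = (
    "submission_id,dataset_id,manifest_name,associated_file\n"
    "1,10,example.manifest.json,s3://example-bucket/file_a\n"
    "2,20,example.manifest.json,s3://example-bucket/file_b\n"
)
tab_text = comma_text.replace(",", "\t")

# Same call shape as line 118 of download.py, applied to comma-delimited text:
# no tabs are found, so each line becomes one cell in one column.
comma_df = pd.read_csv(io.StringIO(comma_text), sep="\t")
print(list(comma_df.columns))
print("manifest_name" in comma_df.columns)

# Applied to tab-delimited text, the columns are split as expected.
tab_df = pd.read_csv(io.StringIO(tab_text), sep="\t")
print(list(tab_df.columns))
print("manifest_name" in tab_df.columns)
```

With the comma-delimited input, manifest_df['manifest_name'] would raise the same KeyError seen in the traceback, even though the text "manifest_name" is visibly present in the header.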
Hi,
I am trying to download a subsection of the ABCD BIDS data from the collection for studies by our research centre in Toronto. We have ABCD access, but I am confused by the structure of the data frame expected by the -i argument of download.py. The website does not explain how to use it.
I know I need to get the links from datastructure_manifest.txt, but when I subset the text file to the data I need, save it as a csv with headers, and pass it to download.py as the -i argument, I keep getting a KeyError for "manifest_name" even though that column is in the data frame.
" -i S3_FILE, --input-s3 S3_FILE Path to the .csv file downloaded from the NDA containing s3 links for all subjects and their derivatives."
https://github.com/ABCD-STUDY/nda-abcd-s3-downloader
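For reference, the subsetting step described in this post can be sketched so that the result stays tab-delimited and keeps every column. This only illustrates the file format: the maintainers in this thread recommend editing data_subsets.txt rather than subsetting the manifest. The rows below are made up, the in-memory text stands in for datastructure_manifest.txt, and filtering on manifest_file_name follows the description in this thread rather than anything confirmed about the real file.

```python
import io

import pandas as pd

# Made-up stand-in for datastructure_manifest.txt (tab-delimited).
manifest_text = (
    "submission_id\tdataset_id\tmanifest_name\tmanifest_file_name\tassociated_file\n"
    "1\t10\tsub-A.example\tderivatives.func.motion_task-MID\ts3://example-bucket/a\n"
    "2\t20\tsub-A.example\tderivatives.func.motion_task-rest\ts3://example-bucket/b\n"
)
manifest_df = pd.read_csv(io.StringIO(manifest_text), sep="\t")

# Keep only the rows of interest, but keep all columns.
wanted = "derivatives.func.motion_task-MID"
mask = manifest_df["manifest_file_name"].str.contains(wanted, regex=False)
subset_df = manifest_df[mask]

# Write with sep="\t" so a reader using sep="\t" sees the same columns.
# A real run would pass a file path instead of an in-memory buffer.
buffer = io.StringIO()
subset_df.to_csv(buffer, sep="\t", index=False)

round_trip = pd.read_csv(io.StringIO(buffer.getvalue()), sep="\t")
print(list(round_trip.columns))
print(len(round_trip))
```

The file extension itself does not matter to pandas; what matters is that the delimiter used when writing matches the one used when reading.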