NDAR / nda-tools

Python package for interacting with NDA web services. Used to validate, submit, and download data to and from NDA.
MIT License

pandas.errors.ParserError: Error tokenizing data. #70

Closed GerardYu closed 11 months ago

GerardYu commented 1 year ago

Hi there, I got the following error when trying to download the preprocessed FreeSurfer output files from the HCP Aging package:

(NDAenv) [junhong.yu@hpc-gekko1 HCP]$ downloadcmd -dp 1211941 -t fsdownloadlist.txt -d freesurfer
Running NDATools Version 0.2.25
Warning: Detected non-empty value for "password" in settings.cfg. Support for this setting has been deprecated and will no longer be used by this tool. Password storage is not recommended for security considerations
-u/--username argument not provided. Using default value of 'junhongyu' which was saved in /home/junhong.yu/.NDATools/settings.cfg

No value specified for --workerThreads. Using the default option of 31
Important - You can configure the thread count setting using the --workerThreads argument to maximize your download speed.

Getting Package Information...

Package-id: 1211941
Name: freesurfer
Has associated files?: Yes
Number of files in package: 266082
Total Package Size: 586.04GB

Downloading S3 links from text file: fsdownloadlist.txt
Traceback (most recent call last):
  File "/home/junhong.yu/NDAenv/bin/downloadcmd", line 8, in <module>
    sys.exit(main())
  File "/home/junhong.yu/NDAenv/lib/python3.7/site-packages/NDATools/clientscripts/downloadcmd.py", line 200, in main
    s3Download.start()
  File "/home/junhong.yu/NDAenv/lib/python3.7/site-packages/NDATools/Download.py", line 209, in start
    df = self.use_s3_links_file()
  File "/home/junhong.yu/NDAenv/lib/python3.7/site-packages/NDATools/Download.py", line 1019, in use_s3_links_file
    return self.query_files_by_s3_path(path_list)
  File "/home/junhong.yu/NDAenv/lib/python3.7/site-packages/NDATools/Download.py", line 1044, in query_files_by_s3_path
    df = pd.read_csv(self.metadata_file_path, header=0)
  File "/home/junhong.yu/NDAenv/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/junhong.yu/NDAenv/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/junhong.yu/NDAenv/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 488, in _read
    return parser.read(nrows)
  File "/home/junhong.yu/NDAenv/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1047, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/junhong.yu/NDAenv/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 224, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 801, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 857, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1925, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 41153

gregmagdits commented 1 year ago

It looks like the contents of fsdownloadlist.txt are malformed, specifically at row 41153. Did you open the file at that location, and if so, did you see anything abnormal?

GerardYu commented 1 year ago

There are fewer than 1000 rows in fsdownloadlist.txt. I also tried downloading another list of S3 links that had downloaded successfully with an older version of NDATools, and I got the same error.

gregmagdits commented 1 year ago

Please email NDAHelp@mail.nih.gov with fsdownloadlist.txt attached; this will open a ticket in our help desk system. I suspect that the file is not formatted correctly. It should contain only a list of S3 URLs, for example:

s3://gpop/ndar_data/QueryPackages/PRODDB/123456789101:ndar_username/Package_1215475/fmriresults01.txt
s3://gpop/ndar_data/experiments/experiment_1530/block_1/Block_Design_File/5_minute_rest.txt
s3://gpop/ndar_data/QueryPackages/PRODDB/123456789101:ndar_username/Package_1215475/package_info.txt
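
A quick way to check a link file against that expected format before emailing it is sketched below. This is not part of NDATools: the function name find_bad_links is made up for illustration, and the rules it applies (one URL per line, each starting with s3://, blank lines ignored) are assumptions based on the example above rather than the tool's actual parsing logic.

```python
from pathlib import Path


def find_bad_links(path):
    """Return (line_number, text) pairs for lines that do not look like one S3 URL.

    Hypothetical helper, not an NDATools function. Assumes one URL per line.
    """
    bad = []
    for lineno, raw in enumerate(Path(path).read_text().splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are harmless, so skip them
        # Flag lines with the wrong scheme or with embedded whitespace
        # (for example, several URLs run together on one line).
        if not line.startswith("s3://") or len(line.split()) != 1:
            bad.append((lineno, line))
    return bad
```

Calling find_bad_links("fsdownloadlist.txt") should return an empty list if every line looks like a single S3 URL; any entries returned point at lines worth inspecting by hand.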

gregmagdits commented 11 months ago

The user deleted the package_file_metadata file and reran the download, which resolved the error.
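
For anyone hitting the same error: the traceback shows the failure inside pd.read_csv(self.metadata_file_path, header=0), so the file being parsed is the cached metadata file rather than the link list, which is consistent with a reported row number far beyond the length of fsdownloadlist.txt. "EOF inside string" generally means a quoted field was opened and never closed, as can happen when a file is cut off mid-write. The sketch below, which uses only the standard library, looks for such an unclosed quote. The function name find_unclosed_quote is hypothetical, the location of the metadata file is not given in this thread and must be supplied by the caller, and the line number it reports counts physical lines, so it may not match the row number pandas prints.

```python
def find_unclosed_quote(path, quotechar='"'):
    """Return the 1-based line number where a still-open quoted field begins.

    Returns None if every quoted field in the file is closed.
    Hypothetical diagnostic helper, not part of NDATools or pandas.
    """
    open_line = None  # line on which the currently open quoted field started
    with open(path, newline="", encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            for _ in range(line.count(quotechar)):
                # Each quote character toggles the in-quotes state. An escaped
                # quote ("") toggles twice, so it cancels itself out.
                open_line = lineno if open_line is None else None
    return open_line
```

If this returns a line number for the metadata file, the file is likely truncated or corrupted, and deleting it so that it is regenerated on the next run, as was done here, is a reasonable first step.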