NDAR / nda-tools

Python package for interacting with NDA web services. Used to validate, submit, and download data to and from NDA.
MIT License

vtcmd infinite hang #68

Closed markjbaker closed 1 year ago

markjbaker commented 1 year ago

My lab is considering switching to the command-line tool for large submissions, so I'm validating a few data types as part of a pilot. I started with a .cram file we use for testing, which I've verified with pysam is readable and not corrupted. The command and traceback are below:

% vtcmd testcram.cram
Running NDATools Version 0.2.25

Validating files...
  0%|                                                                                                                                                                         | 0/1 [00:00<?, ?it/s]Exception in thread Thread-2:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/NDATools/Validation.py", line 484, in run
    data = file.read()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 27: invalid start byte
  0%|                                                                                                                                                                         | 0/1 [00:04<?, ?it/s]

After the initial UnicodeDecodeError, vtcmd printed a second progress bar at 0% and has been hanging for about 10 minutes. ps shows the process in state S+, i.e. interruptible sleep in the foreground. I think the program should exit on failure rather than sleep and hang in the terminal. Are CRAMs not supported by the validation tool?
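For readers hitting the same symptom, here is a minimal sketch of one plausible mechanism. It is not taken from the NDATools source. The traceback shows a worker thread dying on a text-mode read of binary data. If the main thread then waits on work that the dead thread never marks as finished, the process sleeps indefinitely. The names `validate_worker` and `run` are hypothetical. The sketch shows a worker that records the decode error and always signals completion, so the caller can exit with an error instead of hanging.

```python
import queue
import threading


def validate_worker(jobs, errors):
    """Consume file paths until a None sentinel arrives (hypothetical worker)."""
    while True:
        path = jobs.get()
        try:
            if path is None:
                return
            # Text-mode read: raises UnicodeDecodeError on binary data such as CRAM.
            with open(path, encoding="utf-8") as fh:
                fh.read()
        except UnicodeDecodeError as exc:
            # Record the failure instead of letting the thread die silently.
            errors.append((path, str(exc)))
        finally:
            # Without this call, jobs.join() in run() would block forever.
            jobs.task_done()


def run(paths):
    """Validate paths on a worker thread and return a list of (path, error)."""
    jobs = queue.Queue()
    errors = []
    worker = threading.Thread(target=validate_worker, args=(jobs, errors), daemon=True)
    worker.start()
    for path in paths:
        jobs.put(path)
    jobs.put(None)  # sentinel: tells the worker to stop
    jobs.join()     # returns only when every item has been marked done
    return errors
```

A caller could check the returned list and call `sys.exit(1)` when it is non-empty, which is the exit-on-failure behavior requested above.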

gregmagdits commented 1 year ago

Since CRAM files contain sequencing data, I suspect what you want to do is create a genomics_sample03 csv and populate the data_file1 column with the paths to the CRAM files. When you run nda-tools, you can submit both the csv and the CRAM files:

vtcmd -b /path/to/genomics_sample03.csv -l /path/to/folder/with/cram_data

It looks like there are a couple of open Help Desk tickets related to this submission. The data curator will be reaching out shortly on those tickets with more details on what needs to be done to complete the submission.
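Before running the `vtcmd -b ... -l ...` command suggested above, it can help to confirm that every path in the data_file1 column resolves to a real file. The sketch below is a hypothetical pre-flight helper, not part of nda-tools. It assumes the csv has a header row of column names, and that relative paths are meant to be resolved against the folder passed to `-l`. If your template carries extra lines before the column headers, `skip_rows` lets you skip them (also an assumption about the file layout).

```python
import csv
from pathlib import Path


def missing_data_files(csv_path, data_dir, column="data_file1", skip_rows=0):
    """Return the values in `column` that do not resolve to an existing file."""
    data_dir = Path(data_dir)
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as fh:
        # Skip any leading lines that precede the column-header row.
        for _ in range(skip_rows):
            next(fh)
        for row in csv.DictReader(fh):
            value = (row.get(column) or "").strip()
            if not value:
                continue  # empty cell: nothing to check
            candidate = Path(value)
            if not candidate.is_absolute():
                # Assumption: relative paths are relative to the data folder.
                candidate = data_dir / candidate
            if not candidate.is_file():
                missing.append(value)
    return missing
```

An empty list means every referenced file was found. Any entries returned are the data_file1 values to fix before validating.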