FRED-2 / OptiType

Precision HLA typing from next-generation sequencing data
BSD 3-Clause "New" or "Revised" License

optitype halts #47

Closed JenniferShelton closed 7 years ago

JenniferShelton commented 7 years ago

Hi, I was wondering if you have a fix for the following issue. I get this error:

0:00:14.95 Mapping 4.R1.fished.fastq to NUC reference...

0:00:22.52 Mapping 4.R2.fished.fastq to NUC reference...

0:00:31.25 Generating binary hit matrix.
0:00:31.26 Loading OptiType_RNA/2017_03_16_11_54_33/2017_03_16_11_54_33_1.bam started. Number of HLA reads loaded (updated every thousand):

0:00:31.26 0 reads loaded. Creating dataframe...
Traceback (most recent call last):
  File "optitype-1.0/OptiType/OptiTypePipeline.py", line 267, in <module>
    pos, read_details = ht.pysam_to_hdf(bam_paths[0])
  File "optitype-1.0/OptiType/hlatyper.py", line 230, in pysam_to_hdf
    pos_df = pd.DataFrame.from_items(hits.iteritems()).T
  File "python-2.7.10/lib/python2.7/site-packages/pandas/core/frame.py", line 1046, in from_items
    keys, values = lzip(*items)
ValueError: need more than 0 values to unpack

I get it for two of five datasets when running the following command:

python2.7 OptiTypePipeline.py \
--config optitype_config.txt \
-i 4.R1.fished.fastq \
4.R2.fished.fastq \
--rna \
-v \
-o ~/OptiType_RNA

messersc commented 7 years ago

Hi Jennifer,

Unfortunately, the problem seems to be that there are no usable reads.

If you look at the OptiType output in, e.g., issue #45, you see:

0:03:48.31 Loading ../OptiType_output/2017_03_07_16_43_12/2017_03_07_16_43_12_1.bam started. Number of HLA reads loaded (updated every thousand):
1K...2K...3K...4K...5K...6K...7K...8K...9K...10K...11K...12K...13K...14K...
 0:04:18.66 14383 reads loaded. Creating dataframe...

For your input, we see:

0:00:31.26 Loading OptiType_RNA/2017_03_16_11_54_33/2017_03_16_11_54_33_1.bam started. Number of HLA reads loaded (updated every thousand):

0:00:31.26 0 reads loaded. Creating dataframe...

OptiType should probably check for this case instead of crashing.
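As a rough illustration of what such a check could look like, here is a minimal sketch. It is not OptiType's actual code: `NoHlaReadsError` and `require_hits` are hypothetical names, and `hits` only stands in for whatever mapping `pysam_to_hdf` collects before it builds the dataframe.

```python
class NoHlaReadsError(RuntimeError):
    """Raised when the mapping step yields no reads to type (hypothetical name)."""


def require_hits(hits, bam_path):
    """Return `hits` unchanged, or fail with a readable message if it is empty.

    `hits` stands in for the per-read mapping collected from the BAM file;
    its exact structure is an assumption made for this sketch.
    """
    if not hits:
        raise NoHlaReadsError(
            "No HLA reads were loaded from %s. Check that the input FASTQs "
            "contain reads covering the HLA loci." % bam_path
        )
    return hits


if __name__ == "__main__":
    # An empty mapping triggers the readable error instead of a ValueError
    # from deep inside pandas.
    try:
        require_hits({}, "example_1.bam")
    except NoHlaReadsError as err:
        print(err)
```

Calling something like this right before the dataframe is created would turn the traceback above into a single explanatory message.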

I have seen this behavior before with DNA reads from whole-exome sequencing, where some older capture kits did not include the HLA loci, leaving no reads for OptiType to work with. RNA should work as long as the HLA genes are expressed, and most of them should be. Maybe something is wrong with the FASTQs? I don't have a good idea of what the problem might be here.
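To rule out a basic problem with the FASTQs, a quick sanity check along these lines might help. This is only a sketch: it assumes plain four-line FASTQ records (no wrapped sequences), and `summarize_fastq`, `open_fastq` and `check_pair` are made-up helper names, not part of OptiType.

```python
import gzip
import io


def summarize_fastq(handle):
    """Count records in an open FASTQ text handle and flag malformed ones.

    Assumes four lines per record; blank lines between records are skipped.
    Returns a tuple (records, malformed).
    """
    records = 0
    malformed = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        records += 1
        ok = (
            header.startswith("@")
            and plus.startswith("+")
            and len(seq) > 0
            and len(seq) == len(qual)
        )
        if not ok:
            malformed += 1
    return records, malformed


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def check_pair(r1_path, r2_path):
    """Return the (records, malformed) summaries for both mates of a pair."""
    with open_fastq(r1_path) as r1, open_fastq(r2_path) as r2:
        return summarize_fastq(r1), summarize_fastq(r2)


if __name__ == "__main__":
    # Tiny in-memory example instead of real files.
    demo = io.StringIO("@read1\nACGT\n+\nIIII\n")
    print(summarize_fastq(demo))
```

If the two mates report different record counts, or any malformed records, the input files are worth a closer look before blaming the mapping step. Running `check_pair("4.R1.fished.fastq", "4.R2.fished.fastq")` would be the corresponding call for the files in the command above.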