FRED-2 / OptiType

Precision HLA typing from next-generation sequencing data
BSD 3-Clause "New" or "Revised" License

Error at determining minimal set of non-overshadowed alleles #52

Closed JaneMerlevede closed 7 years ago

JaneMerlevede commented 7 years ago

Hello,

I am using OptiType with Python 2.7.10. After installing some modules, I was able to start the analysis with `python /OptiType/OptiTypePipeline.py -d -v -i $curDir/NeoEpitopePrediction/${Tumor}_1.fastq $curDir/NeoEpitopePrediction/${Tumor}_2.fastq -o $curDir/NeoEpitopePrediction/HLA/`, but it stops at the "determining minimal set of non-overshadowed alleles" step:

0:00:00.38 Mapping Sample_214310406_T-AL-O_1.fastq to GEN reference...
0:00:19.11 Mapping Sample_214310406_T-AL-O_2.fastq to GEN reference...
0:00:38.76 Generating binary hit matrix.
0:00:38.76 Loading alleles and read IDs from /data/Analysis/NeoEpitopePrediction/HLA/2017_04_25_11_59_59/2017_04_25_11_59_59_0.sam...
0:00:40.06 11179 alleles and 2016 reads found.
0:00:40.06 Initializing mapping matrix...
0:00:40.07 2016x11179 mapping matrix initialized. Populating 1077618 hits from SAM file...
10% completed 20% completed 30% completed 40% completed 50% completed 60% completed 70% completed 80% completed 90% completed 100% completed
0:03:35.02 1077618 elements filled. Matrix sparsity: 1 in 20.91
0:03:44.25 Loading alleles and read IDs from /data/Analysis/NeoEpitopePrediction/HLA/2017_04_25_11_59_59/2017_04_25_11_59_59_1.sam...
0:03:45.11 11179 alleles and 2177 reads found.
0:03:45.11 Initializing mapping matrix...
0:03:45.12 2177x11179 mapping matrix initialized. Populating 992781 hits from SAM file...
10% completed 20% completed 30% completed 40% completed 50% completed 60% completed 70% completed 80% completed 90% completed 100% completed
0:06:28.36 992781 elements filled. Matrix sparsity: 1 in 24.51
0:06:40.68 temporary pruning of identical rows and columns
0:06:40.71 Size of mtx with unique rows and columns: (312, 446)
0:06:40.71 determining minimal set of non-overshadowed alleles

Could this problem be related to the solver? I tried both solvers, cbc and glpk, after adding them to my $PATH.

Thank you in advance for your help

andras86 commented 7 years ago

Hi Jane,

That step doesn't involve ILP solving, so it must be something else. I noticed that, according to your output log, OptiType was using an old fallback method, which it only does if the pysam Python module is not available. Can you try installing pysam and re-running it? That would tell us whether the issue is associated with the old method or with your data.

JaneMerlevede commented 7 years ago

Hello Andras,

Thank you for your answer. I tried to re-install pysam, but it seems to be installed already. Running `pip install --user pysam` gives: Requirement already satisfied: pysam in /cm/shared/bioinfo/python/2.7.10/lib/python2.7/site-packages

andras86 commented 7 years ago

That looks good. Does it perhaps fail during the import? Try it with a file containing a single line "import pysam" or call the interpreter and just type it in. You're using the current version of OptiType, right?
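A slightly more verbose variant of this import check, as a minimal sketch: plain `python -c "import pysam"` prints nothing when the import succeeds, so a silent exit is easy to misread. The helper name `check_module` is hypothetical and not part of OptiType or pysam; it only reports whether a module can be imported by the interpreter that runs it.

```python
import importlib


def check_module(name):
    """Try to import `name`; return (ok, detail).

    `check_module` is a hypothetical helper for illustration only.
    """
    try:
        module = importlib.import_module(name)
    except Exception as exc:
        # Covers ImportError as well as errors raised while loading
        # a broken compiled extension.
        return False, "{}: {}".format(type(exc).__name__, exc)
    # Not every module defines __version__, so fall back to a placeholder.
    return True, getattr(module, "__version__", "unknown version")


if __name__ == "__main__":
    ok, detail = check_module("pysam")
    status = "succeeded" if ok else "failed"
    print("pysam import {}: {}".format(status, detail))
```

It should be run with the same `python` executable that launches OptiTypePipeline.py, since a module installed for one interpreter may not be visible to another.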

JaneMerlevede commented 7 years ago

Yes, I am using the current version (1.0). With the output SAM file or a test file:

 python -c "import pysam" /data/Analysis/NeoEpitopePrediction/HLA/2017_04_25_09_52_01/2017_04_25_09_52_01_0.sam
python -c "import pysam" /data/Analysis/NeoEpitopePrediction/HLA/2017_04_25_09_52_01/test

nothing happens...

Could it be related to HDF5? I checked on our cluster: I cannot find HDF5, and I have not set the variables in my bashrc. I doubt it, though, since my files are not compressed.

JaneMerlevede commented 7 years ago

When running the command line outside a job, I get a more detailed error message:

...
0:09:01.07 Size of mtx with unique rows and columns: (312, 446)
0:09:01.07 determining minimal set of non-overshadowed alleles
Traceback (most recent call last):
  File "/home/Project/Immuno/OptiType/OptiTypePipeline.py", line 284, in <module>
    minimal_alleles = ht.prune_overshadowed_alleles(temp_pruned)
  File "/home/Project/Immuno/OptiType/hlatyper.py", line 399, in prune_overshadowed_alleles
    non_overshadowed = covariance.columns.diff(overshadowed)
AttributeError: 'Index' object has no attribute 'diff'

Don't know if it helps

JaneMerlevede commented 7 years ago

I now have HDF5 1.10.0-patch1, but I still get the same error. Any hint would help me a lot.

andras86 commented 7 years ago

Hi Jane,

While it shouldn't happen in any OptiType version, you're not running the current one. Dependencies have changed since then, and those changes could have broken something. Can you try using the latest OptiType checkout from GitHub?
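The traceback earlier in this thread is consistent with one such dependency change: recent pandas releases have no `Index.diff` method, and set difference on an index is written `Index.difference` instead. Below is a minimal sketch of a call that tolerates both spellings. It is an illustration only, not necessarily how OptiType handles it. `index_difference`, `OldStyleIndex` and `NewStyleIndex` are hypothetical names; the two classes are stand-ins that mimic only the method names so the example runs without pandas.

```python
def index_difference(index, other):
    """Return the labels of `index` that are not in `other`.

    Prefers the `difference` method and falls back to the older `diff`
    spelling when `difference` is missing.
    """
    if hasattr(index, "difference"):
        return index.difference(other)
    return index.diff(other)


class NewStyleIndex(object):
    """Hypothetical stand-in exposing only `difference`."""

    def __init__(self, labels):
        self.labels = list(labels)

    def difference(self, other):
        drop = set(other)
        return [label for label in self.labels if label not in drop]


class OldStyleIndex(object):
    """Hypothetical stand-in exposing only the old `diff` name."""

    def __init__(self, labels):
        self.labels = list(labels)

    def diff(self, other):
        drop = set(other)
        return [label for label in self.labels if label not in drop]


columns = ["a", "b", "c"]
overshadowed = ["b"]
remaining_new = index_difference(NewStyleIndex(columns), overshadowed)
remaining_old = index_difference(OldStyleIndex(columns), overshadowed)
```

With a real pandas `Index`, the same helper would take the `difference` branch. Updating to the latest OptiType checkout, as suggested above, remains the simpler route.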

JaneMerlevede commented 7 years ago

Hello Andras, Sorry for the delayed answer. On our cluster, we have Version: 1.0 as indicated in README.md. We will reinstall it and I will keep you informed. Thank you