FRED-2 / OptiType

Precision HLA typing from next-generation sequencing data
BSD 3-Clause "New" or "Revised" License

AssertionError: Index length did not match values #4

Closed by jgrundstad 9 years ago

jgrundstad commented 9 years ago

Hello!

I'm running into an error at the "Result dataframe has been constructed..." stage.

Here is the command I used: python ~/src/OptiType/OptiTypePipeline.py -v -i nebula_finished.fastq --dna -o . &> run_log.txt

My config.ini:

[MAPPING]
#please specify the razerS3 binary path
RAZERS3=/home/ubuntu/src/razers3-3.4.0-Linux-x86_64/bin/razers3
THREADS=8

[LIBRARIES]
RNA_REF=./data/hla_reference_rna.fasta
DNA_REF=./data/hla_reference_dna.fasta
ALLELES=./data/alleles.h5

[OPTIMIZATION]
#the solver has to be supported by Coopr
SOLVER=cbc
THREADS=1
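
A quick sanity check for a config like this is to confirm that every file it points to actually exists. The following is a minimal sketch in Python 3, not part of OptiType: the helper name `missing_config_paths` is hypothetical, and resolving the relative `./data/...` entries against `base_dir` is an assumption about how the paths should be interpreted.

```python
import configparser
import os


def missing_config_paths(config_text, base_dir="."):
    """Return (section, key, path) for each configured file that does not exist."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Section and key names are taken from the config.ini shown above.
    wanted = [
        ("MAPPING", "RAZERS3"),
        ("LIBRARIES", "RNA_REF"),
        ("LIBRARIES", "DNA_REF"),
        ("LIBRARIES", "ALLELES"),
    ]
    missing = []
    for section, key in wanted:
        # Assumption: relative paths are resolved against base_dir;
        # absolute paths are left unchanged by os.path.join.
        path = os.path.join(base_dir, parser.get(section, key))
        if not os.path.exists(path):
            missing.append((section, key, path))
    return missing
```

Calling it with the contents of config.ini and the directory the relative paths are meant to be resolved from returns an empty list when all four files are present.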

And the run log:

0:00:02.66 Mapping nebula_finished.fastq to GEN reference...

0:05:23.24 Generating binary hit matrix.
0:05:23.24 Loading alleles and read IDs from ./2015_02_23_21_31_27/2015_02_23_21_31_27_0.sam...
0:05:27.01 11179 alleles and 13842 reads found.
0:05:27.01 Initializing mapping matrix...
0:05:27.02 13842x11179 mapping matrix initialized. Populating 4135583 hits from SAM file...
    10% completed
    20% completed
    30% completed
    40% completed
    50% completed
    60% completed
    70% completed
    80% completed
    90% completed
    100% completed
0:49:01.96 4135583 elements filled. Matrix sparsity: 1 in 37.42

0:50:19.10 temporary pruning of identical rows and columns

0:50:19.82 Size of mtx with unique rows and columns: (2163, 1384)
0:50:19.82 determining minimal set of non-overshadowed alleles

0:50:24.99 Keeping only the minimal number of required alleles (184,)

0:50:24.99 Creating compact model...

0:50:25.35 Initializing OptiType model...
WARNING: No construction rule or expression specified for constraint 'c'
Welcome to the CBC MILP Solver 
Version: 2.8.7 
Build Date: Dec 28 2013 

command line - /usr/bin/cbc -printingOptions all -import /tmp/tmp8qUjdt.pyomo.lp -import -stat=1 -solve -solu /tmp/tmp8qUjdt.pyomo.soln (default strategy 1)
Option for printingOptions changed from normal to all
Coin0009I  CoinLpIO::readLp(): Maximization problem reformulated as minimization
Current default (if $ as parameter) for import is /tmp/tmp8qUjdt.pyomo.lp
Presolve 2401 (-1) rows, 1379 (-1) columns and 19535 (-1) elements
Statistics for presolved model

Problem has 2401 rows, 1379 columns (1323 with objective) and 19535 elements
Column breakdown:
597 of type 0.0->inf, 1 of type 0.0->up, 0 of type lo->inf, 
0 of type lo->up, 0 of type free, 0 of type fixed, 
0 of type -inf->0.0, 0 of type -inf->up, 781 of type 0.0->1.0 
Row breakdown:
0 of type E 0.0, 0 of type E 1.0, 0 of type E -1.0, 
0 of type E other, 0 of type G 0.0, 6 of type G 1.0, 
0 of type G other, 1791 of type L 0.0, 0 of type L 1.0, 
604 of type L other, 0 of type Range 0.0->1.0, 0 of type Range other, 
0 of type Free 
Continuous objective value is -6923.74 - 0.04 seconds
Cgl0004I processed model has 2395 rows, 1379 columns (781 integer) and 19351 elements
Cbc0038I Solution found of -6923.74
Cbc0038I Before mini branch and bound, 781 integers at bound fixed and 26 continuous
Cbc0038I Mini branch and bound did not improve solution (0.08 seconds)
Cbc0038I After 0.08 seconds - Feasibility pump exiting with objective of -6923.74 - took 0.01 seconds
Cbc0012I Integer solution of -6923.74 found by feasibility pump after 0 iterations and 0 nodes (0.08 seconds)
Cbc0001I Search completed - best objective -6923.739999999943, took 0 iterations and 0 nodes (0.09 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost
Cuts at root node changed objective from -6923.74 to -6923.74
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

Result - Optimal solution found

Objective value:                -6923.74000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.11
Time (Wallclock seconds):       0.11

Total time (CPU seconds):       0.12   (Wallclock seconds):       0.12

0:50:26.55 Result dataframe has been constructed...
Traceback (most recent call last):
  File "/home/ubuntu/src/OptiType/OptiTypePipeline.py", line 325, in <module>
    coverage_mat = ht.calculate_coverage(plot_variables, features, hlatype, features_used)
  File "/home/ubuntu/src/OptiType/hlatyper.py", line 505, in calculate_coverage
    hit_counts[reads]):
  File "/home/ubuntu/env/optitype/local/lib/python2.7/site-packages/pandas/core/series.py", line 641, in __getitem__
    return self._get_with(key)
  File "/home/ubuntu/env/optitype/local/lib/python2.7/site-packages/pandas/core/series.py", line 688, in _get_with
    return self.reindex(key)
  File "/home/ubuntu/env/optitype/local/lib/python2.7/site-packages/pandas/core/series.py", line 2646, in reindex
    return self._reindex_with_indexers(new_index, indexer, copy=copy, fill_value=fill_value)
  File "/home/ubuntu/env/optitype/local/lib/python2.7/site-packages/pandas/core/series.py", line 2650, in _reindex_with_indexers
    return Series(new_values, index=index, name=self.name)
  File "/home/ubuntu/env/optitype/local/lib/python2.7/site-packages/pandas/core/series.py", line 492, in __new__
    subarr.index = index
  File "properties.pyx", line 74, in pandas.lib.SeriesIndex.__set__ (pandas/lib.c:29541)
AssertionError: Index length did not match values

andras86 commented 9 years ago

Hi Jason,

Did the typing itself run through? Given the point of failure you should already have a TSV file in the output directory with the predicted HLA genotype. Would it be possible for you to paste its contents here?
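
A minimal sketch for locating and printing that TSV, in Python 3 with only the standard library. It assumes the result is a tab-separated file with a header row somewhere below the output directory; the helper name `read_result_tsvs` is hypothetical and the exact file name OptiType writes is not assumed here, so every `.tsv` file found is returned.

```python
import csv
import glob
import os


def read_result_tsvs(output_dir):
    """Return {path: list of row dicts} for every .tsv file below output_dir."""
    results = {}
    # "**" with recursive=True also matches files directly inside output_dir.
    pattern = os.path.join(output_dir, "**", "*.tsv")
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, newline="") as handle:
            # Assumption: the first line is a tab-separated header row.
            results[path] = list(csv.DictReader(handle, delimiter="\t"))
    return results


if __name__ == "__main__":
    for path, rows in read_result_tsvs(".").items():
        print(path)
        for row in rows:
            print(row)
```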

jgrundstad commented 9 years ago

Hello,

I do have the TSV:

    A1       A2       B1       B2       C1       C2       Reads  Objective
0   A*02:01  A*25:01  B*18:01  B*13:02  C*12:03  C*06:02  7250   6923.739999999979

andras86 commented 9 years ago

Well, that looks like what it should. This is such a strange failure that I'm afraid we can only look into it if you send the SAM file over to us for debugging. I'll get in touch with the details.

jgrundstad commented 9 years ago

OK, thank you. I appreciate the help! Jason