FRED-2 / OptiType

Precision HLA typing from next-generation sequencing data
BSD 3-Clause "New" or "Revised" License

unpaired reads, no mismatches, ambiguous #142

Closed QianRongAn closed 2 months ago

QianRongAn commented 10 months ago

OptiType runs to completion, but the result does not look right. This is result.tsv; I don't know whether it is correct.

    A1  A2  B1  B2  C1  C2  Reads   Objective
0   A*01:01 A*01:01                 0   0.0

And this is plot.pdf:

This is scRNA-seq data. I run razers3 first, once per read file:

    razers3 -i 95 -m 1 -dr 0 -o fished_1.bam /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/data/hla_reference_rna.fasta /data1/s/liver-cancer-GSA-HCC/HRR572980_S1_L001_R1_001.fastq.gz

    razers3 -i 95 -m 1 -dr 0 -o fished_2.bam /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/data/hla_reference_rna.fasta /data1/s/liver-cancer-GSA-HCC/HRR572980_S1_L001_R2_001.fastq.gz

Then I convert the output with samtools:

    samtools bam2fq fished_1.bam > sample_1_fished.fastq

Finally, I run OptiType:

    python OptiTypePipeline.py -i /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/sample_1_fished.fastq /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/sample_2_fished.fastq --rna -v -o /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/output/

mapping with 16 threads...

0:00:01.17 Mapping sample_1_fished.fastq to NUC reference...

0:00:26.79 Mapping sample_2_fished.fastq to NUC reference...

0:01:19.13 Generating binary hit matrix.

[E::idx_find_and_load] Could not retrieve index file for '/mnt/bwa-0.7.17/optitype/OptiType-1.3.5/output/2023_10_20_09_13_49/2023_10_20_09_13_49_1.bam'

0:01:19.19 Loading /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/output/2023_10_20_09_13_49/2023_10_20_09_13_49_1.bam started. Number of HLA reads loaded (updated every thousand): 1K...2K...3K...4K...

0:01:33.37 4445 reads loaded. Creating dataframe...

0:05:49.75 Dataframes created. Shape: 4445 x 7339, hits: 6690942 (6710769), sparsity: 1 in 4.86

[E::idx_find_and_load] Could not retrieve index file for '/mnt/bwa-0.7.17/optitype/OptiType-1.3.5/output/2023_10_20_09_13_49/2023_10_20_09_13_49_2.bam'

0:05:49.85 Loading /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/output/2023_10_20_09_13_49/2023_10_20_09_13_49_2.bam started. Number of HLA reads loaded (updated every thousand): 1K...2K...3K...4K...5K...6K...7K...8K...9K...10K...11K...12K...13K...14K...15K...16K...17K...18K...19K...20K...21K...22K...23K...24K...25K...26K...27K...28K...29K...30K...31K...32K...33K...34K...35K...36K...37K...38K...

0:06:01.87 38034 reads loaded. Creating dataframe...

0:40:44.93 Dataframes created. Shape: 38034 x 7339, hits: 7407290 (7407290), sparsity: 1 in 37.68

0:40:48.28 Alignment pairing completed. 0 paired, 42479 unpaired, 0 discordant

WARNING: Less than 10% of reads could be paired. Consider an appropriate unpaired_weight setting in your config file (currently 0.000), because you may need to resort to using unpaired reads.

0:40:49.04 temporary pruning of identical rows and columns

0:40:49.05 Size of mtx with unique rows and columns: (0, 1)

0:40:49.05 determining minimal set of non-overshadowed alleles

0:40:49.06 Keeping only the minimal number of required alleles (1,)

0:40:49.06 Creating compact model...

starting ilp solver with 1 threads...

0:40:49.07 Initializing OptiType model...

WARNING: Initializing ordered Set R with a fundamentally unordered data source (type: set). This WILL potentially lead to nondeterministic behavior in Pyomo

WARNING: DEPRECATED: The Model.preprocess() method is deprecated and no longer performs any actions (deprecated in 6.0) (called from /mnt/bwa-0.7.17/optitype/OptiType-1.3.5/model.py:147)

GLPSOL--GLPK LP/MIP Solver 5.0

Parameter(s) specified in the command line: --write /tmp/tmptj7ujh5j.glpk.raw --wglp /tmp/tmprelcfi26.glpk.glp --cpxlp /tmp/tmpunkeoa1h.pyomo.lp

Reading problem data from '/tmp/tmpunkeoa1h.pyomo.lp'...

/tmp/tmpunkeoa1h.pyomo.lp:27: warning: lower bound of variable 'x4' redefined

/tmp/tmpunkeoa1h.pyomo.lp:27: warning: upper bound of variable 'x4' redefined

3 rows, 3 columns, 4 non-zeros

One variable is binary

28 lines were read

Writing problem data to '/tmp/tmprelcfi26.glpk.glp'...

18 lines were written

GLPK Integer Optimizer 5.0

3 rows, 3 columns, 4 non-zeros

1 integer variable, which is binary

Preprocessing...

Objective value = 0.000000000e+00

INTEGER OPTIMAL SOLUTION FOUND BY MIP PREPROCESSOR

Time used: 0.0 secs

Memory used: 0.0 Mb (39693 bytes)

Writing MIP solution to '/tmp/tmptj7ujh5j.glpk.raw'...

15 lines were written

0:40:54.96 Result dataframe has been constructed...

b-schubert commented 2 months ago

There is something wrong with your read pairing. OptiType cannot match the paired reads (the log reports 0 paired, 42479 unpaired) and therefore constructs an all-zero hit matrix. Either find the error in how you constructed the paired FASTQ files, or resort to single-end typing.
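One way to check the pairing before rerunning is to compare the read names in the two fished FASTQ files. The sketch below is not part of OptiType. It assumes plain 4-line FASTQ records, and it assumes mates share the same name once an optional trailing `/1` or `/2` and anything after the first whitespace are removed. The helper names `read_names` and `pairing_report` are made up for this illustration.

```python
import gzip
import re
from pathlib import Path

# Assumption: mates differ at most by a trailing "/1" or "/2" on the read name.
_MATE_SUFFIX = re.compile(r"/[12]$")


def read_names(fastq_path):
    """Return the set of normalized read names in a FASTQ file (plain or .gz).

    Assumes strict 4-line records (no wrapped sequence or quality lines).
    """
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    names = set()
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only the first line of each record is a header
            fields = line[1:].split()  # drop the leading "@", split off comments
            if not fields:
                continue  # tolerate a blank trailing line
            names.add(_MATE_SUFFIX.sub("", fields[0]))
    return names


def pairing_report(fastq_1, fastq_2):
    """Count read names shared by both files and names unique to each."""
    names_1 = read_names(fastq_1)
    names_2 = read_names(fastq_2)
    return {
        "shared": len(names_1 & names_2),
        "only_in_1": len(names_1 - names_2),
        "only_in_2": len(names_2 - names_1),
    }
```

Calling `pairing_report("sample_1_fished.fastq", "sample_2_fished.fastq")` returns the three counts. If `shared` is 0 or very small, the two files either contain different subsets of reads or use read names that do not match under the assumption above, which would be consistent with the "0 paired" line in the log.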