RabadanLab / arcasHLA

Fast and accurate in silico inference of HLA genotypes from RNA-seq
GNU General Public License v3.0

KeyError "B" #95

Open qz3hub opened 2 years ago

qz3hub commented 2 years ago

While running customize --genes A,B,C --transcriptome chr6 --genotype my.sample.genotype.json, this error came up:

Traceback (most recent call last):
  File "/home/arcasHLA-master/scripts/quant.py", line 237, in
    allele_results[allele_id[:-1]]['allele' + allele_id[-1]] = allele
KeyError: 'B'

Could you please help diagnose?

adeschen commented 1 year ago

Hi,

I have the same problem for some samples. It seems that quant.py loads the information from the .p file created in the previous step and uses the genes present in that file to pre-fill the allele_results dictionary. However, it appears that sometimes not all genes are present in the .p file. When the genotype contains an allele for a gene that is absent from that file (here, B), the lookup in allele_results raises the KeyError.
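To make the failure mode concrete, here is a minimal sketch with made-up data. The names genes_in_p_file and fill, and the allele strings, are hypothetical stand-ins rather than the actual contents of quant.py or the .p file; only the indexing expression is copied from the traceback.

```python
from collections import defaultdict

# Hypothetical: genes recovered from the .p file; B is missing.
genes_in_p_file = ["A", "C"]

# Hypothetical genotype, keyed as gene name + allele index (e.g. "B1").
genotype = {
    "A1": "A*01:01",
    "A2": "A*02:01",
    "B1": "B*07:02",
    "B2": "B*08:01",
}

# Pre-fill one entry per gene that is known from the .p file.
allele_results = {gene: defaultdict(float) for gene in genes_in_p_file}


def fill(allele_results, genotype):
    for allele_id, allele in genotype.items():
        # Same indexing as the failing line in the traceback.
        allele_results[allele_id[:-1]]["allele" + allele_id[-1]] = allele


try:
    fill(allele_results, genotype)
    error = None
except KeyError as exc:
    # The outer dict is a plain dict, so the unknown gene raises KeyError.
    error = exc.args[0]

print(error)
```

The inner defaultdict does not help here, because the failing lookup is on the outer dictionary, which has no entry for the gene.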

An easy fix that stops the code from crashing is to change this section in quant.py (line 249) from:

    for allele_id, allele in genotype.items():
        allele_results[allele_id[:-1]]['allele' + allele_id[-1]] = allele
        total_hla_count += counts[allele_id]

to:

    for allele_id, allele in genotype.items():
        # Create an entry for genes that were absent from the .p file
        if allele_id[:-1] not in allele_results:
            allele_results[allele_id[:-1]] = defaultdict(float)
        allele_results[allele_id[:-1]]['allele' + allele_id[-1]] = allele
        total_hla_count += counts[allele_id]

With this change the script no longer crashes. However, the missing gene won't have any value printed in the output files. At this point, I don't know whether it should.
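The guarded loop can be tried in isolation with the sketch below. It is only an illustration under stated assumptions: collect_alleles is a hypothetical wrapper, the sample data is invented, and counts is assumed to be a defaultdict(float). If counts were a plain dict in the real script, the counts[allele_id] lookup could itself raise a KeyError for the missing gene.

```python
from collections import defaultdict


def collect_alleles(genotype, counts, allele_results):
    """Hypothetical wrapper around the patched loop; returns the total count."""
    total_hla_count = 0.0
    for allele_id, allele in genotype.items():
        gene, index = allele_id[:-1], allele_id[-1]
        # Guard: create an entry for genes that were not pre-filled.
        if gene not in allele_results:
            allele_results[gene] = defaultdict(float)
        allele_results[gene]["allele" + index] = allele
        total_hla_count += counts[allele_id]
    return total_hla_count


# Invented sample data: gene B has a genotype call but no pre-filled entry.
genotype = {"A1": "A*01:01", "B1": "B*07:02"}
counts = defaultdict(float, {"A1": 120.0})  # assumption: missing keys read as 0.0
allele_results = {"A": defaultdict(float)}

total = collect_alleles(genotype, counts, allele_results)
print(total, dict(allele_results["B"]))
```

In this sketch the missing gene ends up with its allele name recorded but a count of zero, so whether that is a meaningful result for the gene is still an open question.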

Hoping to hear back from the Rabadan lab.

Best, Astrid