DaehwanKimLab / hisat-genotype

GNU General Public License v3.0

parse-results: misaligned columns in output csv #54

Open nikyk opened 3 years ago

nikyk commented 3 years ago

The "parse-results" utility doesn't work as expected in some cases. When I open the CSV file in Excel/LibreOffice Calc, I see that many records occupy the wrong columns. Expected behavior: each column corresponds to its own gene/pseudogene (HLA-A, DPB1, etc.). E.g., I see a DQA1 record in the column named DQB1, and so on. The lines for some samples have more records than others. For instance, the DPB2 gene is present in the output for some samples but absent for the rest. Starting from the column for this gene, the column order is broken.

Proposed solution:

1. Scan all source .report files a first time and create the complete list of genes present in the .report files.
2. Use the list of genes to name the columns.
3. Scan the report files a second time and fill the table. If some gene is absent, leave the column blank.
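The two-pass idea could look roughly like the sketch below. It is not the actual parse-results code: it assumes each .report file has already been parsed into a dict mapping gene name to the reported call, and the names `collect_genes`, `write_table`, and the sample data are all hypothetical.

```python
import csv
import io


def collect_genes(reports):
    """First pass: every gene seen in any sample, in first-seen order."""
    genes = []
    seen = set()
    for calls in reports.values():
        for gene in calls:
            if gene not in seen:
                seen.add(gene)
                genes.append(gene)
    return genes


def write_table(reports, out):
    """Second pass: one row per sample, one fixed column per gene."""
    genes = collect_genes(reports)
    # restval="" leaves the cell blank when a sample has no call for a gene,
    # so later columns never shift to the left.
    writer = csv.DictWriter(
        out, fieldnames=["sample"] + genes, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for sample, calls in reports.items():
        writer.writerow({"sample": sample, **calls})


# Made-up input standing in for already-parsed .report files (assumption).
reports = {
    "sample1": {"A": "A*01:01", "DQA1": "DQA1*01:02", "DPB2": "DPB2*01:01"},
    "sample2": {"A": "A*02:01", "DQA1": "DQA1*05:01"},
}

buf = io.StringIO()
write_table(reports, buf)
print(buf.getvalue())
```

Here sample2 has no DPB2 call, so its row ends with an empty DPB2 cell instead of pulling the next record into that column.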

tsoi2018_hla_L1.xlsx