Open guillaumecharbonnier opened 2 weeks ago
I inspected https://github.com/Oshlack/Toblerone/blob/master/src/utils.rs and found that my FASTA headers were not in the expected format. After I appended " gene=IKZF1" to each header, the error disappeared.
However, I get no assigned counts with my own tidx, whereas I do get some counts when I use the IKZF1.tidx file from the GitHub repo.
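For anyone hitting the same header error, here is a minimal sketch of the header fix described above. It only reproduces what worked for me (appending " gene=IKZF1" to each header); the function name `add_gene_tag` and the example sequences are made up for illustration, and I have not confirmed the full header format that Toblerone expects.

```python
def add_gene_tag(fasta_text: str, gene: str) -> str:
    """Append ' gene=<gene>' to every FASTA header that has no gene= field yet."""
    out = []
    for line in fasta_text.splitlines():
        # Header lines start with '>'; sequence lines are left untouched.
        if line.startswith(">") and " gene=" not in line:
            line = f"{line.rstrip()} gene={gene}"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical two-record FASTA; the sequences are placeholders, not real IKZF1 data.
example = (
    ">ENST00000331340.8_del2\n"
    "ACGT\n"
    ">ENST00000331340.8_del4 gene=IKZF1\n"
    "TTGA\n"
)
tagged = add_gene_tag(example, "IKZF1")
print(tagged, end="")
```

Headers that already carry a `gene=` field are skipped, so running it twice on the same file does not add the tag twice.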
==> test_sample_VS_my_IKZF1_tidx.csv <==
Gene Deletion Count Total GeneLength ReadLength ScaleFactor Proportion ScaledProportion
IKZF1 ENST00000331340.8_del2 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del2_3 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del2_3_4 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del2_3_4_5 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del2_3_4_5_6 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del2_3_4_5_6_7 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del3 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del3_4 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del3_4_5 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del3_4_5_6 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del3_4_5_6_7 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del4 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del4_5 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del4_5_6 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del4_5_6_7 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del5 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del5_6 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del5_6_7 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del6 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del6_7 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del7 0 1483 6255 0 0 0 NaN
==> test_sample_VS_Toblerone_IKZF1_tidx.csv <==
Gene Deletion Count Total GeneLength ReadLength ScaleFactor Proportion ScaledProportion
2 ENST00000331340.8_del2 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del2_3 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del2_3_4 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del2_3_4_5 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del2_3_4_5_6 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del2_3_4_5_6_7 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del3 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del3_4 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del3_4_5 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del3_4_5_6 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del3_4_5_6_7 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del4 8 1483 6255 0 0 0.005394471 inf
2 ENST00000331340.8_del4_5 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del4_5_6 1 1483 6255 0 0 0.000674309 inf
2 ENST00000331340.8_del4_5_6_7 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del5 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del5_6 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del5_6_7 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del6 7 1483 6255 0 0 0.004720162 inf
2 ENST00000331340.8_del6_7 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del7 0 1483 6255 0 0 0 NaN
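To compare the two outputs above without reading every row, a small sketch like the following lists only the deletions with a non-zero count. It assumes the nine whitespace-separated (or comma-separated) columns shown in the headers above; the function name `nonzero_deletions` is hypothetical. In the rows shown, the Proportion column matches Count / Total, which is an observation from these numbers rather than something I checked in the Toblerone source.

```python
def nonzero_deletions(report: str):
    """Return (deletion, count, total) for rows whose Count column is > 0."""
    hits = []
    for line in report.splitlines():
        fields = line.replace(",", " ").split()
        # Data rows have 9 fields and an integer Count; this skips the header,
        # the '==> file <==' banners and blank lines.
        if len(fields) != 9 or not fields[2].isdigit():
            continue
        count, total = int(fields[2]), int(fields[3])
        if count > 0:
            hits.append((fields[1], count, total))
    return hits


# A few rows copied from the output obtained with the repository's IKZF1.tidx.
sample = """\
Gene Deletion Count Total GeneLength ReadLength ScaleFactor Proportion ScaledProportion
2 ENST00000331340.8_del4 8 1483 6255 0 0 0.005394471 inf
2 ENST00000331340.8_del4_5 0 1483 6255 0 0 0 NaN
2 ENST00000331340.8_del6 7 1483 6255 0 0 0.004720162 inf
"""
for deletion, count, total in nonzero_deletions(sample):
    # Count / Total can be compared with the Proportion column of the report.
    print(deletion, count, count / total)
```

Applied to the first table (my own tidx), the same function returns an empty list, since every Count there is 0.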
I am unable to build an index for any FASTA I try. Below is what I get when I try IKZF1 as a test:
I tried both v0.0.8 and v0.0.9 and hit the same issue. Attached is the FASTA I am trying to index.
toblerone_transcriptome_IKZF1.zip
Can you share a working fasta so I can understand what is wrong with mine?