Oshlack / Toblerone

Unable to index fasta file #3

Open guillaumecharbonnier opened 2 weeks ago

guillaumecharbonnier commented 2 weeks ago

I am unable to build an index for any FASTA file I try. Here is what I get when I use IKZF1 as a test:

out/wget/https/github.com/Oshlack/Toblerone/releases/download/v0.0.8/tinyt_amd64 index -i out/toblerone/build_index_fa-genome-hg38_IKZF1/toblerone_transcriptome.tidx out/toblerone/build_index_fa-genome-hg38_IKZF1/toblerone_transcriptome.fa
 2024-07-01T17:59:29.505 INFO  tinyt > Building index from fasta: out/toblerone/build_index_fa-genome-hg38_IKZF1/toblerone_transcriptome.fa
 2024-07-01T17:59:29.511 INFO  tinyt::utils > Reading transcripts from Fasta file
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/utils.rs:115:34
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I tried both v0.0.8 and v0.0.9 and hit the same issue. The FASTA file I am trying to index is attached.

toblerone_transcriptome_IKZF1.zip

Can you share a working FASTA file so I can understand what is wrong with mine?

guillaumecharbonnier commented 2 weeks ago

I inspected https://github.com/Oshlack/Toblerone/blob/master/src/utils.rs and found that my FASTA headers were not in the expected format. After I appended " gene=IKZF1" to each header, the error disappeared.
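For anyone hitting the same panic, here is a minimal sketch of the header fix described above. It only reproduces what worked for me (appending " gene=IKZF1" to each header line); the full header format Toblerone expects is not confirmed here, and the function name and the two example records are hypothetical.

```python
import io


def add_gene_tag(lines, gene):
    """Yield FASTA lines, appending ' gene=<gene>' to headers lacking a gene= field."""
    for line in lines:
        line = line.rstrip("\n")
        # Header lines start with '>'; sequence lines pass through unchanged.
        if line.startswith(">") and "gene=" not in line:
            line = f"{line} gene={gene}"
        yield line


# Hypothetical two-record input; real record names and sequences will differ.
example = io.StringIO(
    ">ENST00000331340.8_del2\nACGT\n"
    ">ENST00000331340.8_del4 gene=IKZF1\nTTGA\n"
)
tagged = list(add_gene_tag(example, "IKZF1"))
print("\n".join(tagged))
```

To apply it to a real file, pass an open file handle instead of the `io.StringIO` object and write the yielded lines to a new FASTA file before running `index` on it.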

However, I get no assigned counts with my tidx, whereas I get some counts when I use the IKZF1.tidx file from the GitHub repo.

==> test_sample_VS_my_IKZF1_tidx.csv <==
Gene    Deletion    Count   Total   GeneLength  ReadLength  ScaleFactor Proportion  ScaledProportion
IKZF1   ENST00000331340.8_del2  0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del2_3    0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del2_3_4  0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del2_3_4_5    0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del2_3_4_5_6  0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del2_3_4_5_6_7    0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del3  0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del3_4    0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del3_4_5  0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del3_4_5_6    0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del3_4_5_6_7  0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del4  0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del4_5    0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del4_5_6  0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del4_5_6_7    0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del5  0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del5_6    0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del5_6_7  0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del6  0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del6_7    0   1483    6255    0   0   0   NaN
IKZF1   ENST00000331340.8_del7  0   1483    6255    0   0   0   NaN

==> test_sample_VS_Toblerone_IKZF1_tidx.csv <==
Gene,Deletion,Count,Total,GeneLength,ReadLength,ScaleFactor,Proportion,ScaledProportion
Gene    Deletion    Count   Total   GeneLength  ReadLength  ScaleFactor Proportion  ScaledProportion
2   ENST00000331340.8_del2  0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del2_3    0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del2_3_4  0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del2_3_4_5    0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del2_3_4_5_6  0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del2_3_4_5_6_7    0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del3  0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del3_4    0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del3_4_5  0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del3_4_5_6    0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del3_4_5_6_7  0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del4  8   1483    6255    0   0   0.005394471 inf
2   ENST00000331340.8_del4_5    0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del4_5_6  1   1483    6255    0   0   0.000674309 inf
2   ENST00000331340.8_del4_5_6_7    0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del5  0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del5_6    0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del5_6_7  0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del6  7   1483    6255    0   0   0.004720162 inf
2   ENST00000331340.8_del6_7    0   1483    6255    0   0   0   NaN
2   ENST00000331340.8_del7  0   1483    6255    0   0   0   NaN
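
To make the difference between the two outputs easier to spot, here is a rough sketch that lists the deletions whose Count differs. It assumes whitespace-separated rows with the nine columns shown above; the function names are hypothetical, and the embedded rows are a small excerpt copied from the two tables.

```python
def parse_counts(text):
    """Map Deletion -> Count from whitespace-separated result rows."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, header rows, and anything that is not a 9-column row.
        if len(fields) != 9 or fields[1] == "Deletion":
            continue
        counts[fields[1]] = int(fields[2])
    return counts


def differing(mine, reference):
    """Return {deletion: (my_count, reference_count)} where the counts disagree."""
    return {d: (mine.get(d), c) for d, c in reference.items() if mine.get(d) != c}


# Excerpt of the rows above (my index, then the index from the repo).
my_rows = """\
IKZF1 ENST00000331340.8_del4 0 1483 6255 0 0 0 NaN
IKZF1 ENST00000331340.8_del5 0 1483 6255 0 0 0 NaN
"""
repo_rows = """\
Gene,Deletion,Count,Total,GeneLength,ReadLength,ScaleFactor,Proportion,ScaledProportion
2 ENST00000331340.8_del4 8 1483 6255 0 0 0.005394471 inf
2 ENST00000331340.8_del5 0 1483 6255 0 0 0 NaN
"""
diff = differing(parse_counts(my_rows), parse_counts(repo_rows))
print(diff)
```

On the full tables this would flag del4, del4_5_6 and del6, the three deletions that only get counts with the repo index.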