Closed y9c closed 2 years ago
Hello Chang Y,
Sorry about the bug. I just updated the hisat-3n-code. Could you pull and make it again? I hope the updated hisat-3n-table works for you.
Best,
Leo
Thank you @imzhangyun.
Hi @imzhangyun,
This bug still exists for some chromosomes, such as:
>snoRNA-URS000003BA79_10090 Mus musculus (house mouse) Z51 small nucleolar RNA
TGTACATGATGAAAACAGTCTCCCTCTTCTGAATCTCGCTGAGGAAACTGCATGTCACCCTCCTGAAAAC
>snoRNA-URS0000042F48_10090 Mus musculus (house mouse) partial derived from hnRNA or mRNA fragment, or novel small non-messenger RNA without known sequence-or structural motifs
AGCTACTCCCCACCACCAGCACCCAAAGCTGGTATTCTAATTAAACTACTTCTTGAGTACATAAATTTACATAGTACAACAGTACATTTATGTAACA
>snoRNA-URS00000672D4_10090 Mus musculus (house mouse) partial C/D box snoRNA; small non-messenger RNA (snmRNA)
AAAAAAAGGAAGTGCCGNCCGATGCGACAACTGACGACATCCCTAGTTAGCTGACT
Hello @y9c ,
I am sorry about this. Is it the same error as the original one, where the last chromosome cannot be found? Could you show me the exact error message generated by hisat-3n-table? Also, could you tell me the length of the last chromosome and how many reads mapped to it?
Hi @imzhangyun ,
The message is the same as the previous one: "Cannot find the chromosome: snoRNA-URS0000042F48_10090 in reference file."
This time, the sequence (snoRNA-URS0000042F48_10090) is not the last record of the file. It sits between two other records. The exact sequence is shown in my previous post.
Did you sort the input SAM/BAM file?
@y9c
I changed some code in hisat-3n-table. Now it should be good. Please check the code on the hisat-3n_TableChromNameFixing branch. I will merge it tomorrow.
Many thanks for the quick response. I will test the bug fix branch and let you know.
Chang
On Thu, Feb 17, 2022, 16:59 Yun (Leo) Zhang wrote:
I changed some codes in hisat-3n-table. Now it should be good. Please check the code on hisat-3n_TableChromNameFixing branch. I will merge it tomorrow.
Hi Leo,
I figured out why the chromosome name is not parsed correctly: not all record names are separated from the description by a space; some reference FASTA files use a tab. So I think it would be better to change this line to:
size_t endPosition = inputLine.find_first_of(" \t");
Is it correct?
Chang
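To illustrate the suggestion above, here is a minimal sketch of header parsing that treats both a space and a tab as the end of the record name. The function `parseChromName` is a hypothetical helper written for this example, not code from hisat-3n; only the `find_first_of(" \t")` call comes from the proposal.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of hisat-3n): extract the record name from a
// FASTA header line such as ">chr1 description" or ">chr1\tdescription".
std::string parseChromName(const std::string& headerLine) {
    // Skip the leading '>' if present.
    const std::size_t start = (!headerLine.empty() && headerLine[0] == '>') ? 1 : 0;
    // The name ends at the first space or tab; find_first_of returns
    // std::string::npos when neither character occurs.
    const std::size_t end = headerLine.find_first_of(" \t", start);
    // With npos as the count, substr returns everything up to the end of the line.
    return headerLine.substr(
        start, end == std::string::npos ? std::string::npos : end - start);
}
```

With this sketch, a header whose name and description are tab-separated yields the same name as a space-separated one, and a header with no description returns the whole name.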
Hello @y9c,
Sorry again for the bug. I believe I have solved the problem. Please pull the code from the hisat-3n_TableChromNameFixing branch.
Best,
Leo
The code in the hisat2-3n branch cannot parse the reference FASTA file correctly: the last record of the FASTA file is not read. For example, if chr3 is the last record of the reference FASTA file and chr3 is reported in the SAM file, hisat2-3n stops with the error message:
Cannot find the chromosome: chr3 in reference file.
https://github.com/DaehwanKimLab/hisat2/blob/f8b0dc34e304b1154622a9d9170cbcc8b6ea7db1/position_3n_table.h#L336