Closed juancresc closed 5 years ago
Thanks Juan for pointing that out. Until your message I wasn't actually aware that there was a .csi type of bam index; I guess I've never encountered an assembly with very large contigs (according to this post on biostars the max contig size that .bai indices can handle is 2^29-1 bases).
I can't say for sure or not whether the changes you've made would have unintended consequences. I agree this option should be fixed and I'll try to get to that next time I do an update.
Can you update on this thread if you do test out the changes you made, and how they worked?
Thanks, Mike
Thank you for your answer. Seems like it did run as expected (at least no errors).
The only thing I'm noticing (and might be unrelated) is when I specify a locifile, the valids miRNAs found are quite different as those without. I would expect those miRNA found with locifile are contained in the results without locifile. What am I missing here?
Hi Juan, thanks for following up. Good to know when I implement a fix. It might not be until this summer because of my crazy schedule right now.
about your locifile miRNA question: The exact coordinates used by the locifile were different than those found in de-novo run, I'm guessing. ShortStack's MIRNA calling is quite dependent upon the exact coordinates it used for folding possible RNAs. This can cause false negatives when the de novo cluster-calling produces sub-optimal coordinates. I suspect this is the answer to your question.
On Mon, Feb 4, 2019 at 2:50 PM Juan Manuel Crescente < notifications@github.com> wrote:
Thank you for your answer. Seems like it did run as expected (at least no errors).
The only thing I'm noticing (and might be unrelated) is when I specify a locifile, the valids miRNAs found are quite different as those without. I would expect those miRNA found with locifile are contained in the results without locifile. What am I missing here?
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FMikeAxtell%2FShortStack%2Fissues%2F88%23issuecomment-460387552&data=02%7C01%7Cmja18%40psu.edu%7C5ccd754621a3491465ec08d68ada1113%7C7cf48d453ddb4389a9c1c115526eb52e%7C0%7C0%7C636849066523264076&sdata=%2FzGLR8nD6TmBeqR3v7YhRg251rRpmvrEKl6C2Vjwo1A%3D&reserved=0, or mute the thread https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAGiXiVLEGr95O5n8uXEGB-g1ra9c8lB3ks5vKI8XgaJpZM4aeWUt&data=02%7C01%7Cmja18%40psu.edu%7C5ccd754621a3491465ec08d68ada1113%7C7cf48d453ddb4389a9c1c115526eb52e%7C0%7C0%7C636849066523274085&sdata=9zijnUfP%2FauGky1m78U%2BF%2B4Sk%2F71KUZzvpPm2xiJOjM%3D&reserved=0 .
-- Michael J. Axtell, Ph.D. Professor of Biology The Pennsylvania State University http://sites.psu.edu/axtell http://plantsmallrnagenes.science.psu.edu
I'll keep testing, thanks!
I got this error while trying with a large genome (triticum aestivum).
What I did as a workaround was to add -c to samtools index command (line 2703)
system "samtools index -c $$options{'bamfile'}";
and change the expected index name (line 2697) by changing .bai to .csi at the end
$expected_idx = "$$options{'bamfile'}" . ".csi";
I'm not sure whether this will have other consequences, would be nice to have an option to use bai or csi.