MaximilianStammnitz closed this issue 4 years ago.
Hello and thanks Max,
svimmer parses VCFs using a library called pysam, which has CSI support as of version 0.14: https://github.com/pysam-developers/pysam/commit/a8304363b61723b8067df5e2d460c0db96dbb326
So I think I only need to add a few lines of code to svimmer to detect the presence of a CSI index. I will make a pull request for it soon.
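A minimal sketch of the kind of check described above, assuming the index sits next to the bgzipped VCF with the usual `.tbi` or `.csi` suffix. The helper name `find_vcf_index` is hypothetical, and this is not necessarily how svimmer implements it.

```python
import os


def find_vcf_index(vcf_path):
    """Return the index file next to a bgzipped VCF (hypothetical helper).

    Prefers a tabix (.tbi) index and falls back to a CSI (.csi) index,
    which is what `tabix -C -p vcf` writes.
    """
    for suffix in (".tbi", ".csi"):
        candidate = vcf_path + suffix
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(
        f"No .tbi or .csi index found for {vcf_path}; "
        "create one with 'tabix -p vcf' or 'tabix -C -p vcf'."
    )
```

The returned path could then be handed to pysam when opening the file, e.g. `pysam.VariantFile(vcf_path, index_filename=index_path)`, assuming the installed pysam version accepts the `index_filename` keyword.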
Best, Hannes
Many thanks for your quick reply, @hannespetur; looking forward to testing this.
Later, I'd also be keen to genotype these SVs on >>512 Mb chromosomes via Graphtyper. Does Graphtyper also strictly rely on .tbi indices? I guess the input fix would be quite similar for both, but I'm happy to open a separate issue down the road.
Best wishes, Max
You are welcome. CSI indices work for me on the feature_csi branch in a very small test. It would be great if you could check out that branch and test it on your file.
Unfortunately, there is no CSI index support in graphtyper. I will look into adding it, but I think it will probably be a bit tougher since the library I am using for VCF reading doesn't support it. There is also a limitation in graphtyper that the total genome size cannot exceed 4 billion bp (genome positions need to fit in 32-bit integers), which may also be a problem in your case. It is good to know there is interest in these features.
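The 4-billion-bp ceiling is simple arithmetic: an unsigned 32-bit integer holds at most 2^32 - 1 = 4,294,967,295. The sketch below assumes, purely for illustration, that each position is stored as a genome-wide coordinate (contig offset plus local position); graphtyper's actual encoding may differ, and the function names and contig lengths are hypothetical.

```python
from itertools import accumulate

UINT32_MAX = 2**32 - 1  # largest value an unsigned 32-bit integer can hold


def global_offsets(chrom_lengths):
    """Offset of each contig when all contigs are laid end to end (dict order)."""
    names = list(chrom_lengths)
    starts = [0] + list(accumulate(chrom_lengths[n] for n in names))[:-1]
    return dict(zip(names, starts))


def to_global(chrom_lengths, chrom, pos):
    """Convert a 1-based (chrom, pos) pair to a 1-based genome-wide position."""
    if not 1 <= pos <= chrom_lengths[chrom]:
        raise ValueError(f"{chrom}:{pos} is outside the contig")
    g = global_offsets(chrom_lengths)[chrom] + pos
    if g > UINT32_MAX:
        raise OverflowError(f"{chrom}:{pos} maps to {g}, which exceeds uint32")
    return g


def fits_in_uint32(chrom_lengths):
    """True if every genome-wide position is representable as a uint32."""
    return sum(chrom_lengths.values()) <= UINT32_MAX
```

Under this scheme the limit applies to the sum of all contig lengths, not to any single contig, so a genome below 4 Gb made of a few very large chromosomes still fits.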
Best, Hannes
Hi Hannes,
Just checked your dev branch: svimmer's .csi support now also works smoothly for my examples. Well done, and thanks for the quick turnaround! 👍
It's a bit unlucky that the VCF library in graphtyper is incompatible. I've just tested this: indeed, SVs are genotyped well for our chromosome sets, as long as none of the breakpoints reach beyond the first 512 Mb of a chromosome. The overall size of most marsupial genomes is still comparable to the human genome and below 4 Gb; however, they have only ~6-10 (very large) chromosome pairs. So .csi support would still be very helpful in this case.
While .csi can't be supported yet, do you have a best-practice recommendation for SV genotyping besides graphtyper (with the original calls made by Manta)?
Many thanks, Max
Okay, thanks for testing it and for the info. No, sorry, I don't have any particular recommendation.
Best, Hannes
No worries. I'm hoping to get the full VCFs into Graphtyper via .csi support soon; the genotyping results for SVs in the < 512 Mb ranges look very promising on our end. I will open a separate issue for this.
Hi Hannes,
I am trying to merge SV calls from marsupial chromosomes. Some of these are >>512 Mb in length and hence need to be indexed via tabix -C -p vcf. This creates a .csi index instead of a .tbi one (a bit more explanation here).
However, svimmer currently accepts only .tbi-indexed inputs. Could you possibly add .csi support? Happy to provide you with log files and tests if this helps.
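The 512 Mb figure comes from the binning scheme shared by .tbi and .bai indices: 16 kb leaf bins (min_shift = 14) and a depth of 5 (six levels of bins), so the top-level bin spans 2^(14 + 3*5) = 2^29 = 536,870,912 bp. A CSI index records min_shift and depth in its header, so a deeper tree can cover longer contigs. The sketch below only does that arithmetic; the function names are hypothetical and the depth rule is illustrative, not htslib's exact code.

```python
TBI_MIN_SHIFT = 14  # leaf bins of 2**14 = 16,384 bp, as in .tbi/.bai
TBI_DEPTH = 5       # fixed depth of the .tbi/.bai binning tree


def max_indexable_length(min_shift, depth):
    """Length spanned by the single top-level bin: 2**(min_shift + 3*depth)."""
    return 1 << (min_shift + 3 * depth)


def csi_depth_needed(contig_length, min_shift=TBI_MIN_SHIFT):
    """Smallest depth whose top-level bin covers the whole contig."""
    depth = 0
    while max_indexable_length(min_shift, depth) < contig_length:
        depth += 1  # each extra level multiplies the span by 8
    return depth
```

For a hypothetical 600 Mb contig, `csi_depth_needed(600_000_000)` returns 6, one level deeper than a .tbi index can provide.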
Many thanks, Max