Open OmarOakheart opened 3 years ago
Absolutely... Any news on how to cope with this?
Some features that should be added:
- BAM support: I am trying nPhase on a hexaploid plant (haploid genome ~650 Mbp) and nPhase inflates the data enormously. If this goes on I will run out of disk:
-rw-rw-r-- 1 309G Jun 22 07:04 hexa.sam
-rw-rw-r-- 1 302G Jun 22 08:23 hexa.pass.sam
-rw-rw-r-- 1 302G Jun 22 10:53 hexa.sorted.header.sam
-rw-rw-r-- 1 231G Jun 22 12:06 hexa.sorted.sam
Hi,
Those are some really large files, I imagine you have very high coverage?
Unfortunately this will require you to make some manual modifications to reduce the computational burden.
My recommendations would be to do the following:
nphase partial
and reusing the same VCF file each time (which will have been run on the entire genome)

If you'd like, you can email me at omaroakheart@gmail.com and we can set up a call to talk about your use case for nPhase. It's possible that nPhase isn't going to give you the data that you're looking for. For example, it is not expected to give you a chromosome-scale phasing. But there are things it can do well, like phasing individual genes and regions in a ploidy-agnostic way. It depends on what information you're trying to get.
As long as nPhase doesn't make efficient use of heuristics to drastically speed up prediction time, users will run into issues when trying to run it on large genomes. They could benefit from a guide on how to reduce the time it takes to obtain results and how to interpret them.
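As a rough illustration of one way to shrink the input before a per-region run, here is a minimal Python sketch that streams a SAM file and keeps only the header plus the alignments mapped to one reference sequence. This is an assumption about how one might reduce the data, not something nPhase provides: `filter_sam_by_contig` is a hypothetical helper name, and the only thing it relies on is the standard SAM layout (header lines start with `@`, and the third tab-separated column of an alignment line is the reference name).

```python
from typing import Iterable, Iterator


def filter_sam_by_contig(lines: Iterable[str], contig: str) -> Iterator[str]:
    """Yield SAM header lines plus alignments whose reference name is `contig`.

    Hypothetical helper for illustration; it is not part of nPhase.
    """
    for line in lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged so the output stays valid SAM.
            yield line
            continue
        fields = line.split("\t")
        # Column 3 (index 2) of an alignment line is RNAME, the reference name.
        if len(fields) > 2 and fields[2] == contig:
            yield line


def filter_sam_file(src_path: str, dst_path: str, contig: str) -> None:
    """Stream `src_path` line by line so memory use stays flat on very large files."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        dst.writelines(filter_sam_by_contig(src, contig))
```

Because the file is read one line at a time, memory use does not grow with file size, although the filtered copy still needs disk space of its own. The paths and the contig name passed to `filter_sam_file` are whatever applies to your data; none are assumed here.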