ablab / VerityMap

GNU General Public License v3.0

Read mapping is very slow on diploid human genome assembly #28

Open jeizenga opened 1 year ago

jeizenga commented 1 year ago

I tried to use VerityMap to validate a diploid human genome assembly using HiFi reads, but on my data it was too slow to be practical. I let it run for more than 3 weeks on 16 threads, and it only mapped reads up to about 4x coverage. Is this speed expected? Are there any tweaks I can make to increase it?

The command I ran was:

python3 main.py --reads reads.fastq.gz -o verity_map_output -t 16 -d hifi-diploid \
    assembly.haplotype1.fasta assembly.haplotype2.fasta

Another question/request: I understand from the paper that VerityMap also includes analysis modules that detect the locations of misassemblies. As far as I can see, these can only be accessed after read mapping concludes (I believe the relevant code is here). Is this correct? It would be useful if the interface offered a more modular option, so that misassembly detection could be run independently of mapping, especially since it seems like I will need to troubleshoot the mapping stage.
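For troubleshooting the mapping stage, one option is to time the same command on a small subset of the reads before committing to a full run. The following is a minimal sketch of how such a subset could be extracted. It is not part of VerityMap: the function name `take_first_reads` and the output file name used below are hypothetical, and the code assumes a gzipped FASTQ with exactly four lines per record (no wrapped sequence lines).

```python
import gzip
from itertools import islice


def take_first_reads(src_path, dst_path, n_reads):
    """Copy the first n_reads FASTQ records from src_path to dst_path.

    Assumes both files are gzip-compressed and that every record
    occupies exactly 4 lines (header, sequence, '+', qualities).
    Returns the number of records actually written.
    """
    written = 0
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        while written < n_reads:
            record = list(islice(src, 4))  # next 4 lines = one record
            if len(record) < 4:            # end of file or truncated record
                break
            dst.writelines(record)
            written += 1
    return written
```

For example, `take_first_reads("reads.fastq.gz", "reads.subset.fastq.gz", 10000)` would write the first 10,000 reads to a new file, which could then be passed to `--reads` in place of the full read set. The first reads in a file are not a random sample, so this is only suited to a rough throughput estimate, not to validating the assembly.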