eblerjana / pangenie

Pangenome-based genome inference
MIT License

GraphBuilder: number of paths is limited to 254 in current implementation. #79

Open cgroza opened 2 weeks ago

cgroza commented 2 weeks ago

Hi,

version: v3.0.1
Files and parameters used:
-e      3000000000
-k      31
-o      index/processed
-r      chm13v2.0.fa
-t      10
-v      merged.sorted.multi.vcf
Determine allele sequences ...
Read reference genome ...
Found 25 chromosome(s) from the reference file.
Read input VCF ...
terminate called after throwing an instance of 'std::runtime_error'
  what():  GraphBuilder: number of paths is limited to 254 in current implementation.

I am hitting this error while trying to genotype a VCF with nearly 1000 genomes. Do I need to reduce the number of genomes? If not, how difficult would it be to change PanGenie to support more genomes?

eblerjana commented 2 weeks ago

The current implementation does not support more than 254 paths because the variables that store path information are limited in size. It should not be very difficult for me to fix this, but doing so would significantly increase the memory requirements of the program, and the algorithm is likely too slow to handle larger panels anyway. We are working on an algorithm to subsample paths from the input VCF to make PanGenie applicable to larger panels, but unfortunately we don't have anything ready for this yet.
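
Until such a subsampling algorithm is available, one possible workaround is to reduce the panel by hand before building the index. The following is only a minimal sketch of that idea, not PanGenie's method: it reads the sample names from a VCF header and randomly keeps as many samples as fit within the 254-path limit quoted in the error message. The function names `sample_names` and `pick_samples` are hypothetical, and the sketch assumes that every sample contributes `ploidy` paths (two for a diploid panel), which the thread does not confirm.

```python
import random

MAX_PATHS = 254  # limit quoted in the error message above


def sample_names(vcf_lines):
    """Return the sample names from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first 9 tab-separated columns are fixed VCF fields;
            # everything after them is a sample column.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def pick_samples(samples, max_paths=MAX_PATHS, ploidy=2, seed=0):
    """Randomly choose as many samples as fit within max_paths.

    Assumption: each sample contributes `ploidy` paths.
    """
    limit = max_paths // ploidy
    if len(samples) <= limit:
        return list(samples)
    rng = random.Random(seed)  # fixed seed keeps the choice reproducible
    chosen = set(rng.sample(samples, limit))
    # Preserve the original column order of the VCF.
    return [s for s in samples if s in chosen]


if __name__ == "__main__":
    # Synthetic header with 1000 samples, for illustration only.
    fixed = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    header = fixed + "\t" + "\t".join(f"S{i}" for i in range(1000)) + "\n"
    names = sample_names(["##fileformat=VCFv4.2\n", header])
    kept = pick_samples(names)
    print(f"{len(names)} samples in VCF, {len(kept)} kept")
```

The kept names could then be written one per line to a file and passed to a tool such as `bcftools view -S` to produce the reduced VCF. Random selection is only a placeholder here; a selection that preserves the haplotype diversity of the panel would likely give better genotyping results.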