BGI-shenzhen / VCF2Dis

VCF2Dis: A new simple and efficient software to calculate p-distance matrix and construct population phylogeny based Variant Call Format
MIT License

Time and memory consumption #2

Closed leishenggit closed 1 month ago

leishenggit commented 2 years ago

Is there any description of the time and memory consumption? It would be helpful when running the program under PBS (Portable Batch System).

hewm2008 commented 2 years ago

For a VCF file:

A. Memory: Memory consumption is about 10 MB, so there is almost no need to consider the cost of memory.

B. Speed: The speed has been tuned and optimized many times, and versions after 1.40 are very fast. The run time depends on the number of samples and the size of the SNP dataset. It increases in proportion to the number of sites, and it also increases as the number of samples grows. It takes about 1 day for 1000 samples with 1M sites.
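For sizing a PBS walltime request, the single data point above can be scaled into a rough estimate. This is only a sketch: the linear scaling with sites comes from the comment above, but the scaling with samples is an assumption (proportional to the number of sample pairs, as one would expect for a pairwise distance matrix), and `estimate_hours` is a hypothetical helper, not part of VCF2Dis.

```python
# Rough walltime estimate scaled from the one reference point quoted in the thread:
# about 1 day for 1000 samples with 1M sites.
REF_SAMPLES = 1000
REF_SITES = 1_000_000
REF_HOURS = 24.0


def estimate_hours(n_samples: int, n_sites: int) -> float:
    """Estimate run time in hours.

    Assumptions (not confirmed by the tool's author):
      - time is linear in the number of sites (stated in the thread);
      - time is proportional to the number of sample pairs, n * (n - 1) / 2.
    """
    ref_pairs = REF_SAMPLES * (REF_SAMPLES - 1) / 2
    pairs = n_samples * (n_samples - 1) / 2
    return REF_HOURS * (n_sites / REF_SITES) * (pairs / ref_pairs)


if __name__ == "__main__":
    for n, m in [(1000, 1_000_000), (500, 1_000_000), (1000, 2_000_000)]:
        print(f"{n} samples, {m} sites: ~{estimate_hours(n, m):.1f} h")
```

Treat the result as an order-of-magnitude guide and add a generous margin to the requested walltime, since real run time also depends on the hardware and the VCF2Dis version.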

leishenggit commented 2 years ago

Thanks for your explanation. But I have a new question, as shown below.

[Attached image: 1]

The program is killed. What causes it?

hewm2008 commented 2 years ago

I know the reason: there are too many samples (2852900). The program needs to initialize a two-dimensional matrix (2852900 × 2852900), which takes a very large amount of memory.
I suggest keeping only one sample for each set of samples that share the same genotype.
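To put the matrix size in numbers, here is a back-of-the-envelope calculation. The element size is an assumption (8 bytes, as for a double-precision value; 4 bytes is shown for comparison), since the thread does not say how VCF2Dis stores each entry. `matrix_bytes` and `human` are hypothetical helpers for illustration.

```python
def matrix_bytes(n_samples: int, bytes_per_value: int = 8) -> int:
    """Bytes needed for a dense n_samples x n_samples matrix.

    bytes_per_value is an assumption: 8 for double precision, 4 for single.
    """
    return n_samples * n_samples * bytes_per_value


def human(n_bytes: float) -> str:
    """Format a byte count using decimal units (1 KB = 1000 B)."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1000 or unit == "TB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000


if __name__ == "__main__":
    n = 2852900  # sample count reported in the thread
    print(f"entries: {n * n}")
    print(f"8-byte values: {human(matrix_bytes(n, 8))}")
    print(f"4-byte values: {human(matrix_bytes(n, 4))}")
```

Under either assumption the matrix is in the tens of terabytes, far beyond the memory of a typical compute node, which is consistent with the process being killed. Because the size grows with the square of the sample count, reducing the number of samples is the effective fix.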