How can I reduce the Memory usage of Loter?

silvewheat commented 3 years ago

Hello, Loter seems to consume a lot of memory as the number of SNPs and the number of individuals increase. Is the memory usage also related to the number of threads? How can I reduce the memory usage of Loter? Should I keep only sites with MAF >= 1 in at least one group? Or divide a chromosome into several chunks? And if I divide the chromosome, how can I eliminate the impact on the edges of introgression segments?

Best, Yudong

gdurif commented 3 years ago

Hi,

Could you give some more details about the order of magnitude of the number of SNPs, the number of individuals in the reference populations, and the number of admixed individuals for which you want to estimate local ancestry? And some details about the architecture (number of cores, memory) that you use?

The multi-threading parallelizes over the admixed individuals, which can be cumbersome because the threads do not all have synchronous access to the reference populations. A first idea would be to process the admixed individuals sequentially, but you would lose multi-threading.
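The sequential idea above can be sketched as a small driver that passes one admixed individual at a time to an inference function. This is only an illustration, not part of Loter: `infer_one_at_a_time` and `infer_fn` are hypothetical names, and the sketch assumes that the haplotypes of each individual are stored in consecutive rows. Whether it actually lowers peak memory would have to be measured.

```python
from typing import Callable, List, Sequence


def infer_one_at_a_time(
    h_adm: Sequence,
    infer_fn: Callable[[Sequence], object],
    haps_per_individual: int = 2,
) -> List[object]:
    """Call `infer_fn` once per admixed individual and collect the results.

    h_adm: admixed haplotypes, one row per haplotype, with the rows of each
           individual stored consecutively (assumption of this sketch).
    infer_fn: hypothetical callable that receives the rows of ONE individual
              and returns its local ancestry; the reference panels and a
              single-thread setting would be bound inside it.
    """
    if len(h_adm) % haps_per_individual != 0:
        raise ValueError("haplotype count is not a multiple of haps_per_individual")
    results = []
    for start in range(0, len(h_adm), haps_per_individual):
        # Only the haplotypes of a single individual are passed per call.
        block = h_adm[start:start + haps_per_individual]
        results.append(infer_fn(block))
    return results
```

With Loter's Python API, `infer_fn` might wrap something like `lc.loter_smooth(l_H=[H_ref1, H_ref2], h_adm=block, num_threads=1)`, where `lc` is `loter.locanc.local_ancestry`; that call shape is an assumption here and should be checked against the installed version.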

Just to be sure, you analyze each chromosome separately?

I do not know how filtering out SNPs by MAF will affect the inferred local ancestry, but it is worth a try. Reducing the number of SNPs will certainly decrease the memory footprint, but maybe at the cost of less precise results.
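A minimal sketch of such a filter, assuming each group is a 0/1 haplotype matrix with one row per haplotype and one column per SNP. The function names and the `threshold` parameter are hypothetical; the cutoff to use is a choice for the analyst, and its effect on the inferred ancestry is untested here.

```python
import numpy as np


def maf(h: np.ndarray) -> np.ndarray:
    """Minor allele frequency per SNP for a 0/1 haplotype matrix."""
    freq = h.mean(axis=0)               # frequency of allele 1 at each SNP
    return np.minimum(freq, 1.0 - freq)


def keep_snps(groups, threshold: float) -> np.ndarray:
    """Boolean mask of SNPs whose MAF reaches `threshold` in at least one group."""
    mask = np.zeros(groups[0].shape[1], dtype=bool)
    for h in groups:
        mask |= maf(h) >= threshold     # a SNP is kept if any group passes
    return mask
```

The same mask would then be applied to every matrix, reference and admixed alike (for example `h[:, mask]`), so that all inputs keep identical SNP columns.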

Dividing the chromosome into several chunks is also a potential solution. I would cut between the adjacent SNPs that are farthest apart in genetic distance, or at least in physical position. Since the algorithm uses SNP adjacency to infer ancestry chunks (without accounting for genetic distance directly), I think that this is the best way to eliminate the impact on the edges of introgression segments.

If you have any other questions or need clarification, do not hesitate to ask. Best
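The cutting rule described above can be sketched as follows: given sorted SNP positions (genetic or physical), cut at the widest gaps between adjacent SNPs. `split_at_largest_gaps` is a hypothetical helper, not a Loter function.

```python
from typing import List, Sequence, Tuple


def split_at_largest_gaps(
    positions: Sequence[float], n_chunks: int
) -> List[Tuple[int, int]]:
    """Return (start, stop) index ranges that cut a sorted list of SNP
    positions into `n_chunks` pieces at the widest gaps between neighbours."""
    if not 1 <= n_chunks <= len(positions):
        raise ValueError("n_chunks must be between 1 and the number of SNPs")
    # gaps[i] is the distance between SNP i and SNP i + 1
    gaps = [positions[i + 1] - positions[i] for i in range(len(positions) - 1)]
    # indices of the (n_chunks - 1) widest gaps, put back in genomic order
    by_width = sorted(range(len(gaps)), key=gaps.__getitem__, reverse=True)
    cuts = sorted(by_width[: n_chunks - 1])
    bounds = [0] + [i + 1 for i in cuts] + [len(positions)]
    return list(zip(bounds[:-1], bounds[1:]))
```

Each `(start, stop)` pair selects the SNP columns of one chunk. The widest gaps are not necessarily evenly spread, so chunk sizes can be very unequal; a real run would probably also want a minimum chunk size.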

silvewheat commented 3 years ago

Yes, I run Loter on each chromosome separately.

For example, I ran loter_cli on chromosome 27 of goat.

Number of SNPs = 1,479,604
Sample size (number of diploid individuals) of ref pop1 = 79 (158 haplotypes)
Sample size of ref pop2 = 73
Sample size of mix pop = 23
Number of threads = 4
Max Memory = 17,770 MB

Another example:

Number of SNPs = 5,032,826
Sample size of ref pop1 = 30
Sample size of ref pop2 = 79
Sample size of mix pop = 73
Number of threads = 4
Max Memory = 56,377 MB
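As a rough check on how these two runs compare, the peak memory can be normalized by the number of SNPs and the total number of haplotypes. This is back-of-the-envelope arithmetic on two data points only, assuming 1 MB = 10^6 bytes and that every sample size counts diploid individuals; it says nothing definite about how Loter allocates memory.

```python
def bytes_per_snp_per_haplotype(max_mem_mb, n_snps, n_diploid_individuals):
    """Peak memory divided by (number of SNPs x number of haplotypes)."""
    n_haplotypes = 2 * n_diploid_individuals
    return max_mem_mb * 1e6 / (n_snps * n_haplotypes)


# Figures copied from the two examples; both runs used 4 threads.
ex1 = bytes_per_snp_per_haplotype(17_770, 1_479_604, 79 + 73 + 23)
ex2 = bytes_per_snp_per_haplotype(56_377, 5_032_826, 30 + 79 + 73)
print(f"example 1: {ex1:.1f} bytes, example 2: {ex2:.1f} bytes")
```

Both runs come out between 30 and 35 bytes per SNP per haplotype, which is consistent with peak memory growing roughly in proportion to the number of SNPs at a fixed thread count.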

According to your description, if there is only one admixed individual, multi-threading is useless? And if the number of threads doubles, will the memory used to store the reference haplotype matrix also double?

I will try to run Loter on a large-RAM node. If the job still runs out of memory, I'll try running the admixed individuals one by one with a single thread.

Best wishes.

gdurif commented 3 years ago

Yes, if there is one admixed individual, the multi-threading is useless (we are aware of the flaw and thinking about a refactoring).

I am not sure whether the link between the number of threads and the memory footprint is linear. I would say no (because threads share some of the memory). My guess is that it would grow a bit, but it should be manageable.

For my experiments, I mainly worked with human genomic data, and the number of SNPs per chromosome was way smaller (I had similar sample sizes). I understand your issue better now.

However, the maximal amount of memory that you mention is not that large (any recent-ish computer now has GBs of memory). Did you mean MB or GB? If it is indeed MB and nonetheless too large for your machine, I am pretty sure it will be OK on a large-RAM node.

Best

silvewheat commented 3 years ago

Thanks for your answer. Maybe I previously ran too many jobs at the same time. Because I set 4 threads for each job, I ran 6 jobs at the same time (24 cores per node). As you can see in example 2, one job costs about 56 GB of memory, so 6 jobs would cost more than 300 GB. That is too much for one node. I will try to limit the number of running jobs. Actually, besides loter_cli, I also tried running a DIY script (attached) that takes VCF as input and gives a better output format. That may also have increased the maximum memory usage. If you plan to update Loter, it might be a good idea to also update loter_cli to make the output format better for downstream analysis?

run_loter.py.txt
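The job-count arithmetic from this comment can be written down as a small helper that caps the number of concurrent jobs by both memory and cores. The 256 GB node size below is a hypothetical value for illustration, since the thread does not state the actual node memory; the per-job figure is the peak from example 2.

```python
def max_concurrent_jobs(node_mem_gb, per_job_gb, cores, threads_per_job):
    """Largest number of jobs that fits both the memory and the core budget."""
    by_memory = int(node_mem_gb // per_job_gb)
    by_cores = cores // threads_per_job
    return min(by_memory, by_cores)


per_job_gb = 56.377                  # peak memory of example 2, in GB
total_for_six = 6 * per_job_gb       # what six simultaneous jobs would need

# Hypothetical 256 GB node with 24 cores and 4 threads per job
n_jobs = max_concurrent_jobs(256, per_job_gb, cores=24, threads_per_job=4)
print(f"6 jobs need about {total_for_six:.0f} GB; {n_jobs} jobs fit in 256 GB")
```

On such a node the core count would allow six jobs, but memory would limit it to four.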

gdurif commented 3 years ago

Yes, we definitely plan to improve Loter output at the same time, thanks for the tip.

bcm-uga / Loter

How can I reduce the Memory usage of Loter? #18