auton1 / LDhat

Estimate recombination rates from population genetic data

Shared a lookup table for n = 320 (theta = 0.01) #14

Open jshoyer opened 4 years ago

jshoyer commented 4 years ago

In case anyone is interested, I created a new likelihood lookup table for n = 320 sequences/chromosomes -- see https://zenodo.org/record/3934350. That seemed sufficiently large and computationally expensive to make sharing worthwhile. I would have created a pull request, but the table is too large for GitHub (207.5 MB compressed, 806.8 MB uncompressed), and centralized distribution of the tables via Git is not disk-space-efficient anyway. Ideas for helping people discover the file would be welcome. Feel free to close this issue whenever.
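For anyone who wants to plug the table into an analysis, a minimal sketch of downloading it and passing it to LDhat's interval program via its -lk option (the lk file name below is a guess -- check the Zenodo record for the actual name -- and the other flag values just follow the usual LDhat examples; adjust for your data):

```bash
# Hypothetical file name -- see https://zenodo.org/record/3934350 for the real one.
wget https://zenodo.org/record/3934350/files/lk_n320_t0.01.txt.gz
gunzip lk_n320_t0.01.txt.gz

# Point interval at the precomputed lookup table with -lk;
# sites.txt and locs.txt are your own input files.
./interval -seq sites.txt -loc locs.txt -lk lk_n320_t0.01.txt \
           -its 1000000 -samp 2000 -bpen 5
```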

enocI21 commented 4 years ago

Dear jshoyer,

Thank you very much for sharing your table. I would like to ask you a few questions. How long did it take you to compute it? I have tried to compute a table for 250 animals (500 sequences/chromosomes, about 2.6 million configurations), but it is too slow: the analysis occupies only about 2% of a server with 150 GB of RAM and 32 cores, so at that rate it would take about two years to finish. My goal is a table for 1000 animals, since I have at least 13000 genotypes. Is there any advice you can give me? Is there a way to allocate more memory and cores to the computation to speed it up? Thanks for your answer.

jshoyer commented 4 years ago

I used both Slurm job arrays and GNU parallel to parallelize the computations, via the --splits flag of LDhat's complete program. See the job script that I included in the Zenodo record: https://zenodo.org/record/3934350/files/ldhat-complete-n320-t0.01-split10000fold.sbatch

The job will still take quite a while with 32 CPU cores. I used hundreds of cores.
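Roughly, the pattern looks like the sketch below: a Slurm array where each task runs its slice of the splits through GNU parallel. This is only illustrative -- the complete flag values and the argument form of --splits here are assumptions; the sbatch script in the Zenodo record has the actual invocation for n = 320.

```bash
#!/bin/bash
#SBATCH --job-name=ldhat-complete-n320
#SBATCH --array=0-99          # 100 array tasks covering 10000 splits
#SBATCH --cpus-per-task=32

# Each array task handles 100 of the 10000 splits, running 32 at a time
# with GNU parallel. The --splits argument form and the flag values
# below are assumptions; take the real command from the Zenodo script.
splits_per_task=100
first=$(( SLURM_ARRAY_TASK_ID * splits_per_task ))
last=$(( first + splits_per_task - 1 ))

seq "$first" "$last" | parallel -j "$SLURM_CPUS_PER_TASK" \
    ./complete -n 320 -rhomax 100 -n_pts 101 -theta 0.01 --splits 10000 {}
```

Scaling the array size up (and splits_per_task down) is how you trade wall-clock time for core count; with only 32 cores total you would instead run a single task and let it grind through all the splits sequentially.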