dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

MemoryError() When Generating VCF #455

Closed: alexkrohn closed this issue 3 years ago

alexkrohn commented 3 years ago

I've used ipyrad before on this machine without any problems. The machine has 132 GB of RAM and 80 cores. I'm analyzing a 3RAD dataset of 190 turtles. Steps 2-6 work just fine, but step 7 keeps crashing, apparently because it runs out of memory. This is surprising, as I've run larger datasets with more SNPs without this happening. Moreover, this run uses a relatively strict clustering threshold of 0.96, which makes me think this VCF should be smaller than usual. I'm attaching my parameter file and the log file.

I've tried running with and without a limit on the number of clusters, and it fails either way. Short of running on a more powerful machine (which I don't have...), is there anything I can do to generate this final VCF?

Thanks,

Alex

params-mate-combined-10xfilter96clust-outrawbackup.txt ipyrad_log.txt

isaacovercast commented 3 years ago

Hello Alex,

The params file you are using indicates that you are on an older version of ipyrad (0.7). I would recommend updating to the most recent version (0.9.81) and then re-running the assembly (I think you can just re-run steps 5-7). This is by far the preferred strategy.
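For anyone wanting to check this on their own params file, here is a minimal Python sketch. It assumes the first line of the params file carries the version in a form like `(v.0.7.28)`; that header layout and the patch number 28 are assumptions for illustration, and the helper names are hypothetical. It also builds, without running, a command that re-runs steps 5-7 using ipyrad's `-p`, `-s`, and `-f` options. Whether re-running only those steps is enough after upgrading is untested here.

```python
import re

# Assumed header layout of an ipyrad params file, e.g.
# "------- ipyrad params file (v.0.7.28)-------"
HEADER_VERSION = re.compile(r"\(v\.(\d+)\.(\d+)\.(\d+)\)")


def params_version(header_line):
    """Return (major, minor, patch) parsed from the header line, or None."""
    match = HEADER_VERSION.search(header_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def needs_update(version, minimum=(0, 9, 0)):
    """True when the version is known and older than the minimum."""
    return version is not None and version < minimum


def rerun_command(params_path, steps="567"):
    """Build (but do not run) a command that re-runs the given steps.

    -p names the params file, -s lists the steps, -f forces overwriting
    existing results for those steps.
    """
    return ["ipyrad", "-p", params_path, "-s", steps, "-f"]


# Hypothetical header line and file name, for illustration only.
header = "------- ipyrad params file (v.0.7.28)-------"
if needs_update(params_version(header)):
    print(" ".join(rerun_command("params-mydata.txt")))
```

The command is returned as a list so it can be inspected first and passed to `subprocess.run` only after ipyrad itself has been updated.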

You can also try updating to the most recent 0.7 version, for example by cloning the repository and checking out the tag for the most recent 0.7 version and then trying to pip install it, but this is probably going to be difficult. We stopped actively developing the 0.7 version almost 2 years ago, so moving to the 0.9 version will be the most reliable way forward.

If the 0.7 version you are using is old enough, you might be able to change the output formats so that the full VCF is not generated. In the old days we used to write out the full VCF file (including invariable sites), and this file could get HUGE, so we stopped doing it. I believe the old parameter for the full VCF was V and for the variable-sites-only VCF was v, so switching from V to v might help.
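To see why the full VCF can dwarf the variable-sites-only one, here is a rough back-of-the-envelope sketch in Python. Only the 190 samples come from this thread; the locus count, locus length, SNPs per locus, and bytes per genotype field are hypothetical values chosen for illustration, and this is not how ipyrad itself sizes or writes the file.

```python
def vcf_rows(n_loci, locus_length, snps_per_locus):
    """Return (rows in a full VCF, rows in a variable-sites-only VCF)."""
    full = n_loci * locus_length        # one row per site, variable or not
    variable = n_loci * snps_per_locus  # one row per SNP only
    return full, variable


def approx_bytes(rows, n_samples, bytes_per_genotype=8):
    """Very rough size: each row carries one genotype field per sample."""
    return rows * n_samples * bytes_per_genotype


n_samples = 190  # from the report above

# Hypothetical assembly: 50,000 loci of 150 bp with 3 SNPs each.
full, variable = vcf_rows(n_loci=50_000, locus_length=150, snps_per_locus=3)

print(f"full VCF:      {full:>9,} rows, ~{approx_bytes(full, n_samples) / 1e9:.2f} GB")
print(f"variable only: {variable:>9,} rows, ~{approx_bytes(variable, n_samples) / 1e9:.2f} GB")
```

Under these assumptions the full VCF has 50 times as many rows as the variable-sites-only one, since the ratio is simply locus length divided by SNPs per locus.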

I will leave this ticket open for now, but since it is a very old version we won't prioritize hunting down this bug, if in fact we didn't already fix it in a newer 0.7 version.

Good luck.

alexkrohn commented 3 years ago

Updating fixed this issue! Thanks!