dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

changing VSEARCH default threads #480

Closed: Phismil closed this issue 2 years ago

Phismil commented 2 years ago

Good day, Isaac et al. I am currently running ipyrad on a big dataset and have noticed that VSEARCH clustering is taking a very long time. I see in the source code that the default number of threads for this step is "2". I tried the "-t" flag to increase it to 10 or even 20, but saw no change in speed, and step 3 is still a major bottleneck for my dataset. Do you have any recommendations to speed up this step? Thank you in advance.

isaacovercast commented 2 years ago

Hello Phismil, how much RAM are you allocating, and how many cores? We recommend at least 4 GB per core, and more if you have paired-end data or long reads (>150 bp). If you don't have enough RAM per core, step 3 will crawl no matter what you do (-t won't help).
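As a rough illustration of the rule of thumb above (at least 4 GB of RAM per core), the sketch below computes how many cores a given memory allocation can support before dropping under that minimum. The function `max_cores_for_ram` and its `gb_per_core` parameter are hypothetical helpers, not part of ipyrad; for paired-end data or reads longer than 150 bp you would pass a larger `gb_per_core`, though the thread does not say how much larger.

```python
def max_cores_for_ram(total_ram_gb: float, requested_cores: int,
                      gb_per_core: float = 4.0) -> int:
    """Return how many of the requested cores fit within total_ram_gb.

    Hypothetical helper (not an ipyrad API). gb_per_core defaults to the
    4 GB minimum suggested in this thread; raise it for paired-end data or
    long reads (>150 bp). A result of 0 means there is not enough RAM for
    even one core at the chosen gb_per_core.
    """
    if total_ram_gb <= 0 or requested_cores < 1 or gb_per_core <= 0:
        raise ValueError("RAM, core count and GB per core must be positive")
    # Number of whole cores the RAM can back at gb_per_core each.
    affordable = int(total_ram_gb // gb_per_core)
    # Never suggest more cores than were requested.
    return min(requested_cores, affordable)


if __name__ == "__main__":
    # 40 GB shared across 20 requested cores: only 10 get 4 GB each.
    print(max_cores_for_ram(40, 20))  # → 10
```

In this example, asking for 20 cores with 40 GB of RAM would leave each core with only 2 GB. Running 10 well-provisioned cores is likely to be faster than running 20 starved ones.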

Since this is more a question about operation than an issue with ipyrad, I suggest we move the discussion to the ipyrad Gitter channel. How does that sound? I will close this issue once we pick up the discussion on Gitter.

https://gitter.im/dereneaton/ipyrad

Phismil commented 2 years ago

Fantastic! Thank you for your response. Let's continue the discussion on Gitter. Cheers!