Open alambard opened 1 year ago
Hi
How many input fasta files have you provided and how many sequences are there in total?
It sounds like a very large analysis. I would suggest doing a test where you reduce your input data to about 8x smaller. Since the all-vs-all sequence search scales roughly quadratically with the number of sequences, this should run using approximately 64x less RAM and 64x less runtime than your full analysis. Compare these numbers to the computational resources you have available. This should give you a guide as to whether your full analysis is achievable.
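For the downscaled test, here is a minimal sketch of one way to thin each FASTA file to roughly 1/8 of its sequences. It streams record by record so even the 94 GB file never has to fit in memory; the file paths and the keep-every-8th choice are just placeholders, not something OrthoFinder requires:

```python
#!/usr/bin/env python3
"""Keep roughly 1 in 8 sequences from a plain (uncompressed) FASTA file."""
import sys

KEEP_EVERY = 8  # keep every 8th record -> input ~8x smaller

def subsample(in_path, out_path, keep_every=KEEP_EVERY):
    record_idx = -1
    keep = False
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.startswith(">"):               # start of a new record
                record_idx += 1
                keep = (record_idx % keep_every == 0)
            if keep:                               # copy header + sequence lines
                fout.write(line)

if __name__ == "__main__":
    # usage: python subsample_fasta.py input.fasta output_small.fasta
    subsample(sys.argv[1], sys.argv[2])
```

Run it once per input file and point OrthoFinder at the directory of reduced files for the test run.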
All the best,
David
Hello
I have 8 FASTA files, ranging from 1 GB for the smallest to 94 GB for the biggest, ~220 GB in total (more than the 150 GB I mentioned, though).
Here is my total number of sequences: 1,472,984,533.
I'm going to test with the input reduced to about 8x smaller, I guess.
Hello, I have launched an OrthoFinder analysis on ~150 GB of transcriptomic data from the NCBI SRA database. It has now been running for almost 30 days. I would just like to hear from someone who has already used the pipeline on data this large, to get an idea of how much time is needed for the algorithm to complete. At the moment it is still in the BLAST comparison step. I have seen in recent topics that using the MMseqs2 algorithm is faster. Here is the command I used:
orthofinder -t 15 -a 15 -S mmseqs -d -f gambierdiscus/
The server is configured with 80 GB of memory and 16 CPUs.
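If it helps anyone gauge progress on a run like this: OrthoFinder writes one result file per species-vs-species search into its WorkingDirectory, named Blast<i>_<j>.txt (sometimes gzipped), as far as I know even when -S mmseqs is used. A small sketch, assuming that naming and 8 input files (adjust n_species and the glob pattern if your run's files are named differently):

```python
#!/usr/bin/env python3
"""Rough progress check: count finished all-vs-all result files in the
OrthoFinder WorkingDirectory (assumed naming: Blast<i>_<j>.txt or .txt.gz)."""
import glob, sys

n_species = 8                      # assumed number of input FASTA files
expected = n_species * n_species   # one all-vs-all search per ordered species pair
work_dir = sys.argv[1] if len(sys.argv) > 1 else "."

done = len(glob.glob(f"{work_dir}/Blast*_*.txt")
           + glob.glob(f"{work_dir}/Blast*_*.txt.gz"))
print(f"{done}/{expected} pairwise searches finished ({100 * done / expected:.0f}%)")
```

This only counts files that the search step has already written out, so it gives a lower bound on progress, but it is usually enough to see whether the run is moving at all.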