cfarkas / SARS-CoV-2-freebayes

Analysis of SARS-CoV-2 genome variants collected with freebayes variant caller
MIT License

Too many files open_Jacquard encountered an unanticipated problem #2

Open AlbertRockG opened 2 years ago

AlbertRockG commented 2 years ago

Hi here!

I'm running the script SARS-CoV-2-GISAID-freebayes.sh. At the stage of merging VCFs using jacquard, the same problem comes back every time, even with fewer than 3000 VCF files. So I am posting my problem here in the hope that you can help me. Thanks in advance.

cfarkas commented 2 years ago

Hi AlbertRockG,

Thanks for using our scripts in your work. I think your problem might be due to the open file limit, if you are on Linux. Can you please share the SARS-CoV-2-GISAID-freebayes.sh logfile and the output of the following command?

cat /proc/sys/fs/file-max # maximum open file limit in your machine

Also, please share the hard and soft limits on the number of open files (per user). You can obtain those as follows:

ulimit -Hn
ulimit -Sn

Also, are you able to run the following?

ulimit -n 1000000 
ulimit -s 1000000 

This is to get an idea of your work environment.
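As a rough illustration of why these limits matter here: a tool that opens every input VCF at once needs roughly one file descriptor per file. A minimal sketch, assuming bash; the function name `check_fd_limit` and the margin of 16 descriptors are hypothetical choices for illustration, not part of the repository's scripts:

```shell
# Hypothetical helper: report whether the soft open-file limit can
# accommodate opening `needed` files at once, plus a small margin
# (16 is an arbitrary allowance for stdin/stdout/stderr and similar).
check_fd_limit() {
    local needed="$1"
    local margin=16
    local soft
    soft=$(ulimit -Sn)
    if [ "$soft" = "unlimited" ] || [ $((needed + margin)) -le "$soft" ]; then
        echo "ok"
    else
        echo "too_low"
    fi
}

# Example: count the VCF files in the current directory and check the limit.
vcf_count=$(find . -maxdepth 1 -type f -name '*.vcf' | wc -l)
echo "VCF files: $vcf_count, soft limit: $(ulimit -Sn), status: $(check_fd_limit "$vcf_count")"
```

If the status is `too_low`, the merge step is likely to hit the "too many open files" error unless the soft limit is raised or the files are merged in smaller batches.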

Cheers,

Carlos

AlbertRockG commented 2 years ago

Hi Carlos,

I have solved the issue with a Python script. However, I would like to know the purpose of this line of code:

ulimit -n 1000000 && vcfcombine EPI*.vcf > combined_sites.raw.vcf

I ask because the only place the output file is used in the rest of the SARS-CoV-2-GISAID-freebayes.sh script is line 126, and only to compress it.
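For context on what that line does mechanically: with `&&`, `vcfcombine` runs only if `ulimit -n 1000000` succeeds, and `ulimit -n` without `-S` or `-H` tries to set both the soft and the hard limit, which an unprivileged user cannot raise above the current hard limit. On a machine whose hard limit is below 1000000, the combine step would therefore be skipped. A hedged sketch of a more forgiving pattern, assuming bash; `run_with_max_fds` is a hypothetical name, not something from the script:

```shell
# Hypothetical wrapper: raise the soft open-file limit to the hard limit
# inside a subshell, then run the given command. The calling shell's
# limits are left unchanged, and the command still runs if raising fails.
run_with_max_fds() {
    (
        ulimit -Sn "$(ulimit -Hn)" 2>/dev/null \
            || echo "warning: could not raise the soft open-file limit" >&2
        "$@"
    )
}

# Example usage (vcfcombine and the EPI*.vcf inputs come from the line quoted above):
# run_with_max_fds vcfcombine EPI*.vcf > combined_sites.raw.vcf
```

Because the limit change happens in a subshell, it does not leak into the rest of the script, and the wrapper returns the exit status of the wrapped command.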

cfarkas commented 2 years ago

Hi AlbertRockG,

I am glad that you solved the issue. Can you please share the fix? The line you mention is definitely a bug; ulimit should not be there. Nevertheless, combined_sites.raw.vcf contains all discovered sites, without sample names and variant frequencies, so this file is much lighter than merged.GISAID.AF.vcf. I will fix the script.

Best Regards,

Carlos