AntonelliLab / seqcap_processor

Bioinformatic pipeline for processing Sequence Capture data for Phylogenetics
MIT License

GC overhead limit exceeded #24

Open MikeSanJose opened 2 years ago

MikeSanJose commented 2 years ago

I am trying to run your pipeline on some WGS data (~20x coverage). I ran into this error when I tried to run the first step (quality check). I am running this on a node with 72 cores and 384 GB of memory. I didn't see any flags that would let me increase the memory available to Java. Any help would be appreciated.

Exception in thread "Thread-4" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.String.toCharArray(String.java:2899)
    at uk.ac.babraham.FastQC.Modules.BasicStats.processSequence(BasicStats.java:123)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:89)
    at java.lang.Thread.run(Thread.java:748)

tandermann commented 2 years ago

SECAPR uses FastQC as a software dependency for this step. Until now, FastQC ran on a single core by default. I just added the --cores flag to secapr quality_check, which may solve your problem. Try running the command with --cores 72, or however many cores you want to use.
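As far as FastQC's own help text describes, memory is allocated per thread, which is likely why raising the core count also relieves the Java memory error. Below is a minimal sketch of assembling the call from a script. Only `secapr quality_check` and `--cores` come from this thread; the helper name `build_quality_check_cmd` and its `extra_args` parameter are hypothetical, and any other options you pass should be checked against `secapr quality_check -h`.

```python
import os
import shlex


def build_quality_check_cmd(cores_requested, extra_args=()):
    """Return the argument list for `secapr quality_check` with a --cores value.

    Hypothetical helper: the core count is clamped to between 1 and the
    number of CPUs the machine reports. `extra_args` is passed through
    unchanged and is where your usual input/output options would go.
    """
    available = os.cpu_count() or 1
    cores = max(1, min(int(cores_requested), available))
    return ["secapr", "quality_check", "--cores", str(cores), *extra_args]


if __name__ == "__main__":
    cmd = build_quality_check_cmd(72)
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(cmd))
    # To actually run it (requires secapr on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Clamping to the reported CPU count is a design choice of this sketch, not something secapr requires. On a shared cluster node, the scheduler's allocation may be a better upper bound than `os.cpu_count()`.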

For this to work, you first need to install the GitHub development version of secapr. Activate your conda environment (secapr_env) and run pip install https://github.com/AntonelliLab/seqcap_processor/archive/refs/tags/v2.2.4.tar.gz to install the latest version.
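After installing, one way to confirm that the environment picked up the new version is a quick check from Python. This is a sketch under two assumptions that the thread does not confirm: that the installed distribution is named `secapr`, and that v2.2.4 (the tag in the URL above) is the first version with `--cores`. The helpers `parse_version` and `has_cores_flag` are hypothetical names.

```python
from importlib import metadata


def parse_version(text):
    """Turn a dotted version such as 'v2.2.4' into a tuple of ints, e.g. (2, 2, 4).

    Parsing stops at the first component that does not start with a digit,
    so '2.2.4.dev1' also yields (2, 2, 4).
    """
    parts = []
    for piece in text.strip().lstrip("v").split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_cores_flag(installed, minimum="2.2.4"):
    """Assumption: --cores is available from `minimum` onward."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        installed = metadata.version("secapr")  # assumed distribution name
    except metadata.PackageNotFoundError:
        print("secapr is not installed in this environment")
    else:
        status = "new enough" if has_cores_flag(installed) else "too old"
        print(f"secapr {installed}: {status} for --cores")
```

Running `secapr quality_check -h` and looking for `--cores` in the output is the more direct check, since it does not depend on the version assumption.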

Let me know if that works, or if the problem persists.

MikeSanJose commented 2 years ago

Seems to be working. I'll let you know if I run into any more problems with the other steps.

Thanks