CCBR / Pipeliner

An open-source and scalable solution to NGS analysis powered by the NIH's Biowulf cluster.

exome pipeline subjobs suspend #479

Closed faridrashidi closed 1 year ago

faridrashidi commented 1 year ago

Dear Pipeliner developers,

Thank you for developing this helpful pipeline. I have been trying to run the exome pipeline on a cohort on Biowulf, but the subjobs get suspended and then run without making any progress. I contacted the HPC team, but they have not been able to help much and the problem persists.

Looking further into the logs, I found errors related to GATK/3.8-0. Could you provide any insights or suggestions on how to resolve this? Here is the tail of one of the log files:

tail /vf/users/rashidimehrabf2/TIGER/exome_bam2/slurm-59893761.out
INFO  04:13:20,052 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO  04:13:20,052 HelpFormatter - [Sat Mar 04 04:13:20 EST 2023] Executing on Linux 3.10.0-862.14.4.el7.x86_64 amd64
INFO  04:13:20,052 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_181-b13
INFO  04:13:20,055 HelpFormatter - Program Args: -T RealignerTargetCreator -I LCS750B.dedup.bam -R /data/CCBR_Pipeliner/db/PipeDB/lib/GRCh38.d1.vd1.fa -known /fdb/GATK_resource_bundle/hg38bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -known /data/CCBR_Pipeliner/db/PipeDB/lib/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz -o LCS750B.fin.bam.intervals
INFO  04:13:20,059 HelpFormatter - Executing as rashidimehrabf2@cn4327 on Linux 3.10.0-862.14.4.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_181-b13.
INFO  04:13:20,060 HelpFormatter - Date/Time: 2023/03/04 04:13:20
INFO  04:13:20,060 HelpFormatter - ----------------------------------------------------------------------------------
INFO  04:13:20,060 HelpFormatter - ----------------------------------------------------------------------------------
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/usr/local/apps/GATK/3.8-0/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...

Thank you once again for your contribution to the scientific community, and I look forward to hearing from you soon.

skchronicles commented 1 year ago

@faridrashidi

This could be caused by the system installation of the GATK3 module on Biowulf. The error says that Log4j2 could not create the class org.apache.logging.log4j.core.impl.Log4jContextFactory, which means log4j-core was not found on the classpath. I wonder whether it is missing because of the log4j vulnerability (Log4Shell, CVE-2021-44228) disclosed in December 2021. Did HPC staff mention anything related to that when you last spoke to them?

That said, this logging error could be a red herring that hides the real cause. Could you attach or send us the entire log file?
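
One way to check whether the log4j messages are hiding another failure is to split the error lines of the full log into logging-setup noise and everything else. The following is only a minimal sketch: the helper names `split_error_lines` and `scan_log` are hypothetical, and treating every line that mentions `StatusLogger` as noise is an assumption based on the two lines quoted above, not a rule from GATK or Pipeliner.

```python
# Hypothetical helpers for triaging a SLURM log; not part of Pipeliner or GATK.

# Assumption: error lines mentioning StatusLogger are logging-setup noise.
NOISE_MARKERS = ("StatusLogger",)


def split_error_lines(lines):
    """Return (noise, other): lines containing 'ERROR', split by NOISE_MARKERS."""
    noise, other = [], []
    for line in lines:
        text = line.rstrip("\n")
        if "ERROR" not in text:
            continue  # skip INFO and other non-error lines
        if any(marker in text for marker in NOISE_MARKERS):
            noise.append(text)
        else:
            other.append(text)
    return noise, other


def scan_log(path):
    """Read a log file and split its error lines as described above."""
    with open(path, errors="replace") as handle:
        return split_error_lines(handle)
```

For example, `noise, other = scan_log("slurm-59893761.out")` could be run in the working directory; if `other` comes back empty, the log4j messages are the only errors recorded and the cause of the suspension probably lies elsewhere.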

Also, we have another pipeline for WES data. It currently supports only human data and uses GRCh38/hg38, but that should not be an issue for this project, since your run already uses a GRCh38 reference. Please let me know if you would be interested in using that pipeline.

faridrashidi commented 1 year ago

Thank you very much, Skyler, for your help with this issue. Using the other pipeline solved my problem, so I am closing this issue.