CCRGeneticsBranch / khanlab_ngs_pipeline


Errors seen across multiple samples related to memory #13

Closed slsevilla closed 1 year ago

slsevilla commented 1 year ago

Problem: Multiple errors were seen across a significant number of samples in the latest sample run.

Example errors from log /data/khanlab2/processed_DATA/ngs_pipeline_SJ031111=SJEPD031111_D1=20220911_20221116_151637.log :

Error executing rule FUSION_CATCHER on cluster (jobid: 33, external: 52717158, jobscript: /gpfs/gsfs10/users/khanlab2/processed_DATA/.snakemake/tmp.yznfwbmd/FUSION_CATCHER.33). For error details see the cluster log and the log files of the involved rule(s).
Error executing rule mixcr_RNASeq on cluster (jobid: 46, external: 52717129, jobscript: /gpfs/gsfs10/users/khanlab2/processed_DATA/.snakemake/tmp.yznfwbmd/mixcr_RNASeq.46). For error details see the cluster log and the log files of the involved rule(s).
Error executing rule FUSION_CATCHER on cluster (jobid: 33, external: 52732657, jobscript: /gpfs/gsfs10/users/khanlab2/processed_DATA/.snakemake/tmp.yznfwbmd/FUSION_CATCHER.33). For error details see the cluster log and the log files of the involved rule(s).
Error executing rule mixcr_RNASeq on cluster (jobid: 46, external: 52735119, jobscript: /gpfs/gsfs10/users/khanlab2/processed_DATA/.snakemake/tmp.yznfwbmd/mixcr_RNASeq.46). For error details see the cluster log and the log files of the involved rule(s).
Error executing rule arriba on cluster (jobid: 34, external: 52717235, jobscript: /gpfs/gsfs10/users/khanlab2/processed_DATA/.snakemake/tmp.yznfwbmd/arriba.34). For error details see the cluster log and the log files of the involved rule(s).
Exiting because a job execution failed. Look above for error message

Review of one error log log/FUSION_CATCHER.52732657.e

Error message:

tr: write error: Disk quota exceeded
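
Since only one error log was reviewed here, a quick way to check whether the other failed jobs hit the same message is to scan the cluster error logs for it. The following is a minimal sketch, assuming the logs live in a single directory and end in `.e` (as in `log/FUSION_CATCHER.52732657.e`); the function name `find_quota_errors` is hypothetical and not part of the pipeline.

```python
from pathlib import Path

# The exact message seen in log/FUSION_CATCHER.52732657.e
QUOTA_MARKER = "Disk quota exceeded"


def find_quota_errors(log_dir, pattern="*.e"):
    """Return {log file name: [matching lines]} for logs that mention the quota error.

    Assumption: cluster stderr logs sit directly in log_dir and match `pattern`.
    """
    hits = {}
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log contains odd bytes
        with path.open(errors="replace") as handle:
            matches = [line.rstrip("\n") for line in handle if QUOTA_MARKER in line]
        if matches:
            hits[path.name] = matches
    return hits


# Example usage (directory name taken from the log path above):
# for name, lines in find_quota_errors("log").items():
#     print(f"{name}: {len(lines)} quota error line(s)")
```

Logs from rules that failed for a different reason simply will not appear in the result, which helps separate quota failures from anything else.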

Solution: It appears that the errors in this project are due to disk space issues. Considering we are attempting to move analysis to a new location (related to the problem with Biowulf, #12), this is a larger concern. We are not utilizing scratch space effectively, and we are keeping intermediate files that are not used by downstream analysis, which leaves a large pipeline footprint per sample. We will need to determine a course of action to handle the reprocessing of existing samples plus the new samples coming through the pipeline more effectively.
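
To decide which intermediate files are worth cleaning up, it may help to first see what makes up the per-sample footprint. The following is a rough sketch, assuming each sample has its own output directory; `footprint_by_suffix` is a hypothetical helper, and it sums apparent file sizes, which can differ from what the quota system actually counts.

```python
from collections import Counter
from pathlib import Path


def footprint_by_suffix(sample_dir):
    """Sum apparent file sizes (bytes) under sample_dir, grouped by file suffix.

    Symlinks are skipped so linked references are not counted as sample output.
    """
    totals = Counter()
    for path in Path(sample_dir).rglob("*"):
        if path.is_file() and not path.is_symlink():
            totals[path.suffix or "(none)"] += path.stat().st_size
    return totals


# Example usage (the directory path is a placeholder):
# for suffix, size in footprint_by_suffix("path/to/sample").most_common(10):
#     print(f"{suffix}\t{size / 1e9:.2f} GB")
```

The largest suffixes that no downstream rule reads would be the first candidates for moving to scratch space or deleting after use.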

kopardev commented 1 year ago

I have a few questions/observations:

slsevilla commented 1 year ago

Talked with Xinyu this morning, and it was a memory issue (perhaps he had deleted files between the errors occurring and you running checkquota). He has a list of the affected projects and will delete these runs and restart them.