ENCODE-DCC / atac-seq-pipeline

ENCODE ATAC-seq pipeline
MIT License

6 days stuck on task=atac.read_genome_tsv:-1, retry=0, status=Running #421

Open marlmatos opened 1 year ago

marlmatos commented 1 year ago

Hi, I am running the ATAC-seq pipeline for the first time. My jobs are running, but they have been stuck at the same initial step for 6 days with no progress. It is a very simple job: paired-end reads, no replicates.

I have a lot of samples that I need to process. This is my script for the first 30:

#!/bin/bash -l
#SBATCH --job-name=atac_pipeline    # Job name
#SBATCH -p quick
#SBATCH -t 02:00:00
#SBATCH --nodes=1
#SBATCH --mem=4G
#SBATCH --output=/gs/gsfs0/users/marlrodrig/aging_project/ATAC_seq/scripts/encode_atac_pipeline/logs/atac_pipeline.%A_%a.out
#SBATCH --array=1-30%30

#### Directory containing the JSON files
JSON_path="/gs/gsfs0/users/marlrodrig/aging_project/ATAC_seq/scripts/encode_atac_pipeline/json_paths"
OUT="/gs/gsfs0/users/marlrodrig/aging_project/ATAC_seq/results/atac_pipeline_posttrim"
pipeline="/gs/gsfs0/users/marlrodrig/packages/atac-seq-pipeline"

echo "$SLURM_ARRAY_TASK_ID"

LINE=$(sed -n "$SLURM_ARRAY_TASK_ID"p "$JSON_path")
echo $LINE

module load java/13

# Submit the job and get the job ID
caper hpc submit $pipeline/atac.wdl -i $LINE --singularity --leader-job-name "atac_\${LINE}"

My input JSON:

{
    "atac.pipeline_type": "atac",
    "atac.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v4/hg38.tsv" , 
    "atac.paired_end": true,
    "atac.auto_detect_adapter": true,
    "atac.dup_marker": "picard",
    "atac.mapq_thresh": 30,
    "atac.pval_thresh": 0.01,
    "atac.blacklist": "/gs/gsfs0/users/taythomp/Greally/References/Blacklists/hg38.blacklist.bed",
    "atac.smooth_win": 150,
    "atac.gensz": "hs",
    "atac.title": "CD4 Aging ATACseq",
    "atac.description": "ATAC-seq on 407 ",
    "atac.fastqs_rep1_R1": ["/gs/gsfs0/home/marlrodrig/aging_project/ATAC_seq/data/trimmed_fastq/T0101_GTGTACCTTC-TAGCTCACAG_HY55YDSX5_L003_001.R1_val_1.fq.gz"],
    "atac.fastqs_rep1_R2": ["/gs/gsfs0/home/marlrodrig/aging_project/ATAC_seq/data/trimmed_fastq/T0101_GTGTACCTTC-TAGCTCACAG_HY55YDSX5_L003_001.R2_val_2.fq.gz"]
  }

Here is the tail of the slurm*out file:

2023-08-01 12:24:21,269|caper.cromwell_workflow_monitor|INFO| Workflow: id=82d172bd-aaea-4db7-988d-04767d16b63a, status=Submitted
2023-08-01 12:24:21,343|caper.cromwell_workflow_monitor|INFO| Workflow: id=82d172bd-aaea-4db7-988d-04767d16b63a, status=Running
2023-08-01 12:24:35,816|caper.cromwell_workflow_monitor|INFO| Task: id=82d172bd-aaea-4db7-988d-04767d16b63a, task=atac.read_genome_tsv:-1, retry=0, status=Started, job_id=146435
2023-08-01 12:24:35,835|caper.cromwell_workflow_monitor|INFO| Task: id=82d172bd-aaea-4db7-988d-04767d16b63a, task=atac.read_genome_tsv:-1, retry=0, status=Running

cromwell.out

for ITER in 1 2 3
do
    sbatch --export=ALL -J cromwell_82d172bd_read_genome_tsv -D /gs/gsfs0/home/marlrodrig/aging_project/ATAC_seq/scripts/encode_atac_pipeline/atac/82d172bd-aaea-4db7-988d-04767d16b63a/call-read_genome_tsv -o /gs/gsfs0/home/marlrodrig/aging_project/ATAC_seq/scripts/encode_atac_pipeline/atac/82d172bd-aaea-4db7-988d-04767d16b63a/call-read_genome_tsv/execution/stdout -e /gs/gsfs0/home/marlrodrig/aging_project/ATAC_seq/scripts/encode_atac_pipeline/atac/82d172bd-aaea-4db7-988d-04767d16b63a/call-read_genome_tsv/execution/stderr \
        -p unlimited  \
        -n 1 --ntasks-per-node=1 --cpus-per-task=1 --mem=2048M --time=240  \
         \
        /gs/gsfs0/home/marlrodrig/aging_project/ATAC_seq/scripts/encode_atac_pipeline/atac/82d172bd-aaea-4db7-988d-04767d16b63a/call-read_genome_tsv/execution/script.caper && exit 0
    sleep 30
done
exit 1
2023-08-01 12:24:35,816 INFO  - DispatchedConfigAsyncJobExecutionActor [UUID(82d172bd)atac.read_genome_tsv:NA:1]: job id: 146435
2023-08-01 12:24:35,832 INFO  - DispatchedConfigAsyncJobExecutionActor [UUID(82d172bd)atac.read_genome_tsv:NA:1]: Cromwell will watch for an rc file but will *not* double-check whether this job is actually alive (unless Cromwell restarts)
2023-08-01 12:24:35,833 INFO  - DispatchedConfigAsyncJobExecutionActor [UUID(82d172bd)atac.read_genome_tsv:NA:1]: Status change from - to Running
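
Since the log says Cromwell only waits for an rc file and does not verify that the job is alive, one thing that could be checked by hand is whether that file has appeared and what SLURM reports for job 146435. The following is only a rough sketch: the helper name `check_call_status` and its messages are made up for illustration, and it assumes the rc file is written to the `execution` subdirectory of the call directory (the same place as the `stdout` and `stderr` paths in the sbatch command above).

```shell
#!/bin/bash
# Hypothetical helper (not part of caper or Cromwell): report whether a
# call directory has produced its rc file yet.
# Assumption: the rc file lives at <call_dir>/execution/rc.
check_call_status() {
    local exec_dir="$1/execution"
    if [ -f "$exec_dir/rc" ]; then
        # The rc file holds the exit code of the task script.
        echo "finished rc=$(cat "$exec_dir/rc")"
    elif [ -d "$exec_dir" ]; then
        echo "no rc file yet"
    else
        echo "execution directory missing"
    fi
}

# Example usage, with CALL_DIR set to the call-read_genome_tsv directory
# shown in cromwell.out:
#   check_call_status "$CALL_DIR"
#
# Standard SLURM commands to see what the scheduler thinks of the job:
#   squeue -j 146435
#   sacct -j 146435 --format=JobID,State,Elapsed,ExitCode
```

If the helper prints "no rc file yet" while `squeue` no longer lists the job, that would suggest the task ended (or never started) without Cromwell noticing, which matches the warning in the log.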

Similar issues have been posted here, but it seems that this problem can arise from many different scenarios. I have not gotten any error message.

Thanks in advance, I hope to hear from you soon.