Closed by kelly-sovacool 2 months ago
Test run command to modify:
/data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.1/charlie \
-w=/data/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.10.x/samples_15 \
-m=init \
-g=hg38 \
-v=NC_009333.1,KT899744.1,NC_006273.2 \
-s /data/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.10.x/samples_15.tsv
Created a new samples.tsv file with just 4 samples from Vishal's samples_15.tsv.
/data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.1/charlie \
-w=/data/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_v0.10.1 \
-m=init -g=hg38 -v=NC_009333.1,KT899744.1,NC_006273.2 \
-s=/data/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/samples.tsv
Currently running on biowulf with latest release so we can compare outputs to the containerized version.
/data/Ziegelbauer_lab/Pipelines/circRNA/v0.10.1/charlie \
-w=/data/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_v0.10.1 \
-m=run
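The init-then-run pattern could be wrapped in a small helper so that the release and dev installs are invoked with identical arguments. This is a minimal sketch that uses only the flags already shown in this thread (-w, -m, -g, -v, -s). The function name `charlie_cmd` is hypothetical and not part of CHARLIE, and it only prints the command instead of running it.

```shell
# Hypothetical helper (not part of CHARLIE): print the charlie command line for
# a given mode so the same arguments are reused for both `init` and `run`.
# Arguments: <path to charlie> <workdir> <mode> <samples.tsv>
charlie_cmd() {
    local charlie_bin="$1" workdir="$2" mode="$3" samples="$4"
    printf '%s -w=%s -m=%s -g=hg38 -v=NC_009333.1,KT899744.1,NC_006273.2 -s=%s\n' \
        "$charlie_bin" "$workdir" "$mode" "$samples"
}
```

Printing first makes it easy to eyeball the command before piping it to a shell.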
Testing containerized version:
/data/Ziegelbauer_lab/Pipelines/circRNA/charlie-dev-sovacool/charlie \
-w=/data/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev \
-m=init -g=hg38 -v=NC_009333.1,KT899744.1,NC_006273.2 \
-s=/data/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/samples.tsv
/data/Ziegelbauer_lab/Pipelines/circRNA/charlie-dev-sovacool/charlie \
-w=/data/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev \
-m=run -g=hg38 -v=NC_009333.1,KT899744.1,NC_006273.2 \
-s=/data/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/samples.tsv
/usr/bin/bash: line 32: fastq-filter: command not found
Need to add fastq-filter to the cutadapt docker container.
Edit: fixed and renamed the container charlie_cutadapt_fqfilter
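A quick way to catch this kind of "command not found" before a full pipeline run is to check the required executables from a shell inside the container. A minimal sketch; `check_tools` is a hypothetical helper name, and the list of tools to pass is up to the caller.

```shell
# Report any required executables that are missing from PATH.
# Returns 0 if all are found, 1 otherwise.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (run inside the container's shell):
#   check_tools cutadapt fastq-filter
```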
create_index failed due to missing output files:
MissingOutputException in rule create_index in file /vf/users/Ziegelbauer_lab/Pipelines/circRNA/charlie-dev-sovacool/workflow/rules/create_index.smk, line 4:
Job 0 completed successfully, but some output files are missing. Missing files after 120 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
/vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/ref/NCLscan_index/AllRef.ndx
Removing output files of failed job create_index since they might be corrupted:
/vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/ref/ref.genes.genepred_w_geneid, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/ref/STAR_no_GTF/SA, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/ref/ref.fixed.gtf, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/ref/ref.transcripts.fa, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/ref/ref.dummy.fa, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/ref/separate_fastas/separate_fastas.lst
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
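Before raising --latency-wait, it may be worth checking whether the expected file ever appears at all, since a file that is never written is not a latency problem. A minimal sketch, with `list_missing` as a hypothetical helper name:

```shell
# Print each expected output path that does not exist.
# Returns 0 if everything exists, 1 if anything is missing.
list_missing() {
    local status=0 path
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "$path"
            status=1
        fi
    done
    return "$status"
}

# Example, using the path from the error message above:
#   list_missing charlie_dev/ref/NCLscan_index/AllRef.ndx
```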
test on FRCE
/home/sovacoolkl/CHARLIE/charlie \
-w=/scratch/cluster_scratch/sovacoolkl/charlie_dev_test/charlie_iss-99 \
-m=init -g=hg38 -v=NC_009333.1,KT899744.1,NC_006273.2 \
-s=/scratch/cluster_scratch/sovacoolkl/charlie_dev_test/samples.tsv
/home/sovacoolkl/CHARLIE/charlie \
-w=/scratch/cluster_scratch/sovacoolkl/charlie_dev_test/charlie_iss-99 \
-m=run -g=hg38 -v=NC_009333.1,KT899744.1,NC_006273.2 \
-s=/scratch/cluster_scratch/sovacoolkl/charlie_dev_test/samples.tsv
Activating singularity image /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/.snakemake/singularity/b688737477c8cf86b329e4227da72916.simg
+ '[' -d /lscratch/25273199 ']'
+ TMPDIR=/lscratch/25273199/09975c64-8e35-4c64-bd19-c0afbf581a78
+ '[' '!' -d /lscratch/25273199/09975c64-8e35-4c64-bd19-c0afbf581a78 ']'
+ mkdir -p /lscratch/25273199/09975c64-8e35-4c64-bd19-c0afbf581a78
++ dirname /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC/CircRNACount
+ cd /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC
+ '[' PE == PE ']'
+ DCC @/vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC/samplesheet.txt \
--temp /lscratch/25273199/09975c64-8e35-4c64-bd19-c0afbf581a78/DCC --threads 4 --detect --gene \
--bam /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/STAR2p/G1_Normal_p2.bam \
-ss \
--annotation /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/ref/ref.fixed.gtf \
--chrM -G --rep_file /data/CCBR_Pipeliner/db/PipeDB/charlie/fastas_gtfs/hg38.repeats.gtf \
--refseq /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/ref/ref.fa \
--PE-independent \
-mt1 @/vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC/mate1.txt \
-mt2 @/vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC/mate2.txt
[W::hts_idx_load3] The index file is older than the data file: /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/STAR2p/G1_Normal_p2.bam.csi
Traceback (most recent call last):
File "/usr/local/bin/DCC", line 11, in <module>
load_entry_point('DCC==0.5.0', 'console_scripts', 'DCC')()
File "/usr/local/lib/python3.8/dist-packages/DCC-0.5.0-py3.8.egg/DCC/main.py", line 490, in main
File "/usr/local/lib/python3.8/dist-packages/DCC-0.5.0-py3.8.egg/DCC/main.py", line 679, in findCircSkipJunction
File "/usr/local/lib/python3.8/dist-packages/DCC-0.5.0-py3.8.egg/DCC/Circ_nonCirc_Exon_Match.py", line 281, in findcircAdjacent
File "/usr/local/lib/python3.8/dist-packages/DCC-0.5.0-py3.8.egg/DCC/Circ_nonCirc_Exon_Match.py", line 222, in getAdjacent
ValueError: invalid literal for int() with base 10: '3"'
[Tue Apr 30 00:44:26 2024]
Error in rule dcc:
jobid: 0
input: /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC/samplesheet.txt, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC/mate1.txt, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC/mate2.txt, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/STAR2p/G1_Normal_p2.bam, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/ref/ref.fixed.gtf
output: /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC/CircRNACount, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC/CircCoordinates, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC/LinearCount, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC/G1_Normal.dcc.counts_table.tsv, /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Normal/DCC/G1_Normal.dcc.counts_table.tsv.filtered
shell:
This worked with the previous charlie version (/data/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_v0.10.1).
samtools stat charlie_v0.10.1/results/G1_Tumor/STAR2p/G1_Tumor_p2.bam > G1_Tumor_p2.bam.stat.old
samtools stat charlie_dev/results/G1_Tumor/STAR2p/G1_Tumor_p2.bam > G1_Tumor_p2.bam.stat.new
diff G1_Tumor_p2.bam.stat.*
3c3
< # The command line was: stat charlie_dev/results/G1_Tumor/STAR2p/G1_Tumor_p2.bam
---
> # The command line was: stat charlie_v0.10.1/results/G1_Tumor/STAR2p/G1_Tumor_p2.bam
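The only difference here is the recorded command line, which is a `#` comment line in the stats output. To make that kind of comparison return a clean "no differences", the comment lines can be filtered out first. A minimal sketch; `diff_ignoring_comments` is a hypothetical helper name.

```shell
# Compare two stats-style text files while ignoring lines that start with '#'
# (these include the recorded command line).
# Returns 0 if the remaining lines are identical, non-zero otherwise.
diff_ignoring_comments() {
    local tmp_a tmp_b status=0
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    grep -v '^#' "$1" > "$tmp_a" || true   # grep exits 1 if every line is a comment
    grep -v '^#' "$2" > "$tmp_b" || true
    diff "$tmp_a" "$tmp_b" || status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Example:
#   diff_ignoring_comments G1_Tumor_p2.bam.stat.new G1_Tumor_p2.bam.stat.old
```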
md5sum charlie_dev/ref/ref.fixed.gtf charlie_v0.10.1/ref/ref.fixed.gtf
54dcc6005272fcda13e6c46c76ec9b3d charlie_dev/ref/ref.fixed.gtf
54dcc6005272fcda13e6c46c76ec9b3d charlie_v0.10.1/ref/ref.fixed.gtf
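The same checksum comparison can be looped over several files at once. A minimal sketch, assuming both run directories share the same relative layout; `compare_runs` is a hypothetical helper name.

```shell
# For each relative path, report whether the copy under dir_a matches the copy
# under dir_b by MD5 checksum. Prints SAME, DIFF, or MISSING per path.
# Returns 1 if any file differs or is missing, 0 otherwise.
compare_runs() {
    local dir_a="$1" dir_b="$2" status=0 rel sum_a sum_b
    shift 2
    for rel in "$@"; do
        if [ ! -f "$dir_a/$rel" ] || [ ! -f "$dir_b/$rel" ]; then
            echo "MISSING $rel"
            status=1
            continue
        fi
        sum_a=$(md5sum < "$dir_a/$rel" | cut -d' ' -f1)
        sum_b=$(md5sum < "$dir_b/$rel" | cut -d' ' -f1)
        if [ "$sum_a" = "$sum_b" ]; then
            echo "SAME $rel"
        else
            echo "DIFF $rel"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   compare_runs charlie_dev charlie_v0.10.1 ref/ref.fixed.gtf ref/ref.fa
```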
library(tidyverse)
files <- tibble(dev = c('charlie_dev/results/G1_Tumor/STAR1p/G1_Tumor_p1.Chimeric.out.junction',
'charlie_dev/results/G1_Tumor/STAR1p/mate1/G1_Tumor_mate1.Chimeric.out.junction',
'charlie_dev/results/G1_Tumor/STAR1p/mate2/G1_Tumor_mate2.Chimeric.out.junction'),
rel = c('charlie_v0.10.1/results/G1_Tumor/STAR1p/G1_Tumor_p1.Chimeric.out.junction',
'charlie_v0.10.1/results/G1_Tumor/STAR1p/mate1/G1_Tumor_mate1.Chimeric.out.junction',
'charlie_v0.10.1/results/G1_Tumor/STAR1p/mate2/G1_Tumor_mate2.Chimeric.out.junction'),)
files %>% pmap(\(dev, rel) all_equal(read_tsv(dev), read_tsv(rel)))
[[1]]
[1] TRUE
[[2]]
[1] TRUE
[[3]]
[1] TRUE
release version used conda env: https://github.com/CCBR/CHARLIE/blob/e19cd66f319655ea5c5bd4ca4481f9fdfb88a4fd/workflow/rules/findcircrna.smk#L722-L723
now using docker: https://github.com/CCBR/CHARLIE/blob/fbdb6647ad2aafa13218845f864de0b8632f5fc2/docker/dcc/Dockerfile#L12-L16
Both use v0.5.0. According to the release notes, DCC 0.5.0 requires python 3.5 and no longer supports python 2.7.
I rebuilt the docker container to install DCC 0.5.0 from conda, but the rule still fails with the same error as before:
Activating singularity image /data/CCBR_Pipeliner/SIFS/charlie_dcc_v0.1.0.sif
+ '[' -d /lscratch/25536525 ']'
+ TMPDIR=/lscratch/25536525/8e9ea0a8-9ea7-406e-ab74-605db2e6e40d
+ '[' '!' -d /lscratch/25536525/8e9ea0a8-9ea7-406e-ab74-605db2e6e40d ']'
+ mkdir -p /lscratch/25536525/8e9ea0a8-9ea7-406e-ab74-605db2e6e40d
++ dirname /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Tumor/DCC/CircRNACount
+ cd /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Tumor/DCC
+ '[' PE == PE ']'
+ DCC @/vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Tumor/DCC/samplesheet.txt --temp /lscratch/25536525/8e9ea0a8-9ea7-406e-ab74-605db2e6e40d/DCC --threads 4 --detect --gene --bam /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Tumor/STAR2p/G1_Tumor_p2.bam -ss --annotation /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/ref/ref.fixed.gtf --chrM -G --rep_file /data/CCBR_Pipeliner/db/PipeDB/charlie/fastas_gtfs/hg38.repeats.gtf --refseq /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/ref/ref.fa --PE-independent -mt1 @/vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Tumor/DCC/mate1.txt -mt2 @/vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Tumor/DCC/mate2.txt
[W::hts_idx_load3] The index file is older than the data file: /vf/users/Ziegelbauer_lab/circRNADetection/sovacoolkl_charlie/charlie_dev/results/G1_Tumor/STAR2p/G1_Tumor_p2.bam.csi
Traceback (most recent call last):
File "/opt2/conda/envs/dcc/bin/DCC", line 10, in <module>
sys.exit(main())
File "/opt2/conda/envs/dcc/lib/python3.10/site-packages/DCC/main.py", line 490, in main
CircSkipfiles = findCircSkipJunction(output_coordinates, options.tmp_dir,
File "/opt2/conda/envs/dcc/lib/python3.10/site-packages/DCC/main.py", line 679, in findCircSkipJunction
circStartAdjacentExons, circStartAdjacentExonsIv = CCEM.findcircAdjacent(circStartExons, Custom_exon_id2Iv,
File "/opt2/conda/envs/dcc/lib/python3.10/site-packages/DCC/Circ_nonCirc_Exon_Match.py", line 281, in findcircAdjacent
interval = Custom_exon_id2Iv[self.getAdjacent(ids, start=start)]
File "/opt2/conda/envs/dcc/lib/python3.10/site-packages/DCC/Circ_nonCirc_Exon_Match.py", line 222, in getAdjacent
exon_number = int(custom_exon_id.split(':')[1]) - 1
ValueError: invalid literal for int() with base 10: '1"'
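The stray `"` in the value that failed to parse suggests the exon number is being read from a quoted attribute. That link to the GTF is an assumption and is not confirmed in this thread. A minimal sketch for checking how `exon_number` is written in the annotation; `count_exon_number_styles` is a hypothetical helper name, and it assumes a standard 9-column tab-separated GTF.

```shell
# Count exon records whose exon_number attribute value is quoted vs unquoted.
# Prints "<quoted> <unquoted>". Assumes a tab-separated GTF with attributes in
# column 9 (an assumption about the input, not something DCC guarantees).
count_exon_number_styles() {
    awk -F'\t' '
        $3 == "exon" {
            if ($9 ~ /exon_number "[0-9]+"/)    quoted++
            else if ($9 ~ /exon_number [0-9]+/) unquoted++
        }
        END { printf "%d %d\n", quoted, unquoted }
    ' "$1"
}

# Example:
#   count_exon_number_styles charlie_dev/ref/ref.fixed.gtf
```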
On further inspection, it looks like the DCC conda env on biowulf was built with python 2.7:
/data/CCBR_Pipeliner/db/PipeDB/Conda/envs/DCC/lib/python2.7/site-packages
errors on FRCE:
sbatch: error: invalid partition specified: ccr
sbatch: error: Batch job submission failed: Invalid partition name specified
sbatch: error: Invalid generic resource (gres) specification
Error submitting jobscript (exit code 1):
Will need to edit cluster.json and submit_script.sbatch accordingly.
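One way to catch an invalid partition before submitting is to compare the requested names against what the cluster reports. A minimal sketch; `partitions_available` is a hypothetical helper that reads the available partition names on stdin, so it can be fed from `sinfo` on a Slurm cluster or from a plain list when testing.

```shell
# Read available partition names on stdin (one per line) and succeed only if
# every comma-separated partition named in $1 is among them.
partitions_available() {
    local requested="$1" available name status=0
    available=$(cat)
    for name in $(printf '%s' "$requested" | tr ',' ' '); do
        if ! printf '%s\n' "$available" | grep -qxF -- "$name"; then
            echo "unknown partition: $name" >&2
            status=1
        fi
    done
    return "$status"
}

# Example on a Slurm cluster (-h drops the header, %R prints the partition name):
#   sinfo -h -o '%R' | partitions_available norm
```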
Looks like the DCC devs are aware of the issue and fixed it in the master branch -- https://www.github.com/dieterich-lab/DCC/issues/103
Edited the docker container to use the dev version. It worked!
First run-through on biowulf completed successfully after several bug fixes. A re-run from start to finish also completed successfully on biowulf. A test is in progress on FRCE.
more problems on FRCE:
sbatch: error: CPU count per node can not be satisfied
sbatch: error: Batch job submission failed: Requested node configuration is not available
Need to reduce threads for FRCE, but I can't find how many are available per node on the norm partition:
https://ncifrederick.cancer.gov/staff/frce/documentation/slurm-partitions-features
Just switched jobs that requested 56 threads to 32 for FRCE, and jobs are running now.
edit: found the FRCE hardware config here: https://ncifrederick.cancer.gov/staff/frce/documentation/frce-hardware-capabilities
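Rather than hard-coding 32, the thread request could be capped at whatever the node offers. A minimal sketch; `cap_threads` is a hypothetical helper, and the per-node CPU count has to be supplied by the caller (for example from the hardware page above).

```shell
# Print the smaller of the requested thread count and the per-node CPU limit.
cap_threads() {
    local requested="$1" node_cpus="$2"
    if [ "$requested" -gt "$node_cpus" ]; then
        echo "$node_cpus"
    else
        echo "$requested"
    fi
}

# Example: a rule that asks for 56 threads on a node with 32 CPUs
#   cap_threads 56 32
```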
Currently running on FRCE with improved handling of config & cluster templates
error on FRCE:
SystemExit in file /home/sovacoolkl/CHARLIE/workflow/rules/init.smk, line 20:
File: /mnt/projects/CCBR-Pipelines/db/charlie/fastas_gtfs/hg38.fa does not exists!
File "/home/sovacoolkl/CHARLIE/workflow/Snakefile", line 19, in <module>
File "/home/sovacoolkl/CHARLIE/workflow/rules/init.smk", line 190, in <module>
File "/home/sovacoolkl/CHARLIE/workflow/rules/init.smk", line 29, in check_readaccess
File "/home/sovacoolkl/CHARLIE/workflow/rules/init.smk", line 20, in check_existence
SystemExit in file /home/sovacoolkl/CHARLIE/workflow/rules/init.smk, line 20:
File: /mnt/projects/CCBR-Pipelines/db/charlie/fastas_gtfs/hg38.fa does not exists!
File "/home/sovacoolkl/CHARLIE/workflow/Snakefile", line 19, in <module>
File "/home/sovacoolkl/CHARLIE/workflow/rules/init.smk", line 190, in <module>
File "/home/sovacoolkl/CHARLIE/workflow/rules/init.smk", line 29, in check_readaccess
File "/home/sovacoolkl/CHARLIE/workflow/rules/init.smk", line 20, in check_existence
even though the file does exist 🤔
file /mnt/projects/CCBR-Pipelines/db/charlie/fastas_gtfs/hg38.fa
/mnt/projects/CCBR-Pipelines/db/charlie/fastas_gtfs/hg38.fa: ASCII text, with very long lines
Is /mnt not available on compute nodes on FRCE??
Edit: this seems to be a FRCE regression -- tried to submit a RENEE job and that failed for the same reason
/var/spool/slurmd/job37856165/slurm_script: line 4: /mnt/projects/CCBR-Pipelines/pipelines/RENEE/renee-dev-sovacool/bin/renee: No such file or directory
Submitted a help ticket
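To confirm whether a path is visible from a given node, a small check that reports the node name can be run at the top of a job script. A minimal sketch; `require_readable` is a hypothetical helper name.

```shell
# Verify that each path is readable from the current host. On failure, print
# the node name so the problem can be tied to a specific machine.
# Returns 0 if all paths are readable, 1 otherwise.
require_readable() {
    local status=0 path
    for path in "$@"; do
        if [ ! -r "$path" ]; then
            echo "$(uname -n): cannot read $path" >&2
            status=1
        fi
    done
    return "$status"
}

# Example, using the path from the error above:
#   require_readable /mnt/projects/CCBR-Pipelines/db/charlie/fastas_gtfs/hg38.fa
```

Running the same check on the login node and inside a submitted job would show whether the mount is missing only on compute nodes.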
upgraded snakemake in the shared conda env on FRCE to v7
conda activate /mnt/projects/CCBR-Pipelines/conda/envs/snakemake
mamba install -c bioconda snakemake=7.32.4
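A simple guard can confirm that the activated env really provides the upgraded version. A minimal sketch; `version_at_least_major` is a hypothetical helper that only looks at the major version of a dotted version string.

```shell
# Succeed if a dotted version string (e.g. 7.32.4) has a major version that is
# greater than or equal to the given minimum.
version_at_least_major() {
    local version="$1" min_major="$2" major
    major=${version%%.*}
    [ "$major" -ge "$min_major" ]
}

# Example (assumes snakemake is on PATH in the activated env):
#   version_at_least_major "$(snakemake --version)" 7 || echo "snakemake is older than v7" >&2
```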
On FRCE, star_circrnafinder hangs indefinitely and gets cancelled by slurm, but actually completes successfully in < 3 hours when run interactively.
development in progress here:
/data/CCBR_Pipeliner/Pipelines/CHARLIE/charlie-dev-sovacool