BIMSBbioinfo / pigx_rnaseq

Bulk RNA-seq Data Processing, Quality Control, and Downstream Analysis Pipeline
GNU General Public License v3.0

Error when submitting to cluster #133

Closed: Pinolinoo closed this issue 7 months ago

Pinolinoo commented 1 year ago

Hi, I have come across a new problem. When executing the pipeline, I get the following error:

Error in rule hisat2_index:
    jobid: 45
    output: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.1.ht2l,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.2.ht2l,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.3.ht2l,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.4.ht2l,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.5.ht2l,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.6.ht2l,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.7.ht2l,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.8.ht2l
    log: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/hisat2_index.log (check log file(s) for error message)
    shell:
        /gnu/store/k2fwdwfp75nhnkf162f9nrxd0cyjjp4x-hisat2-2.2.1/bin/hisat2-build -f -p 2 --large-index /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.fa /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index >> /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/hisat2_index.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 5801853 ("snakejob.hisat2_index.45.sh") has been submitted

Error executing rule hisat2_index on cluster (jobid: 45, external: Your job 5801853 ("snakejob.hisat2_index.45.sh") has been submitted, jobscript: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/tmp.df6a1v0e/snakejob.hisat2_index.45.sh). For error details see the cluster log and the log files of the involved rule(s).

My settings.yaml looks like:

locations:
  reads-dir: /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/
  output-dir: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/
  genome-fasta: /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.fa
  cdna-fasta: /fast/AG_Metzger/philipp/raw_data/genome_files/Homo_sapiens.GRCh38.cdna.all.fa
  gtf-file: /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.gtf

organism: hsapiens

DEanalyses:
  # names of analyses can be anything but they have to be unique for each combination of case control group comparisons.
  analysis1:
    # if multiple sample names are provided, they must be separated by comma
    case_sample_groups: "306"
    control_sample_groups: "JG"
    covariates: ''

execution:
  submit-to-cluster: yes
  jobs: 40

borauyar commented 1 year ago

Hi @Pinolinoo Can you show what is printed in the log file for hisat index? output/logs/hisat2_index.log

Pinolinoo commented 1 year ago

Settings:
  Output files: "/fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.*.ht2l"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.fa
Reading reference sizes
  Time reading reference sizes: 00:00:16
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
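The log stops at "Joining reference sequences" without an error message of its own, which is consistent with the process being killed partway through (for example for exceeding its memory limit). A quick way to confirm that the build never finished is to check which of the eight index files listed in the Snakemake error actually exist. A minimal Python sketch; the helper name `missing_index_files` is hypothetical and not part of PiGx, and the defaults (8 files, `.ht2l` extension) simply mirror the outputs Snakemake listed for this `--large-index` build:

```python
from pathlib import Path
from typing import List


def missing_index_files(prefix: str, n_files: int = 8, ext: str = "ht2l") -> List[str]:
    """Return the expected '<prefix>.<i>.<ext>' files (i = 1..n_files) that do not exist.

    Defaults mirror the eight *.ht2l outputs Snakemake expected from the
    hisat2_index rule; they are an assumption, not read from PiGx itself.
    """
    expected = [Path(f"{prefix}.{i}.{ext}") for i in range(1, n_files + 1)]
    # Keep only the paths that are not present as regular files.
    return [str(path) for path in expected if not path.is_file()]
```

For this run the prefix would be `/fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index`; a non-empty result means the index is incomplete.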

borauyar commented 1 year ago

Another user has just reported the same problem on the cluster, and it seems to be a memory issue.

See discussion here: https://groups.google.com/g/pigx/c/sbmdBZ7IOdI

Could you please test the pipeline using one of the human chromosomes?

If it works, then you will need to update your settings file to ask for more resources for the hisat2_build function.

In the settings file, you can add a section where you can modify the default memory requirements. See the output of pigx-rnaseq --init for the full list of settings options. For example:

execution:
  submit-to-cluster: yes
  jobs: 4
  nice: 19
  mem_mb: 128000
  rules:
    hisat2-build:
      threads: 2
      memory: 32000
Pinolinoo commented 1 year ago

How do I try it on one chromosome? Sadly, after adding

execution:
  submit-to-cluster: yes
  jobs: 4
  nice: 19
  mem_mb: 128000
  rules:
    hisat2-build:
      threads: 2
      memory: 32000

to the settings file, I get the same error.

borauyar commented 1 year ago

You can download a human chromosome and use that as your target genome sequence, e.g. http://ftp.ensembl.org/pub/release-107/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz

Pinolinoo commented 1 year ago

When I try it with the single chromosome as the input FASTA file, I get:

Error in rule check_annotation_files:
    jobid: 1
    output: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/input_annotation_stats.tsv
    log: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/check_annotation_files.log (check log file(s) for error message)
    shell:
        /gnu/store/zwlc915dpafababhm9wjfbdla919zvxm-r-minimal-4.2.1/bin/Rscript --vanilla /gnu/store/44g4dybxljnz0ad4108zrj5xiqy6grwj-pigx-rnaseq-0.1.0/libexec/pigx_rnaseq/scripts//validate_input_annotation.R /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.gtf /fast/AG_Metzger/philipp/raw_data/genome_files/Homo_sapiens.GRCh38.cdna.all.fa /fast/AG_Metzger/philipp/raw_data/genome_files/Homo_sapiens.GRCh38.dna.chromosome.21.fa /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output >> /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/check_annotation_files.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 5802064 ("snakejob.check_annotation_files.1.sh") has been submitted

Error executing rule check_annotation_files on cluster (jobid: 1, external: Your job 5802064 ("snakejob.check_annotation_files.1.sh") has been submitted, jobscript: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/tmp.w27kyeh3/snakejob.check_annotation_files.1.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/log/2022-10-11T151919.111011.snakemake.log

borauyar commented 1 year ago

@Pinolinoo can you please show the content of output/logs/check_annotation_files.log ?

Pinolinoo commented 1 year ago

This is from a run on the whole genome again, since I deleted the output folder of the single-chromosome run!

Tue Oct 11 16:49:40 2022 Checking annotation files for potential issues
Tue Oct 11 16:49:40 2022 => Checking to see if GTF file can be properly imported from here: /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.gtf
Tue Oct 11 16:50:38 2022 => Imported 3150424 features from the GTF file
Tue Oct 11 16:50:38 2022 => GTF file contains annotations for 60649 genes and 237013 transcripts
Tue Oct 11 16:50:44 2022 => Imported 207877 transcripts from the cDNA fasta file at /fast/AG_Metzger/philipp/raw_data/genome_files/Homo_sapiens.GRCh38.cdna.all.fa
Tue Oct 11 16:50:45 2022 => Number of transcript ids matching between the GTF file and the cDNA fasta file: 0
Warning message:
Tue Oct 11 16:50:45 2022 => Couldn't match the transcript ids between the GTF file and the cDNA fasta file. This will cause problems in getting gene-level expression estimates when running Salmon. However, the transcript-level expression estimation won't suffer. Possible reason is the source databases of these annotations are different. Another possible reason is the additional annotations on the transcript IDs in the fasta file: example entry for the transcript 'ENST00000202017' in GTF file is 'ENST00000202017.5 cdna chromosome:GRCh38:20:31944342:31952092 ...' in the Fasta File. If using annotations from ENSEMBL, the fasta file can be cleaned up with the following sed command: sed 's/(ENST[0-9]).[0-9].*/\1/g' > cDNA_file.cleaned.fasta
Tue Oct 11 16:51:46 2022 => Imported 455 chromosomes/contigs from the DNA fasta files at /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.fa
Tue Oct 11 16:51:46 2022 => Number of chromosomes/contigs matching between the GTF file and the genome fasta file: 25
Tue Oct 11 16:51:46 2022 => List of chromosomes/contigs matching between the GTF file and the genome fasta file: chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrM chrX chrY
Tue Oct 11 16:51:47 2022 => Finished checking annotation files.
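The sed command as quoted in this log seems to have lost some characters (the capture group and `\1` reference cannot work as printed), so here is the same idea as a Python sketch: reduce each Ensembl cDNA header to the bare `ENST` ID without its version suffix, then count how many transcript IDs the GTF and the FASTA share, mirroring the "Number of transcript ids matching" line. This is an illustration, not the pipeline's own validation code; all function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Set

# Ensembl cDNA header, e.g. ">ENST00000202017.5 cdna chromosome:GRCh38:20:..."
# Group 1 is the bare transcript ID without the ".5" version suffix.
_ENST_HEADER = re.compile(r"^>(ENST\d+)(?:\.\d+)?(?:\s.*)?$")
# GTF attribute entry, e.g. transcript_id "ENST00000202017";
_GTF_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def clean_cdna_headers(lines: Iterable[str]) -> Iterator[str]:
    """Reduce Ensembl cDNA headers to '>ENST...'; pass all other lines through."""
    for line in lines:
        if line.startswith(">"):
            match = _ENST_HEADER.match(line.rstrip("\n"))
            if match:
                yield f">{match.group(1)}\n"
                continue
        yield line


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """IDs (first token after '>') of all FASTA records."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gtf_transcript_ids(lines: Iterable[str]) -> Set[str]:
    """transcript_id values from column 9 of a GTF, skipping comment lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = _GTF_TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def count_shared_ids(gtf_lines: Iterable[str], fasta_lines: Iterable[str]) -> int:
    """Number of transcript IDs present in both the GTF and the FASTA headers."""
    return len(gtf_transcript_ids(gtf_lines) & fasta_ids(fasta_lines))
```

To clean a real file, write the output of `clean_cdna_headers` line by line to a new FASTA. If the shared count is still 0 afterwards, the IDs in hg38.gtf are probably not Ensembl transcript IDs at all, which matches the log's other suggested reason (different source databases).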

borauyar commented 1 year ago

There is no error in this run, is there? It seems to have finished successfully.

Pinolinoo commented 1 year ago

It seems the annotation check was successful, but afterwards the HISAT2 index building failed :/

borauyar commented 1 year ago

Then the error you showed me can't be the error you are talking about.

Could you maybe switch to using STAR on the cluster? Another user also had a similar problem with Hisat2 and switched to STAR, which solved their problem.

You can set the mapper type in settings file to 'star'.

Pinolinoo commented 1 year ago

I just reran with hisat2 and got the same error in the complete log (/fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/log/2022-10-12T102111.776711.snakemake.log). The hisat2_index.log from that run looks like the one I posted above.

Maybe we can take a look at it tomorrow if you are at BIMSB? I also tried STAR as the mapper, but that gave me errors too :/ (different ones, though)

Thank you very much for your help so far!

borauyar commented 1 year ago

If we are back to the hisat2 problem, then you would need to try out different resource settings. Please follow the discussion here to see how you can modify the resource requirements. https://groups.google.com/g/pigx/c/sbmdBZ7IOdI

Also try running a single job (jobs: 1) to rule out multiple jobs competing for memory.

Pinolinoo commented 1 year ago

I tried all the different resource requirements discussed in the link you sent, but nothing worked for me. Sadly, STAR doesn't work either (see log below):

[Thu Oct 13 10:44:01 2022]
Error in rule trim_qc_reads_pe:
    jobid: 9
    output: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.trimmed.R1.fq.gz,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.trimmed.R2.fq.gz,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.html,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.json
    log: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/trim_reads.190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.log (check log file(s) for error message)
    shell:
        /gnu/store/57rv4nz7fzi3dw4khzl30338l9pi5zvc-fastp-0.20.1/bin/fastp --adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --in1 /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.fastq.gz --in2 /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R2.fastq.gz --out1 /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.trimmed.R1.fq.gz --out2 /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.trimmed.R2.fq.gz -h /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.html -j /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.json >> /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/trim_reads.190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 5803358 ("snakejob.trim_qc_reads_pe.9.sh") has been submitted

Error executing rule trim_qc_reads_pe on cluster (jobid: 9, external: Your job 5803358 ("snakejob.trim_qc_reads_pe.9.sh") has been submitted, jobscript: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/tmp.ty82369v/snakejob.trim_qc_reads_pe.9.sh). For error details see the cluster log and the log files of the involved rule(s).
[Thu Oct 13 10:44:51 2022]
Error in rule trim_qc_reads_pe:
    jobid: 25
    output: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.trimmed.R1.fq.gz,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.trimmed.R2.fq.gz,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.pe.fastp.html,
            /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.pe.fastp.json
    log: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/trim_reads.190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.log (check log file(s) for error message)
    shell:
        /gnu/store/57rv4nz7fzi3dw4khzl30338l9pi5zvc-fastp-0.20.1/bin/fastp --adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --in1 /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.fastq.gz --in2 /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.fastq.gz --out1 /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.trimmed.R1.fq.gz --out2 /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.trimmed.R2.fq.gz -h /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.pe.fastp.html -j /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.pe.fastp.json >> /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/trim_reads.190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 5803362 ("snakejob.trim_qc_reads_pe.25.sh") has been submitted

Error executing rule trim_qc_reads_pe on cluster (jobid: 25, external: Your job 5803362 ("snakejob.trim_qc_reads_pe.25.sh") has been submitted, jobscript: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/tmp.ty82369v/snakejob.trim_qc_reads_pe.25.sh). For error details see the cluster log and the log files of the involved rule(s).
[Thu Oct 13 10:48:21 2022]
Finished job 17.
5 of 132 steps (4%) done
[Thu Oct 13 10:49:51 2022]
Finished job 41.
6 of 132 steps (5%) done
[Thu Oct 13 11:49:07 2022]
Finished job 37.
7 of 132 steps (5%) done
[Thu Oct 13 12:43:33 2022]
Finished job 45.
8 of 132 steps (6%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/log/2022-10-13T103439.351090.snakemake.log

borauyar commented 1 year ago

This time the error is not related to STAR, though; it fails at the read-trimming step. Can you please show the content of the log file /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/trim_reads.190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.log?

Pinolinoo commented 1 year ago

GLBCX2_E_1609_43_306_R3_T1_S15_R1.log
Read1 before filtering:
total reads: 22475169
total bases: 2220295332
Q20 bases: 2189938026(98.6327%)
Q30 bases: 2164425791(97.4837%)

Read2 before filtering:
total reads: 22475169
total bases: 2219810184
Q20 bases: 2169383944(97.7284%)
Q30 bases: 2135674008(96.2098%)

Read1 after filtering:
total reads: 22165324
total bases: 2189522947
Q20 bases: 2163670402(98.8193%)
Q30 bases: 2140779527(97.7738%)

Read2 aftering filtering:
total reads: 22165324
total bases: 2188438228
Q20 bases: 2154279361(98.4391%)
Q30 bases: 2126590281(97.1739%)

Filtering result:
reads passed filter: 44330648
reads failed due to low quality: 596810
reads failed due to too many N: 22798
reads failed due to too short: 82
reads with adapter trimmed: 161299
bases trimmed due to adapters: 1280750

Duplication rate: 3.41984%

Insert size peak (evaluated by paired-end reads): 123

JSON report: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.json
HTML report: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.html

/gnu/store/57rv4nz7fzi3dw4khzl30338l9pi5zvc-fastp-0.20.1/bin/fastp --adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --in1 /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.fastq.gz --in2 /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R2.fastq.gz --out1 /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.trimmed.R1.fq.gz --out2 /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.trimmed.R2.fq.gz -h /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.html -j /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.json
fastp v0.20.1, time used: 424 seconds
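The counts in this log are internally consistent: fastp read 2 × 22475169 = 44950338 reads, and 44330648 passed plus 596810 + 22798 + 82 = 619690 failed also gives 44950338, so every read is accounted for. A minimal Python sketch of that check, assuming the text-log labels exactly as printed above by fastp 0.20.1 (other versions may word them differently); the function names are hypothetical. A consistent summary only shows that fastp itself ran to completion; it says nothing about why Snakemake reported the job as failed.

```python
import re
from typing import Dict

# Labels as printed in the fastp 0.20.1 text log; other versions may differ.
_PATTERNS = {
    "r1_before": r"Read1 before filtering:\s*total reads:\s*(\d+)",
    "r2_before": r"Read2 before filtering:\s*total reads:\s*(\d+)",
    "passed": r"reads passed filter:\s*(\d+)",
    "low_quality": r"reads failed due to low quality:\s*(\d+)",
    "too_many_n": r"reads failed due to too many N:\s*(\d+)",
    "too_short": r"reads failed due to too short:\s*(\d+)",
}


def parse_fastp_counts(log_text: str) -> Dict[str, int]:
    """Extract the read counts needed for the consistency check."""
    counts = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, log_text)
        if match is None:
            raise ValueError(f"could not find {key!r} in fastp log")
        counts[key] = int(match.group(1))
    return counts


def reads_accounted_for(log_text: str) -> bool:
    """True if passed + failed reads equal the total number of input reads."""
    c = parse_fastp_counts(log_text)
    total_in = c["r1_before"] + c["r2_before"]
    failed = c["low_quality"] + c["too_many_n"] + c["too_short"]
    return total_in == c["passed"] + failed
```

The regular expressions use `\s*` between label and number, so they work whether the log is pasted with line breaks or flattened onto one line.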

borauyar commented 1 year ago

I can't see any errors in this process either. It seems to have finished properly. Are you sure it is part of the same run that you submitted?

Can you restart the pipeline from where it left off and see if you get the same error?

Pinolinoo commented 1 year ago

Yes, I thought the same, and it is very puzzling. I am sure this is from the run that gave me the error, since I deleted the output folder before starting it.

How would I start the pipeline from a specific point?

borauyar commented 1 year ago

Just run it again on the same output folder; it will try to resume from where it left off. Snakemake takes care of that.

Pinolinoo commented 1 year ago

Ah, so simply running pigx-rnaseq -s ./settings.yaml ./sample_sheet.csv from the same folder?

borauyar commented 1 year ago

Yes, exactly.