Closed Pinolinoo closed 7 months ago
Hi @Pinolinoo
Can you show what is printed in the log file for hisat index?
output/logs/hisat2_index.log
Settings:
  Output files: "/fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.*.ht2l"
  Line rate: 7 (line is 128 bytes)
  Lines per side: 1 (side is 128 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Local offset rate: 3 (one in 8)
  Local fTable chars: 6
  Local sequence length: 57344
  Local sequence overlap between two consecutive indexes: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.fa
Reading reference sizes
  Time reading reference sizes: 00:00:16
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Another user has just reported the same problem at the cluster and the problem seems to be a memory issue.
See discussion here: https://groups.google.com/g/pigx/c/sbmdBZ7IOdI
Could you please test the pipeline using one of the human chromosomes?
If it works, then you will need to update your settings file to ask for more resources for the hisat2_build function.
In the settings file, you can add a section, where you can modify the default memory requirement.
See the output of pigx-rnaseq --init for the full list of options in settings.
execution:
  submit-to-cluster: yes
  jobs: 4
  nice: 19
  mem_mb: 128000
  rules:
    hisat2-build:
      threads: 2
      memory: 32000
How do I try it on one chromosome? Sadly, adding execution: submit-to-cluster: yes jobs: 4 nice: 19 mem_mb: 128000 rules: hisat2-build: threads: 2 memory: 32000 to the settings file gives me the same error.
You can download a human chromosome and use that as your target genome sequence. http://ftp.ensembl.org/pub/release-107/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz
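As an alternative to a separate download, a single chromosome can be cut out of a FASTA file that is already on disk. The following is a minimal sketch, not part of pigx: `extract_sequences` and `open_text` are hypothetical helper names, and the record name `chr21` in the usage note is an assumption about how the sequences in hg38.fa are named (UCSC-style files use `chr21`, Ensembl files use `21`). Whichever FASTA file is used, its sequence names should match the ones used in the GTF file.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def extract_sequences(fasta_in, fasta_out, wanted):
    """Copy only the FASTA records whose name is in `wanted`.

    The record name is the first whitespace-separated word after '>'.
    Returns the list of record names that were written, in file order.
    """
    wanted = set(wanted)
    written = []
    keep = False
    with open_text(fasta_in) as src, open(fasta_out, "wt") as dst:
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                keep = name in wanted
                if keep:
                    written.append(name)
            if keep:
                dst.write(line)
    return written
```

A call such as `extract_sequences("hg38.fa", "hg38.chr21.fa", {"chr21"})` (hypothetical file names) would write only that record; an empty return value means that no record with that name exists in the input.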
When I try it with the one chromosome as an input fasta file I get:
Error in rule check_annotation_files:
    jobid: 1
    output: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/input_annotation_stats.tsv
    log: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/check_annotation_files.log (check log file(s) for error message)
    shell:
        /gnu/store/zwlc915dpafababhm9wjfbdla919zvxm-r-minimal-4.2.1/bin/Rscript --vanilla /gnu/store/44g4dybxljnz0ad4108zrj5xiqy6grwj-pigx-rnaseq-0.1.0/libexec/pigx_rnaseq/scripts//validate_input_annotation.R /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.gtf /fast/AG_Metzger/philipp/raw_data/genome_files/Homo_sapiens.GRCh38.cdna.all.fa /fast/AG_Metzger/philipp/raw_data/genome_files/Homo_sapiens.GRCh38.dna.chromosome.21.fa /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output >> /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/check_annotation_files.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 5802064 ("snakejob.check_annotation_files.1.sh") has been submitted
Error executing rule check_annotation_files on cluster (jobid: 1, external: Your job 5802064 ("snakejob.check_annotation_files.1.sh") has been submitted, jobscript: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/tmp.w27kyeh3/snakejob.check_annotation_files.1.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/log/2022-10-11T151919.111011.snakemake.log
@Pinolinoo can you please show the content of output/logs/check_annotation_files.log?
This is now from a run on the whole genome again, since I deleted the output folder of the one-chromosome-only run!
Tue Oct 11 16:49:40 2022 Checking annotation files for potential issues
Tue Oct 11 16:49:40 2022 => Checking to see if GTF file can be properly imported from here: /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.gtf
Tue Oct 11 16:50:38 2022 => Imported 3150424 features from the GTF file
Tue Oct 11 16:50:38 2022 => GTF file contains annotations for 60649 genes and 237013 transcripts
Tue Oct 11 16:50:44 2022 => Imported 207877 transcripts from the cDNA fasta file at /fast/AG_Metzger/philipp/raw_data/genome_files/Homo_sapiens.GRCh38.cdna.all.fa
Tue Oct 11 16:50:45 2022 => Number of transcript ids matching between the GTF file and the cDNA fasta file: 0
Warning message:
Tue Oct 11 16:50:45 2022 => Couldn't match the transcript ids between the GTF file and the cDNA fasta file.
This will cause problems in getting gene-level expression estimates when running Salmon.
However, the transcript-level expression estimation won't suffer.
Possible reason is the source databases of these annotations are different.
Another possible reason is the additional annotations on the transcript IDs in the
fasta file: example entry for the transcript 'ENST00000202017' in GTF file is
'ENST00000202017.5 cdna chromosome:GRCh38:20:31944342:31952092 ...' in the Fasta File.
If using annotations from ENSEMBL, the fasta file can be cleaned up with the following sed command: sed 's/\(ENST[0-9]*\)\.[0-9]*.*/\1/g'
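The clean-up suggested in the warning can also be done and checked in Python. The following is a minimal sketch under stated assumptions: the FASTA headers carry Ensembl-style `ENST` IDs with a `.version` suffix, and the GTF file stores IDs in a `transcript_id "..."` attribute. All function names are hypothetical and not part of pigx. If the GTF file comes from a different source database, stripping the version suffix alone may not produce any matches.

```python
import re

# Header starting with a versioned Ensembl transcript ID, e.g.
# ">ENST00000202017.5 cdna chromosome:GRCh38:20:31944342:31952092 ..."
_VERSIONED = re.compile(r"^>(ENST[0-9]+)\.[0-9]+.*$")
# transcript_id attribute in the ninth GTF column
_GTF_TX = re.compile(r'transcript_id "([^"]+)"')


def clean_header(line):
    """Reduce '>ENST00000202017.5 cdna ...' to '>ENST00000202017'.

    Lines that are not versioned ENST headers are returned unchanged.
    """
    match = _VERSIONED.match(line.rstrip("\n"))
    if match:
        return ">" + match.group(1) + "\n"
    return line


def fasta_ids(fasta_lines):
    """Collect the cleaned record names from FASTA header lines."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = clean_header(line)[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gtf_ids(gtf_lines):
    """Collect transcript IDs from GTF lines, skipping '#' comment lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        match = _GTF_TX.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def count_matching_ids(gtf_lines, fasta_lines):
    """Number of transcript IDs present in both files after clean-up."""
    return len(gtf_ids(gtf_lines) & fasta_ids(fasta_lines))
```

Both `gtf_lines` and `fasta_lines` can be open file objects, so the count can be checked before rewriting the FASTA file with `clean_header` applied to every line.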
There is no error with this run though? It seems to have finished successfully.
It seems the annotation check was successful, but afterwards the hisat index building failed :/
Then the error you showed me can't be the error you are talking about.
Could you maybe switch to using STAR on the cluster? Another user also had a similar problem with Hisat2 and switched to STAR, which solved their problem.
You can set the mapper type in settings file to 'star'.
I just reran with hisat2, and the error in the complete log (/fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/log/2022-10-12T102111.776711.snakemake.log) is the same. The hisat2_index.log file from that run looks like the one I posted above.
Maybe we can take a look at it tomorrow if you are at BIMSB? I also tried STAR as the mapper, but that also gave me errors :/ (different ones though)
Thank you very much for your help so far!
If we are back to the hisat2 problem, then you would need to try out different resource settings. Please follow the discussion here to see how you can modify the resource requirements. https://groups.google.com/g/pigx/c/sbmdBZ7IOdI
Also try to use a single job to rule out the issue of memory sharing of multiple jobs.
I tried all the different resource requirements discussed in the link you sent, but nothing worked for me. Sadly, STAR doesn't work either (see log attached).
[Thu Oct 13 10:44:01 2022]
Error in rule trim_qc_reads_pe:
    jobid: 9
    output: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.trimmed.R1.fq.gz, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.trimmed.R2.fq.gz, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.html, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.json
    log: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/trim_reads.190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.log (check log file(s) for error message)
    shell:
        /gnu/store/57rv4nz7fzi3dw4khzl30338l9pi5zvc-fastp-0.20.1/bin/fastp --adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --in1 /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.fastq.gz --in2 /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R2.fastq.gz --out1 /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.trimmed.R1.fq.gz --out2 /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.trimmed.R2.fq.gz -h /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.html -j /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.json >> /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/trim_reads.190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 5803358 ("snakejob.trim_qc_reads_pe.9.sh") has been submitted
Error executing rule trim_qc_reads_pe on cluster (jobid: 9, external: Your job 5803358 ("snakejob.trim_qc_reads_pe.9.sh") has been submitted, jobscript: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/tmp.ty82369v/snakejob.trim_qc_reads_pe.9.sh). For error details see the cluster log and the log files of the involved rule(s).
[Thu Oct 13 10:44:51 2022]
Error in rule trim_qc_reads_pe:
    jobid: 25
    output: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.trimmed.R1.fq.gz, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.trimmed.R2.fq.gz, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.pe.fastp.html, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.pe.fastp.json
    log: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/trim_reads.190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.log (check log file(s) for error message)
    shell:
        /gnu/store/57rv4nz7fzi3dw4khzl30338l9pi5zvc-fastp-0.20.1/bin/fastp --adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --in1 /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.fastq.gz --in2 /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.fastq.gz --out1 /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.trimmed.R1.fq.gz --out2 /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.trimmed.R2.fq.gz -h /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.pe.fastp.html -j /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.pe.fastp.json >> /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/trim_reads.190702_C00101_0314_AHWLGLBCX2_E_1599_33_JG_R2_T1_2_S5_R2.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 5803362 ("snakejob.trim_qc_reads_pe.25.sh") has been submitted
Error executing rule trim_qc_reads_pe on cluster (jobid: 25, external: Your job 5803362 ("snakejob.trim_qc_reads_pe.25.sh") has been submitted, jobscript: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/tmp.ty82369v/snakejob.trim_qc_reads_pe.25.sh). For error details see the cluster log and the log files of the involved rule(s).
[Thu Oct 13 10:48:21 2022]
Finished job 17.
5 of 132 steps (4%) done
[Thu Oct 13 10:49:51 2022]
Finished job 41.
6 of 132 steps (5%) done
[Thu Oct 13 11:49:07 2022]
Finished job 37.
7 of 132 steps (5%) done
[Thu Oct 13 12:43:33 2022]
Finished job 45.
8 of 132 steps (6%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/log/2022-10-13T103439.351090.snakemake.log
This time the error is not relevant to STAR though. It fails at the read trimming step.
Can you please show the content of the log file /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/trim_reads.190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.log?
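Each "Error in rule" block in the Snakemake log names the log file of the rule that failed, so the per-rule logs worth inspecting can be listed with a short script. The following is a minimal sketch, assuming the log layout shown in this thread (it tolerates both the multi-line and the flattened form); `failed_rules` is a hypothetical helper and not part of pigx or Snakemake.

```python
import re

# Start of an error block: "Error in rule <name>:" followed by "jobid: <n>"
_RULE = re.compile(r"Error in rule (\S+?):\s+jobid:\s+(\d+)")
# The rule's own log file, as printed inside the error block
_LOG = re.compile(r"\blog:\s+(\S+)\s+\(check log file")


def failed_rules(log_text):
    """Return one dict per 'Error in rule' block found in a Snakemake log.

    Each dict has the keys 'rule', 'jobid' and 'log'; 'log' is None when the
    block does not name a log file.
    """
    failures = []
    matches = list(_RULE.finditer(log_text))
    for i, match in enumerate(matches):
        # Only search up to the start of the next error block, so that a
        # block without a log line does not pick up its neighbour's log.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        log = _LOG.search(log_text, match.end(), end)
        failures.append({
            "rule": match.group(1),
            "jobid": int(match.group(2)),
            "log": log.group(1) if log else None,
        })
    return failures
```

The function expects the text of the file printed after "Complete log:" at the end of a run, for example `failed_rules(open(path).read())` with `path` pointing at that file.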
GLBCX2_E_1609_43_306_R3_T1_S15_R1.log
Read1 before filtering:
total reads: 22475169
total bases: 2220295332
Q20 bases: 2189938026(98.6327%)
Q30 bases: 2164425791(97.4837%)

Read2 before filtering:
total reads: 22475169
total bases: 2219810184
Q20 bases: 2169383944(97.7284%)
Q30 bases: 2135674008(96.2098%)

Read1 after filtering:
total reads: 22165324
total bases: 2189522947
Q20 bases: 2163670402(98.8193%)
Q30 bases: 2140779527(97.7738%)

Read2 aftering filtering:
total reads: 22165324
total bases: 2188438228
Q20 bases: 2154279361(98.4391%)
Q30 bases: 2126590281(97.1739%)

Filtering result:
reads passed filter: 44330648
reads failed due to low quality: 596810
reads failed due to too many N: 22798
reads failed due to too short: 82
reads with adapter trimmed: 161299
bases trimmed due to adapters: 1280750

Duplication rate: 3.41984%

Insert size peak (evaluated by paired-end reads): 123

JSON report: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.json
HTML report: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.html

/gnu/store/57rv4nz7fzi3dw4khzl30338l9pi5zvc-fastp-0.20.1/bin/fastp --adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --in1 /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.fastq.gz --in2 /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R2.fastq.gz --out1 /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.trimmed.R1.fq.gz --out2 /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/trimmed_reads/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.trimmed.R2.fq.gz -h /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.html -j /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/QC/190702_C00101_0314_AHWLGLBCX2_E_1609_43_306_R3_T1_S15_R1.pe.fastp.json
fastp v0.20.1, time used: 424 seconds
I can't see any errors in this process either. It seems to have finished properly. Are you sure it is part of the same run that you submitted?
Can you restart the pipeline from where it left off and see if you get the same error?
Yes, I thought the same, and it is very puzzling. I am sure this is from the run that gave me the error, since I deleted the output folder before starting the run.
How would I start the pipeline from a specific point?
Run it again on the same folder and it will try to start from where it left off. Snakemake takes care of that.
Ah, so simply running pigx-rnaseq -s ./settings.yaml ./sample_sheet.csv from the same folder?
Yes, exactly.
Hi, I have come across a new problem. When executing the pipeline, I get the following error:
Error in rule hisat2_index:
    jobid: 45
    output: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.1.ht2l, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.2.ht2l, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.3.ht2l, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.4.ht2l, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.5.ht2l, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.6.ht2l, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.7.ht2l, /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index.8.ht2l
    log: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/hisat2_index.log (check log file(s) for error message)
    shell:
        /gnu/store/k2fwdwfp75nhnkf162f9nrxd0cyjjp4x-hisat2-2.2.1/bin/hisat2-build -f -p 2 --large-index /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.fa /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/hisat2_index/GRCm38_index >> /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/logs/hisat2_index.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Your job 5801853 ("snakejob.hisat2_index.45.sh") has been submitted
Error executing rule hisat2_index on cluster (jobid: 45, external: Your job 5801853 ("snakejob.hisat2_index.45.sh") has been submitted, jobscript: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/.snakemake/tmp.df6a1v0e/snakejob.hisat2_index.45.sh). For error details see the cluster log and the log files of the involved rule(s).
my settings.yaml looks like:
locations:
  reads-dir: /fast/AG_Metzger/philipp/raw_data/HD/short_reads_3d/all_reads/
  output-dir: /fast/AG_Metzger/philipp/results/pigx/RNA-Seq/HD_organoids/output/
  genome-fasta: /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.fa
  cdna-fasta: /fast/AG_Metzger/philipp/raw_data/genome_files/Homo_sapiens.GRCh38.cdna.all.fa
  gtf-file: /fast/AG_Metzger/philipp/raw_data/genome_files/hg38.gtf
organism: hsapiens
DEanalyses:
  # names of analyses can be anything but they have to be unique for each combination of case control group comparisons.
  analysis1:
    # if multiple sample names are provided, they must be separated by comma
execution:
  submit-to-cluster: yes
  jobs: 40