Closed luongphekidz07 closed 4 years ago
Hi, there seem to be two issues here:
STAR
segfaults during mappingtximeta
fails when reading the Salmon
outputFor the first issue, I wonder if the problem is that you have a very small genome. See for example https://github.com/alexdobin/STAR/issues/573#issuecomment-466829232 - you need to set some additional parameters when indexing small genomes. In ARMOR
, you can do this in the config file, by changing the additional_star_index
parameter.
For the second issue, can you have a look at the file /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout
- it should indicate what the problem is.
Dear Professor,
I would like to send you the error messages: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout R version 4.0.0 (2020-04-24) -- "Arbor Day" Copyright (C) 2020 The R Foundation for Statistical Computing Platform: x86_64-conda_cos6-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.
args <- (commandArgs(trailingOnly = TRUE)) for (i in seq_len(length(args))) {
- eval(parse(text = args[[i]]))
- }
suppressPackageStartupMessages({
- library(dplyr)
- library(tximport)
- library(tximeta)
- library(SingleCellExperiment)
- })
print(salmondir) [1] "/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon" print(json) [1] "/home/leo/ARMOR/E109_E453/E109_E453_saslmonindex.sidx.json" print(metafile) [1] "/home/leo/ARMOR/E109_E453/metadata_E109_E453.txt" print(outrds) [1] "/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds" print(annotation) [1] "Ensembl" print(organism) [1] "Ecoli"
Load json linkedTxome
loadLinkedTxome(json) saving linkedTxome in bfc (first time)
Read metadata
metadata <- read.delim(metafile, header = TRUE, as.is = TRUE, sep = "\t")
List Salmon directories
salmonfiles <- paste0(salmondir, "/", metadata$names, "/quant.sf") names(salmonfiles) <- metadata$names
Add file column to metadata and import annotated abundances
In transcript level
coldata <- cbind(metadata, files = salmonfiles, stringsAsFactors = FALSE) st <- tximeta::tximeta(coldata) importing quantifications reading in files with read_tsv 1 2 found matching linked transcriptome: [ Ensembl - Ecoli NA - release 1 ] useHub=TRUE: checking for EnsDb via 'AnnotationHub' using temporary cache /tmp/RtmpXow0vE/BiocFileCache snapshotDate(): 2020-04-27 did not find matching EnsDb via 'AnnotationHub' building EnsDb with 'ensembldb' package Importing GTF file ... OK Processing metadata ... OK Processing genes ... Attribute availability: o gene_id ... OK o gene_name ... Nope o entrezid ... Nope o gene_biotype ... Nope OK Processing transcripts ... Attribute availability: o transcript_id ... OK o gene_id ... OK o source ... OK I can't find type=='CDS'! The resulting database will lack CDS information! Error in FUN(X[[i]], ...) : Column 'tx_cds_seq_start' is not of type integer! Calls:
-> getTxDb Execution halted
Currently, I constructed a complete bacterial genome my own, I do not know how to create the gtf file for my strain to make it fit with ARMOR. I am using the RAST server for the annotation process but RAST gtf output file was different from the reference gtf file in the ARMOR reference directory. I tried to fix part of it, but it seems difficult to me to make a good gtf file and txome file. When you have time, could you give me some instructions on how to create a gtf file and txome file to fit ARMOR input?
Thank you very much for your precious timen and valuable help,
Kind regards, Le Phuong.
Thanks for the report. ARMOR
is really intended for analysis of experiments where you have an annotation obtained from either Gencode or Ensembl. If you specify 'Ensembl' as the annotation source, tximeta
will try to generate an EnsDb
object (or find one from AnnotationHub
). I don't have experience with RAST annotations, so I don't know how they differ from Ensembl/Gencode ones and whether they can be reconciled. You could try to set 'annotation' in the config file to e.g. 'other', in which case tximeta
should not try to create the EnsDb
object. However, you should be aware that ARMOR
is not really tested for these situations.
Dear Expert,
Thank you very much for your suggestions, I tried to change the settings to "other". As you predicted, there are still errors. I would like to describe the situation in prokaryotic annotation pipeline. Currently, there are multiple tools for prokaryotic annotations such as:
Additionally, as you mentioned above, there are issues related to STAR and ximeta. The bacteria genome usually ranges from 4 Mbps to 5 Mbps (for Escherichia coli and Klebsiella pneumoniae). But there are plasmid genome inside bacteria as well which usually ranges from 2kbp to 200kbp. I would like ask how I can change the default settings inside ARMOR, currently I do not know how to change those settings according to your suggestions. I hope that I can apply ARMOR in my research,
Thank you very much for your precious time and kind instructions,
Kind regards, Le Phuong.
As you predicted, there are still errors.
Please be more precise - what type of error? The same as before, or a different one?
Regarding the settings for STAR
, you need to check the size of the genome that you provide (the fasta file), and derive the appropriate settings from there (according to the post I linked to above). Once you have the settings, you can add them to the additional_star_index
parameter in the ARMOR
config file. For example:
additional_star_index: "--genomeSAindexNbases 9"
if your genome is 1M bases (see the linked post for how to get the proper value for your genome).
Dear Expert,
Here is the messages that I received when I changed to "other": (snakemake) leo@leo:~/ARMOR$ snakemake --use-conda --core 10 <\Building DAG of jobs... Using shell: /bin/bash Provided cores: 10 Rules claiming more threads will be scaled down. Job counts: count jobs 1 DRIMSeq 1 all 1 edgeR 1 multiqc 1 shiny 1 tximeta 6
[Wed Jul 1 21:55:47 2020] rule tximeta: input: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/pkginstall_state.txt, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon/E109_4_1/quant.sf, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon/E453_4_1/quant.sf, /home/leo/ARMOR/E109_E453/metadata_E109_E453.txt, /home/leo/ARMOR/E109_E453/E109_E453_saslmonindex.sidx/versionInfo.json, /home/leo/ARMOR/E109_E453/E109_E453_saslmonindex.sidx.json, scripts/run_tximeta.R output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout jobid: 19 benchmark: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/benchmarks/tximeta_se.txt
[Wed Jul 1 21:55:47 2020] rule multiqc: input: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E109_4_1_R1_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E453_4_1_R1_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E109_4_1_R2_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E453_4_1_R2_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon/E109_4_1/quant.sf, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon/E453_4_1/quant.sf, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R1_val_1.fq.gz, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E453_4_1_R1_val_1.fq.gz, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R2_val_2.fq.gz, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E453_4_1_R2_val_2.fq.gz, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E109_4_1_R1_val_1_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E453_4_1_R1_val_1_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E109_4_1_R2_val_2_fastqc.zip, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC/E453_4_1_R2_val_2_fastqc.zip output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/MultiQC/multiqc_report.html log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/multiqc.log jobid: 1 benchmark: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/benchmarks/multiqc.txt
Activating conda environment: /home/leo/ARMOR/.snakemake/conda/10da7374 Activating conda environment: /home/leo/ARMOR/.snakemake/conda/9f6359b5 /home/leo/ARMOR/.snakemake/conda/10da7374/lib/python3.6/site-packages/multiqc/utils/config.py:45: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. configs = yaml.load(f) /home/leo/ARMOR/.snakemake/conda/10da7374/lib/python3.6/site-packages/multiqc/utils/config.py:51: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. sp = yaml.load(f) /home/leo/ARMOR/.snakemake/conda/10da7374/lib/python3.6/site-packages/multiqc/utils/config.py:45: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. configs = yaml.load(f) /home/leo/ARMOR/.snakemake/conda/10da7374/lib/python3.6/site-packages/multiqc/utils/config.py:51: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. sp = yaml.load(f) [WARNING] multiqc : MultiQC Version v1.9 now available! [INFO ] multiqc : This is MultiQC v1.7 (1b88491) [INFO ] multiqc : Template : default [INFO ] multiqc : Searching '/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FastQC' [INFO ] multiqc : Searching '/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon' [INFO ] multiqc : Searching '/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed' Searching 58 files.. [####################################] 100% [INFO ] salmon : Found 2 meta reports [INFO ] salmon : Found 2 fragment length distributions [INFO ] cutadapt : Found 4 reports [INFO ] fastqc : Found 8 reports [INFO ] multiqc : Compressing plot data [INFO ] multiqc : Report : ../ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/MultiQC/multiqc_report.html [INFO ] multiqc : Data : ../ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/MultiQC/multiqc_data [INFO ] multiqc : MultiQC complete [Wed Jul 1 21:55:55 2020] Finished job 1. 1 of 6 steps (17%) done [Wed Jul 1 21:59:02 2020] Error in rule tximeta: jobid: 19 output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout (check log file(s) for error message) conda-env: /home/leo/ARMOR/.snakemake/conda/9f6359b5 shell: R CMD BATCH --no-restore --no-save "--args salmondir='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon' json='/home/leo/ARMOR/E109_E453/E109_E453_saslmonindex.sidx.json' metafile='/home/leo/ARMOR/E109_E453/metadata_E109_E453.txt' outrds='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds' annotation='Other' organism='Ecoli'" scripts/run_tximeta.R /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Error! The Snakemake workflow aborted.
Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-01T215547.504384.snakemake.log>
I rerun the with --genomeSAindexNbases 10, (I calculated with 4.4Mbps genome size: log2(4,400,000)/2-1=10.03). But there are still errors:
<Activating conda environment: /home/leo/ARMOR/.snakemake/conda/10da7374 Jul 02 00:30:37 ..... started STAR run Jul 02 00:30:37 ..... loading genome Jul 02 00:30:38 ..... started mapping /bin/bash: line 1: 32212 Segmentation fault (core dumped) STAR --genomeDir /home/leo/ARMOR/E109_E453/E109_E453_starindex.idx --readFilesIn /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R1_val_1.fq.gz /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R2_val_2.fq.gz --runThreadN 10 --outFileNamePrefix /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_41 --outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c [Thu Jul 2 00:30:39 2020] Error in rule starPE: jobid: 22 output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_Aligned.sortedByCoord.out.bam log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log (check log file(s) for error message) conda-env: /home/leo/ARMOR/.snakemake/conda/10da7374 shell: echo 'STAR version: ' > /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log; STAR --version >> /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log; STAR --genomeDir /home/leo/ARMOR/E109_E453/E109_E453_starindex.idx --readFilesIn /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R1_val_1.fq.gz /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R2_val_2.fq.gz --runThreadN 10 --outFileNamePrefix /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_41 --outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job starPE since they might be corrupted: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_Aligned.sortedByCoord.out.bam Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Error! The Snakemake workflow aborted. Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-02T001832.033298.snakemake.log (snakemake) leo@leo:~/ARMOR$>
I would like to receive your advice on these issues,
Thank you very much for your precious time and kind help,
Kind regards, Le Phuong.
You still need to check the .Rout
file (/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout
) for the tximeta
error, the output above just says that something went wrong in this step.
For the STAR
error, was the index recreated? Snakemake
does not automatically rerun a rule if only the parameters of the call changed (in this case, the input files did not change). Otherwise, try to delete the STAR
index and run it again.
Dear Expert,
Thank you very much for your instructions, I deleted the STAR index and rerun. This time it showed fatal error due to lack of memory
<Activating conda environment: /home/leo/ARMOR/.snakemake/conda/10da7374 Jul 02 02:15:50 ..... started STAR run Jul 02 02:15:50 ..... loading genome Jul 02 02:15:51 ..... started mapping Jul 02 02:16:50 ..... finished mapping Jul 02 02:16:50 ..... started sorting BAM
EXITING because of fatal ERROR: not enough memory for BAM sorting: SOLUTION: re-run STAR with at least --limitBAMsortRAM 1188211575 Jul 02 02:16:50 ...... FATAL ERROR, exiting [Thu Jul 2 02:16:50 2020] Error in rule starPE: jobid: 22 output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_Aligned.sortedByCoord.out.bam log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log (check log file(s) for error message) conda-env: /home/leo/ARMOR/.snakemake/conda/10da7374 shell: echo 'STAR version: ' > /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log; STAR --version >> /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log; STAR --genomeDir /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/E109_E453_starindex.idx --readFilesIn /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R1_val_1.fq.gz /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R2_val_2.fq.gz --runThreadN 16 --outFileNamePrefix /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_41 --outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job starPE since they might be corrupted: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_Aligned.sortedByCoord.out.bam Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Error! The Snakemake workflow aborted. Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-02T020310.223778.snakemake.log>
tried to find on the parameter on the config file, but I can not find this parameter. I would like to ask how I can edit the parameter "--limitBAMsortRAM 1188211575".
Thank you very much for your precious time and your kind help,
Kind regards, Le Phuong.
This is an argument of the STAR
alignment step, so you need to add it to additional_star_align: ""
in the same way as you did above with additional_star_index: ""
to modify the indexing step.
Dear Expert, Thank you so much for your precious time and kind help, It is late in Korea now around 3AM. I will rerun the pipeline again and will let you know the results tomorrow,
Thank you so much, Kind regards, Le Phuong
Dear Expert,
I would like to update the errors after adding the addtional_star_align:"--limitBAMsortRAM 1288211575"
<Activating conda environment: /home/leo/ARMOR/.snakemake/conda/10da7374 Jul 02 09:12:02 ..... started STAR run Jul 02 09:12:02 ..... loading genome Jul 02 09:12:03 ..... started mapping
BAMoutput.cpp:27:BAMoutput: exiting because of OUTPUT FILE error: could not create output file /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1__STARtmp//BAMsort/20/16 SOLUTION: check that the path exists and you have write permission for this file. Also check ulimit -n and increase it to allow more open files.
Jul 02 09:12:04 ...... FATAL ERROR, exiting [Thu Jul 2 09:12:04 2020] Error in rule starPE: jobid: 22 output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_Aligned.sortedByCoord.out.bam log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log (check log file(s) for error message) conda-env: /home/leo/ARMOR/.snakemake/conda/10da7374 shell: echo 'STAR version: ' > /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log; STAR --version >> /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E109_4_1.log; STAR --genomeDir /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/E109_E453_starindex.idx --readFilesIn /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R1_val_1.fq.gz /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E109_4_1_R2_val_2.fq.gz --runThreadN 40 --outFileNamePrefix /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_41 --outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c --limitBAMsortRAM 1288211575 (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job starPE since they might be corrupted: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E109_4_1/E109_4_1_Aligned.sortedByCoord.out.bam Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Error! The Snakemake workflow aborted. Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-02T084627.450660.snakemake.log>
I tried to use several commands such as:
After each command, I rerun again, but there are no changes in the error messages. When you have time, I would like to receive your instructions on solving the problem,
Thank you very much for your precious time and kind help,
Kind regards, Le Phuong.
It may be related to this: https://github.com/alexdobin/STAR/issues/528. Try to change the ulimit or reduce the number of threads as suggested there. These are not really ARMOR
errors (they would happen even if you ran STAR
directly, outside of ARMOR
), so I would suggest also checking the issues previously reported (and solved) for STAR
.
Dear Expert, I understood that I have to change the ulimit parameter. This may be my naive question, I would like to ask whether it is possible to change the parameter of ulimit in the config file of ARMOR. Or is it possible to change the parameter in the ARMOR snakemake file.
Thank you very much for your kind help and your patience,
Kind regards, Le Phuong.
I have not tried this, but I suppose you could add a line in the beginning of the shell
part of the appropriate rule in the Snakefile
, setting the ulimit
to the desired value for the current session. There is no parameter for changing this in the config file, since this is more of a system setting than directly related to the workflow.
Dear Expert,
Thank you so much for your suggestions. I am not expert in bioinformatic field, I do not know how to add the shell part in the snakefile, but I will try furthermore. According to your advice, I reduced the threads from 40 to 10 and the problem were solved.
However, now I have other error messages:
<Searching 70 files.. [############################--------] 78%[Thu Jul 2 17:28:45 2020] Finished job 26. 6 of 13 steps (46%) done
[Thu Jul 2 17:28:45 2020] rule edgeR: input: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/pkginstall_state.txt, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds, scripts/run_render.R, scripts/edgeR_dge.Rmd output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.html, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.rds log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout jobid: 25 benchmark: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/benchmarks/run_dge_edgeR.txt
Searching 70 files.. [####################################] 100% [INFO ] salmon : Found 2 meta reports [INFO ] salmon : Found 2 fragment length distributions [INFO ] star : Found 2 reports [INFO ] cutadapt : Found 4 reports Activating conda environment: /home/leo/ARMOR/.snakemake/conda/9f6359b5 [INFO ] fastqc : Found 8 reports [INFO ] multiqc : Compressing plot data [INFO ] multiqc : Report : ../ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/MultiQC/multiqc_report.html [INFO ] multiqc : Data : ../ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/MultiQC/multiqc_data [INFO ] multiqc : MultiQC complete [Thu Jul 2 17:28:47 2020] Finished job 1. 7 of 13 steps (54%) done [Thu Jul 2 17:28:49 2020] Error in rule edgeR: jobid: 25 output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.html, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.rds log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout (check log file(s) for error message) conda-env: /home/leo/ARMOR/.snakemake/conda/9f6359b5 shell: R CMD BATCH --no-restore --no-save "--args se='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds' organism='Ecoli' design='~0+celline' contrast='celline1-celline2,celline2-celline1' genesets='H,C5' rmdtemplate='scripts/edgeR_dge.Rmd' outputdir='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR' outputfile='edgeR_dge.html'" scripts/run_render.R /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
[Thu Jul 2 17:29:26 2020] Finished job 19. 8 of 13 steps (62%) done [Thu Jul 2 17:29:39 2020] Finished job 18. 9 of 13 steps (69%) done Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Error! The Snakemake workflow aborted. Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-02T172518.020533.snakemake.log>
I checked the file /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout, and it showed this message:
Arguments that are only used for some of the reports
if (exists("organism")) {
- print(organism)
- } else {
- organism <- NULL
- } [1] "Ecoli"
if (exists("design")) {
- print(design)
- } else {
- design <- NULL
- } [1] "~0+celline"
if (exists("contrast")) {
- contrast <- strsplit(gsub(" ","",contrast), ",")[[1]]
- print(contrast)
- } else {
- contrast <- NULL
- } [1] "celline1-celline2" "celline2-celline1"
if (exists("genesets")) {
- genesets <- strsplit(gsub(" ","",genesets), ",")[[1]]
- print(genesets)
- } else {
- genesets <- NULL
- } [1] "H" "C5"
if (exists("gtffile")) {
- print(gtffile)
- } else {
- gtffile <- NULL
- }
if (exists("ncores")) {
- ncores <- as.numeric(ncores)
- if(is.na(ncores))
- ncores <- 1
- print(ncores)
- } else {
- ncores <- 1
- }
if (exists("bigwigdir")) {
- bigwigdir <- normalizePath(bigwigdir)
- print(bigwigdir)
- } else {
- bigwigdir <- NULL
- }
source("scripts/generate_report.R")
generateReport(se = se, organism = organism, gtffile = gtffile,
- contrast = contrast, design = design, genesets = genesets,
- bigwigdir = bigwigdir, rmdTemplate = rmdtemplate,
- outputDir = outputdir, outputFile = outputfile, ncores = ncores,
- forceOverwrite = TRUE, showCode = TRUE) Error in generateReport(se = se, organism = organism, gtffile = gtffile, : organism must be one of the organisms listed in msigdbr::msigdbr_show_species() Execution halted>
I would like to to ask what I should do to solve the problem,
Thank you very much for your valuble time and kind help,
Kind regards, Le Phuong.
Error in generateReport(se = se, organism = organism, gtffile = gtffile, : organism must be one of the organisms listed in msigdbr::msigdbr_show_species()
This tells you that you can't perform gene set analysis for E.coli, since it's not included among the supported species. You need to set this to False: https://github.com/csoneson/ARMOR/blob/1b88491ae4e372565ab60c60a7a69fd842bee53f/config.yaml#L127
Dear Expert,
Thank you very much for your instructions. I changed "run_camera: False" and rerun. Here is the message error: <22 of 28 steps (79%) done [Thu Jul 2 21:13:14 2020] Error in rule edgeR: jobid: 25 output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.html, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.rds log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout (check log file(s) for error message) conda-env: /home/leo/ARMOR/.snakemake/conda/9f6359b5 shell: R CMD BATCH --no-restore --no-save "--args se='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds' organism='Ecoli' design='~0+celline' contrast='celline1-celline2,celline2-celline1' rmdtemplate='scripts/edgeR_dge.Rmd' outputdir='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR' outputfile='edgeR_dge.html'" scripts/run_render.R /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
[Thu Jul 2 21:13:42 2020] Finished job 19. 23 of 28 steps (82%) done [Thu Jul 2 21:13:58 2020] Finished job 18. 24 of 28 steps (86%) done Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Error! The Snakemake workflow aborted. Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-02T204357.942352.snakemake.log>
Here is the Rout error message:
<> source("scripts/generate_report.R")
generateReport(se = se, organism = organism, gtffile = gtffile,
- contrast = contrast, design = design, genesets = genesets,
- bigwigdir = bigwigdir, rmdTemplate = rmdtemplate,
- outputDir = outputdir, outputFile = outputfile, ncores = ncores,
- forceOverwrite = TRUE, showCode = TRUE) Quitting from lines 232-234 (edgeR_dge.Rmd) Error in eval(ej, envir = levelsenv) : object 'celline1' not found Calls: generateReport ... eval -> as.data.frame -> makeContrasts -> eval -> eval Execution halted>
I would like to ask what is the "celline1". In my case, is it my control bacterial strain? I would like to send you the metadata.tsv: names type celline treatment E109_4_1 PE 1 None E453_4_1 PE 1 Yes I just included one set of data. Is it the correct way to make metadata.tsv file? I would like to receive your instructions
Thank you very much for your patience and your kind help,
Kind regards, Le Phuong.
Yes, metadata.tsv should be a tab-delimited text file. You need to specify the design and contrasts appropriately in the configuration file. These are specified in the standard format used by (e.g.) edgeR
- if you are uncertain about what that means I would suggest looking into the extensive edgeR
user guide (there are lots of great examples of different models). ARMOR
can do some checks of the specified parameters (including the design and contrasts) via the checkinputs
rule (see https://github.com/csoneson/ARMOR/wiki/Setting-up-the-DGE-analysis).
Dear Expert,
Thank you very much for your instructions and your recommendations. I am still reading the instructions from edgeR wiki. I make the metadata file similar to edgE examples with tab delimited style. Then, I used the command line to check inputs, but it still has the errors: can not find cellline1. Does it mean that I have to change cellline1 to "None" and cellline2 to "Yes"? I am sorry this question looks stupid, but I cannot find the reason
Here is the error message
<Activating conda environment: /home/leo/ARMOR/.snakemake/conda/9f6359b5 [Fri Jul 3 03:23:08 2020] Error in rule checkinputs: jobid: 0 output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/check_input.txt log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/check_input.Rout (check log file(s) for error message) conda-env: /home/leo/ARMOR/.snakemake/conda/9f6359b5 shell: R CMD BATCH --no-restore --no-save "--args metafile='/home/leo/ARMOR/E109_E453/metadata_E109_E453.txt' design='~0+celline' contrast='celline1-celline2,celline2-celline1' outFile='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/check_input.txt' gtf='/home/leo/ARMOR/E109_E453/E109.gtf' genome='/home/leo/ARMOR/E109_E453/E109.fa' fastqdir='/home/leo/ECOPAIR_RNAseq_rawdata' fqsuffix='fq' fqext1='R1' fqext2='R2' txome='/home/leo/ARMOR/E109_E453/E109_gene.fa' run_camera='False' organism='Ecoli' annotation='Other'" scripts/check_input.R /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/check_input.Rout; cat /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/check_input.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Error! The Snakemake workflow aborted. Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-03T032306.257087.snakemake.log>
Here is the error of the run_dge_edgeR.Rout:
source("scripts/generate_report.R")
generateReport(se = se, organism = organism, gtffile = gtffile,
- contrast = contrast, design = design, genesets = genesets,
- bigwigdir = bigwigdir, rmdTemplate = rmdtemplate,
- outputDir = outputdir, outputFile = outputfile, ncores = ncores,
- forceOverwrite = TRUE, showCode = TRUE) Quitting from lines 232-234 (edgeR_dge.Rmd) Error in eval(ej, envir = levelsenv) : object 'celline1' not found Calls: generateReport ... eval -> as.data.frame -> makeContrasts -> eval -> eval Execution halted
I think it is a very last problem that I am facing now.
Thank you very much for your patience and your kind help, Kind regards, Le Phuong.
Ok, so I'm going to guess that the problem is that the cell line column is interpreted as a numeric rather than a categorical variable. Could you change to e.g. C1/C2 instead of 1/2, and then do cellineC1-cellineC2
?
Dear Expert,
I have changed to C1/C2 in the metadata.tsv. I also changed cellineC1, cellineC2 in the yaml file. I rerun the whole process. However, there is another error:
<Error in rule edgeR: jobid: 43 output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.html, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.rds log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout (check log file(s) for error message) conda-env: /home/leo/ARMOR/.snakemake/conda/9f6359b5 shell: R CMD BATCH --no-restore --no-save "--args se='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds' organism='Ecoli' design='~0+celline' contrast='cellineC1-cellineC2,cellineC2-cellineC1' rmdtemplate='scripts/edgeR_dge.Rmd' outputdir='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR' outputfile='edgeR_dge.html'" scripts/run_render.R /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dge_edgeR.Rout (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)>
I checked the run_dge_edgeR.Rout file:
< if (exists("bigwigdir")) {
source("scripts/generate_report.R")
generateReport(se = se, organism = organism, gtffile = gtffile,
gene_name
doesn't exist.
Backtrace:
█
According to your suggestion, the reason is truely due to the numeric rather than the categorical variable. This time, the problem is due to the empty gene name column. I need to spend time to think how to make the gene name column.
Thank you very much for your valuable time, patience and kind help, Kind regards, Le Phuong.
Yes, the edgeR
script assumes that the data object has an annotation column gene_name
(which it will have if you are using a Gencode or Ensembl annotation). In your case, you could create one e.g. by adding a line here: https://github.com/csoneson/ARMOR/blob/1b88491ae4e372565ab60c60a7a69fd842bee53f/scripts/run_tximeta.R#L48
defining what you want the gene_name
to be. For example
rowData(sg)$gene_name <- rowData(sg)$gene_id
depending on which columns you have in your object.
Dear Expert,
Thank you very much for your instructions. I added "rowData(sg)$gene_name <- rowData(sg)$gene_id" in line 48. And rerun again, then it showed another message:
<Activating conda environment: /home/leo/ARMOR/.snakemake/conda/9f6359b5 [Sat Jul 4 01:05:00 2020] Finished job 43. 2 of 5 steps (40%) done
[Sat Jul 4 01:05:00 2020] rule DRIMSeq: input: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/pkginstall_state.txt, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.rds, scripts/run_render.R, scripts/DRIMSeq_dtu.Rmd output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/DRIMSeq_dtu.html, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/DRIMSeq_dtu.rds log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dtu_drimseq.Rout jobid: 36 benchmark: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/benchmarks/run_dtu_drimseq.txt threads: 10
Activating conda environment: /home/leo/ARMOR/.snakemake/conda/9f6359b5 [Sat Jul 4 01:05:13 2020] Error in rule DRIMSeq: jobid: 36 output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/DRIMSeq_dtu.html, /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/DRIMSeq_dtu.rds log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dtu_drimseq.Rout (check log file(s) for error message) conda-env: /home/leo/ARMOR/.snakemake/conda/9f6359b5 shell: R CMD BATCH --no-restore --no-save "--args se='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/edgeR_dge.rds' design='~0+celline' contrast='cellineC1-cellineC2,cellineC2-cellineC1' ncores='10' rmdtemplate='scripts/DRIMSeq_dtu.Rmd' outputdir='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR' outputfile='DRIMSeq_dtu.html'" scripts/run_render.R /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dtu_drimseq.Rout (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Error! The Snakemake workflow aborted.>
I checked the file: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/run_dtu_drimseq.Rout. And then it showed the message:
<> if (exists("bigwigdir")) {
source("scripts/generate_report.R")
generateReport(se = se, organism = organism, gtffile = gtffile,
Here are the lines 111-114:
d <- dmFilter(d, min_samps_gene_expr = 3, min_samps_feature_expr = 3,
min_gene_expr = 10, min_feature_expr = 5)
plotData(d)
When you have time, I would like to receive your instructions on the next step,
Thank you very much for your valuable time and kind help,
Kind regards, Le Phuong.
Error in dmDS_filter(counts = x@counts, min_samps_gene_expr = min_samps_gene_expr, :
!No genes left after filtering!
This tells you what the error is - there are no genes left after the filtering that you have applied. This is not an ARMOR
error - you may need to adjust the DRIMSeq
filtering parameters to fit your data set. Please keep in mind that you should not just run the pipeline blindly - it is important that you understand what the different steps do in order to interpret the output properly.
I think we have solved the ARMOR
-related problems here, so I'm closing this issue. Feel free to reopen if that's not the case.
Dear Expert, Thank you so much for your valuable time and your kind help. Your advice and instructions helped a me a lot,
Kind regards, Le Phuong.
Dear Authors and Experts ARMOR,
I am a microbiologist, I am learning RNAseq analysis. However, when I run the file, I had the problem when the ARMOR was running: Activating conda environment: /home/leo/ARMOR/.snakemake/conda/10da7374 Jul 01 15:58:02 ..... started STAR run Jul 01 15:58:02 ..... loading genome Jul 01 15:58:03 ..... started mapping /bin/bash: line 1: 37248 Segmentation fault (core dumped) STAR --genomeDir /home/leo/ARMOR/E109_E453/E109_E453_starindex.idx --readFilesIn /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E453_4_2_R1_val_1.fq.gz /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E453_4_2_R2_val_2.fq.gz --runThreadN 18 --outFileNamePrefix /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E453_4_2/E453_42 --outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c [Wed Jul 1 15:58:05 2020] Error in rule starPE: jobid: 57 output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E453_4_2/E453_4_2_Aligned.sortedByCoord.out.bam log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E453_4_2.log (check log file(s) for error message) conda-env: /home/leo/ARMOR/.snakemake/conda/10da7374 shell: echo 'STAR version: ' > /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E453_4_2.log; STAR --version >> /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/logs/STAR_E453_4_2.log; STAR --genomeDir /home/leo/ARMOR/E109_E453/E109_E453_starindex.idx --readFilesIn /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E453_4_2_R1_val_1.fq.gz /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/FASTQtrimmed/E453_4_2_R2_val_2.fq.gz --runThreadN 18 --outFileNamePrefix /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E453_4_2/E453_42 --outSAMtype BAM SortedByCoordinate --readFilesCommand gunzip -c (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job starPE since they might be corrupted: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/STAR/E453_4_2/E453_4_2_Aligned.sortedByCoord.out.bam [Wed Jul 1 16:04:31 2020] Error in rule tximeta: jobid: 62 output: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds log: /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout (check log file(s) for error message) conda-env: /home/leo/ARMOR/.snakemake/conda/9f6359b5 shell: R CMD BATCH --no-restore --no-save "--args salmondir='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/salmon' json='/home/leo/ARMOR/E109_E453/E109_E453_saslmonindex.sidx.json' metafile='/home/leo/ARMOR/E109_E453/metadata_E109_E453.txt' outrds='/home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/outputR/tximeta_se.rds' annotation='Ensembl' organism='Ecoli'" scripts/run_tximeta.R /home/leo/ECOPAIR_RNAseq_rawdata/E109_453_RNAseq_output/Rout/tximeta_se.Rout (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Error! The Snakemake workflow aborted. Complete log: /home/leo/ARMOR/.snakemake/log/2020-07-01T152546.626254.snakemake.log
I am not a bioinformatic expert. I do not know how to fix the problem, When you have time, could you please let me know how to solve this problem?
Thank you in advance, Kind regards, Le Phuong.