egaffo / circompara2

Improved bioinformatic pipeline to identify and quantify circRNA expression from RNA-seq data by combining multiple circRNA detection methods

scons: *** [samples/sample_A/processings/circRNAs/dcc/CircRNACount] Error 1 #4

Open Kingatsu opened 2 years ago

Kingatsu commented 2 years ago

When I run the test, whether from test_circompara/analysis (cd test_circompara/analysis, then ../../circompara2) or from test_circompara/analysis_se (cd test_circompara/analysis_se, then ../../circompara2), it reports this error. I installed circompara2 in a conda python=2.7 env, as Alipe2021 did in #3. I am new to this, so I don't know whether it is installed properly or whether this is a normal result. Can anyone give me a hand? Thanks!


samtools view -F 4 samples/sample_A/processings/circRNAs/star_out/Aligned.sortedByCoord.out.bam | cut -f 1 | sort | uniq |
wc -l > samples/sample_A/processings/circRNAs/star_out/STAR_mapped_reads_count.txt
DCC -fg -M -F -Nr 1 1 -N -T 4 -D -O samples/sample_A/processings/circRNAs/dcc -t samples/sample_A/processings/circRNAs/dcc/_tmp_DCC samples/sample_A/processings/circRNAs/star_out/Chimeric.out.junction
Output folder samples/sample_A/processings/circRNAs/dcc already exists, reusing
DCC 0.4.8 started
6 CPU cores available, using 4
WARNING: File samples/sample_A/processings/circRNAs/star_out/Chimeric.out.junction is empty!
Junction files seem empty, skipping circRNA detection module.
circRNA detection skipped due to empty junction files
Filter mode for detected circRNAs enabled without detection module.
Combine with -f or -D.
scons: *** [samples/sample_A/processings/circRNAs/dcc/CircRNACount] Error 1
scons: building terminated because of errors.
egaffo commented 2 years ago

The "WARNING: File samples/sample_A/processings/circRNAs/star_out/Chimeric.out.junction is empty!" suggests something went wrong with the STAR alignment. Please check that the STAR version you run is the same one required by circompara2.
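
As a quick pre-check before re-running the pipeline, the empty junction file can be detected directly. This is a minimal sketch and not part of circompara2: the function name is hypothetical, and the path is simply the one reported in the log above.

```python
import os


def junction_file_is_empty(path):
    """Return True when the file is missing or has zero bytes."""
    return (not os.path.isfile(path)) or os.path.getsize(path) == 0


# Path copied from the warning in the log; adjust it to your own sample.
JUNCTION = "samples/sample_A/processings/circRNAs/star_out/Chimeric.out.junction"

if junction_file_is_empty(JUNCTION):
    print("Chimeric.out.junction is missing or empty: inspect the STAR run first")
```

If the check fires, DCC will fail in the same way as in the log, so the STAR step is the place to look.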

Kingatsu commented 2 years ago

Thanks for your help. I found a 'STAR-2.6.1e' folder in circompara/tools, but STAR -h in my conda env reports 'STAR version=2.7.9a'. Should I downgrade STAR to 2.6.1e in my conda env?

Kingatsu commented 2 years ago

I've checked that circompara2/bin/STAR --version reports 2.6.1e, which is the same version listed in 'install_tools.py'. The test also reports the same error when I run it in another python2.7 conda env that has no conda-installed STAR.
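
The version comparison discussed here can be scripted. The following is a rough sketch for Python 3.7+, under the assumption that the version appears somewhere in the output of STAR --version as digits such as 2.6.1e; the helper names are hypothetical, and the binary paths in the usage comments are only examples taken from this thread.

```python
import re
import subprocess

REQUIRED = "2.6.1e"  # version named in this thread (tools folder, install_tools.py)


def extract_star_version(text):
    """Pull a version such as '2.6.1e' out of STAR's version output."""
    match = re.search(r"\d+\.\d+\.\d+[a-z]?", text)
    return match.group(0) if match else None


def star_version(binary):
    """Run '<binary> --version' and return the parsed version string."""
    out = subprocess.run([binary, "--version"], capture_output=True, text=True)
    return extract_star_version(out.stdout + out.stderr)


# Usage (paths are examples; point them at your own installation):
#   star_version("circompara2/bin/STAR") == REQUIRED
#   star_version("STAR")  # whichever STAR is first on PATH, e.g. the conda one
```

Comparing the bundled binary with the one found on PATH shows which STAR would be picked up if the pipeline relied on PATH.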

Kingatsu commented 2 years ago

Hello @egaffo, I solved the STAR problem by running it in / , but now the test run hits a new problem. Here is the error:

gene_annotation.R -c circular_expression/circrna_collection/combined_circrnas.gtf.gz -o circular_expression/circrna_collection/circrna_gene_annotation
Error in get(genname, envir = envir) : object 'testthat_print' not found
stringtie -p 24 -o samples/sample_A/processings/stringtie/sample_A_transcripts.gtf -A samples/sample_A/processings/stringtie/sample_A_gene_abund.tab -l sample_A -G /BIGDATA2/sysu_cyq_1/zhushunxin/CircRNA/circompara2/test_circompara/annotation/CFLAR_HIPK3.gtf -C samples/sample_A/processings/stringtie/sample_A_cov_refs.gtf -b samples/sample_A/processings/stringtie/ballgown_ctabs -e samples/sample_A/processings/hisat2_out/sample_A_hisat2.bam
stringtie -p 24 -o samples/sample_B/processings/stringtie/sample_B_transcripts.gtf -A samples/sample_B/processings/stringtie/sample_B_gene_abund.tab -l sample_B -G /BIGDATA2/sysu_cyq_1/zhushunxin/CircRNA/circompara2/test_circompara/annotation/CFLAR_HIPK3.gtf -C samples/sample_B/processings/stringtie/sample_B_cov_refs.gtf -b samples/sample_B/processings/stringtie/ballgown_ctabs -e samples/sample_B/processings/hisat2_out/sample_B_hisat2.bam
writeLines(["linear_expression/linear_quantexp_stringtie/geneexp/samples_expression_files.txt"], ["samples/sample_A/processings/stringtie/sample_A_gene_abund.tab", "samples/sample_B/processings/stringtie/sample_B_gene_abund.tab"])
writeLines(["linear_expression/linear_quantexp_stringtie/geneexp/samples_trxexp_files.txt"], ["samples/sample_A/processings/stringtie/sample_A_transcripts.gtf", "samples/sample_B/processings/stringtie/sample_B_transcripts.gtf"])
echo "No reads in /BIGDATA2/sysu_cyq_1/zhushunxin/CircRNA/circompara2/test_circompara/reads/readsB_1.fastq.gz" > samples/sample_B/read_statistics/fastqc_stats/readsB_1_fastqc.html && echo "No reads in /BIGDATA2/sysu_cyq_1/zhushunxin/CircRNA/circompara2/test_circompara/reads/readsB_1.fastq.gz" > samples/sample_B/read_statistics/fastqc_stats/readsB_1_fastqc/fastqc_data.txt && fastqc /BIGDATA2/sysu_cyq_1/zhushunxin/CircRNA/circompara2/test_circompara/reads/readsB_1.fastq.gz -o samples/sample_B/read_statistics/fastqc_stats --extract > samples/sample_B/read_statistics/fastqc_stats/readsB_1.fastq_fastqc.log 2> samples/sample_B/read_statistics/fastqc_stats/readsB_1.fastq_fastqc.err
echo "No reads in /BIGDATA2/sysu_cyq_1/zhushunxin/CircRNA/circompara2/test_circompara/reads/readsB_2.fastq.gz" > samples/sample_B/read_statistics/fastqc_stats/readsB_2_fastqc.html && echo "No reads in /BIGDATA2/sysu_cyq_1/zhushunxin/CircRNA/circompara2/test_circompara/reads/readsB_2.fastq.gz" > samples/sample_B/read_statistics/fastqc_stats/readsB_2_fastqc/fastqc_data.txt && fastqc /BIGDATA2/sysu_cyq_1/zhushunxin/CircRNA/circompara2/test_circompara/reads/readsB_2.fastq.gz -o samples/sample_B/read_statistics/fastqc_stats --extract > samples/sample_B/read_statistics/fastqc_stats/readsB_2.fastq_fastqc.log 2> samples/sample_B/read_statistics/fastqc_stats/readsB_2.fastq_fastqc.err
get_stringtie_rawcounts.R -g samples/sample_B/processings/stringtie/sample_B_transcripts.gtf -f /BIGDATA2/sysu_cyq_1/zhushunxin/CircRNA/circompara2/test_circompara/analysis/samples/sample_B/read_statistics/fastqc_stats/readsB_1_fastqc/fastqc_data.txt,/BIGDATA2/sysu_cyq_1/zhushunxin/CircRNA/circompara2/test_circompara/analysis/samples/sample_B/read_statistics/fastqc_stats/readsB_2_fastqc/fastqc_data.txt -o samples/sample_B/processings/stringtie/sample_B_
Error in strsplit(grep("Sequence length", x = fastqc_data.txt, value = T),  : 
  subscript out of bounds
Calls: mean -> sapply -> lapply -> FUN -> mean -> strsplit
Execution halted
scons: *** [samples/sample_B/processings/stringtie/sample_B_gene_expression_rawcounts.csv] Error 1
scons: building terminated because of errors.

The R version is 3.6.3. I am looking forward to your reply.

egaffo commented 2 years ago

The error "Error in get(genname, envir = envir) : object 'testthat_print' not found" could be the culprit. Please check this thread: https://github.com/r-lib/rlang/issues/1112
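
The second error in that log ("subscript out of bounds" in strsplit) is consistent with a fastqc_data.txt that has no "Sequence length" line, for instance if it still holds the "No reads in ..." placeholder written by the echo command. That reading is an assumption, not something confirmed in this thread. A small sketch to inspect the file contents (the function name is hypothetical):

```python
def find_sequence_length(fastqc_text):
    """Return the value of the 'Sequence length' line, or None if absent."""
    for line in fastqc_text.splitlines():
        if line.startswith("Sequence length"):
            # FastQC separates the label and the value with a tab.
            parts = line.split("\t")
            return parts[1].strip() if len(parts) > 1 else None
    return None


# Usage: read a fastqc_data.txt from the run and check the result, e.g.
#   with open("samples/sample_B/read_statistics/fastqc_stats/"
#             "readsB_1_fastqc/fastqc_data.txt") as handle:
#       print(find_sequence_length(handle.read()))
```

A result of None would mean the file lacks the line that get_stringtie_rawcounts.R greps for, which points at the FastQC step rather than at StringTie.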

hafizmtalha commented 2 years ago

@Kingatsu how did you solve the STAR issue? I am facing the same problem in the test run:

WARNING: File samples/sample_A/processings/circRNAs/star_out/Chimeric.out.junction is empty!
Junction files seem empty, skipping circRNA detection module.
circRNA detection skipped due to empty junction files
Filter mode for detected circRNAs enabled without detection module.
Combine with -f or -D.
scons: *** [samples/sample_A/processings/circRNAs/dcc/CircRNACount] Error 1
scons: building terminated because of errors.

Kingatsu commented 2 years ago

Well, circompara2 ran smoothly when I installed it in the root directory (/) or on a real Linux server. For reference, I run Ubuntu as a subsystem on my PC, and circompara2 does not work properly on the other mounted disks. So I guess the file system differences between the mounted disks and the root directory cause the issue. I hope my reply helps you.

egaffo commented 2 years ago

There are some issues when running STAR from a container or on a shared filesystem (e.g. NFS) because of temporary files. Setting the STAR --outTmpDir parameter to a custom directory solved the problem of the empty Chimeric.out.junction. To do that with the CirComPara2 container, do the following:

  1. generate the custom temporary dir (possibly not in the shared FS)
  2. mount the new tmp dir into the container
  3. set STAR_PARAMS properly in the vars.py file

Points 1) and 2) can be done from the command line. The command line to launch CirComPara2 will be like this:

#!/bin/bash
## create a new temp dir (preferably not on the shared filesystem)
MYTMPDIR=$(mktemp -d)
## delete the tmp dir when the script exits, even if docker fails or is interrupted
trap 'rm -rf "$MYTMPDIR"' EXIT
## mind the new tmp dir is mounted as a volume into the container /ttmmpp dir
docker run -u "$(id -u):$(id -g)" --rm -it -v "$MYTMPDIR":/ttmmpp -v "$(pwd)":/data egaffo/circompara2:v0.1.2.1
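
For anyone who prefers to launch the container from Python, the same steps might look like the sketch below. It is an illustration only: the function names are hypothetical, it assumes Python 3 on a Unix-like host, and it reuses the image tag and mount points from the command above.

```python
import os
import subprocess
import tempfile

IMAGE = "egaffo/circompara2:v0.1.2.1"  # image tag from the command above


def build_docker_cmd(tmpdir, workdir, uid, gid):
    """Assemble the docker run command with the temp dir mounted at /ttmmpp."""
    return [
        "docker", "run", "-u", "%d:%d" % (uid, gid), "--rm", "-it",
        "-v", "%s:/ttmmpp" % tmpdir,
        "-v", "%s:/data" % workdir,
        IMAGE,
    ]


def launch():
    """Run the container; the temp dir is removed when the with-block exits."""
    with tempfile.TemporaryDirectory() as tmpdir:
        cmd = build_docker_cmd(tmpdir, os.getcwd(), os.getuid(), os.getgid())
        return subprocess.call(cmd)
```

Calling launch() from the analysis directory mirrors the shell script; tempfile.TemporaryDirectory takes the place of the trap for cleanup.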

And the STAR_PARAMS will be set in the vars.py as follows:

STAR_PARAMS = '--outTmpDir /ttmmpp/$SAMPLE '\
              '--runRNGseed 123 '\
              '--outSJfilterOverhangMin 15 15 15 15 '\
              '--alignSJoverhangMin 15 '\
              '--alignSJDBoverhangMin 15 '\
              '--seedSearchStartLmax 30 '\
              '--outFilterScoreMin 1 '\
              '--outFilterMatchNmin 1 '\
              '--outFilterMismatchNmax 2 '\
              '--chimSegmentMin 15 '\
              '--chimScoreMin 15 '\
              '--chimScoreSeparation 10 '\
              '--chimJunctionOverhangMin 15'

Mind that STAR_PARAMS must also specify the other parameters that are defaults in CirComPara2, because STAR_PARAMS overrides the default values. N.B.: STAR_PARAMS is a one-line Python string; here, I've just split it across multiple lines to improve readability.
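
To confirm that the value really is a single line, the same options can be joined from a list and checked. This is a standalone sketch rather than something to paste into vars.py; the list name star_options is hypothetical, and the option values are copied from the setting above.

```python
# Options copied from the STAR_PARAMS setting above, one per list entry.
star_options = [
    "--outTmpDir /ttmmpp/$SAMPLE",
    "--runRNGseed 123",
    "--outSJfilterOverhangMin 15 15 15 15",
    "--alignSJoverhangMin 15",
    "--alignSJDBoverhangMin 15",
    "--seedSearchStartLmax 30",
    "--outFilterScoreMin 1",
    "--outFilterMatchNmin 1",
    "--outFilterMismatchNmax 2",
    "--chimSegmentMin 15",
    "--chimScoreMin 15",
    "--chimScoreSeparation 10",
    "--chimJunctionOverhangMin 15",
]

# Join with single spaces so the result is one line, as STAR expects.
STAR_PARAMS = " ".join(star_options)

print(STAR_PARAMS)
```

The printed string is equivalent to the backslash-continued literal shown above, and keeping one option per entry makes it easier to spot a missing default.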