egaffo / circompara2

Improved bioinformatic pipeline to identify and quantify circRNA expression from RNA-seq data by combining multiple circRNA detection methods

Calls: mean -> sapply -> lapply -> FUN -> mean -> strsplit Execution halted #10

Open hafizmtalha opened 1 year ago

hafizmtalha commented 1 year ago

echo "No reads in /dell/muscle/sample9_2.fastq.gz" > samples/sample9/read_statistics/fastqc_stats/sample9_2_fastqc.html && echo "No reads in /dell/muscle/sample9_2.fastq.gz" > samples/sample9/read_statistics/fastqc_stats/sample9_2_fastqc/fastqc_data.txt && fastqc /dell/muscle/sample9_2.fastq.gz -o samples/sample9/read_statistics/fastqc_stats --extract > samples/sample9/read_statistics/fastqc_stats/sample9_2.fastq_fastqc.log 2> samples/sample9/read_statistics/fastqc_stats/sample9_2.fastq_fastqc.err
get_stringtie_rawcounts.R -g samples/sample9/processings/stringtie/sample9_transcripts.gtf -f /dell/circompara2/test_circompara/analysis/samples/sample9/read_statistics/fastqc_stats/sample9_1_fastqc/fastqc_data.txt,/dell/circompara2/test_circompara/analysis/samples/sample9/read_statistics/fastqc_stats/sample9_2_fastqc/fastqcdata.txt -o samples/sample9/processings/stringtie/sample9
Error in strsplit(grep("Sequence length", x = fastqc_data.txt, value = T), :
  subscript out of bounds
Calls: mean -> sapply -> lapply -> FUN -> mean -> strsplit
Execution halted
scons: *** [samples/sample9/processings/stringtie/sample9_gene_expression_rawcounts.csv] Error 1
scons: building terminated because of errors.

How can I solve this?

egaffo commented 1 year ago

Can you post your meta.csv and vars.py files? They help in understanding where the error comes from. Also, check the content of the fastqc_data.txt file to see whether an error message was written to it.

Enrico

(Sent by email in reply to the message Hafiz Muhammad Talha posted on Wed, 10 Aug 2022, 20:50.)

hafizmtalha commented 1 year ago

The file fastqc_data.txt contains:

No reads in /dell/muscle/sample9_2.fastq.gz

vars.py

META = 'meta.csv'
GENOME_FASTA = '/dell/new3TB/mousegenome/Mus_musculus.GRCm39.dna.primary_assembly.fa'
ANNOTATION = '/dell/new3TB/mousegenome/Mus_musculus.GRCm39.107.gtf'
CPUS = '20'

# pre-computed index and annotation files
# GENOME_INDEX = "../indexes/hisat2/CFLAR_HIPK3"
SEGEMEHL_INDEX = "/dell/circompara2/test_circompara/analysis/dbs/indexes/indexes/segemehl/Mus_musculus.GRCm39.dna.primary_assembly.idx"
BWA_INDEX = "/dell/circompara2/test_circompara/analysis/dbs/indexes/indexes/bwa/Mus_musculus.GRCm39.dna.primary_assembly"
BOWTIE2_INDEX = "/dell/circompara2/test_circompara/analysis/dbs/indexes/indexes/bowtie2/Mus_musculus.GRCm39.dna.primary_assembly"
# BOWTIE_INDEX = "../indexes/bowtie/CFLAR_HIPK3"
STAR_INDEX = "/dell/circompara2/test_circompara/analysis/dbs/indexes/indexes/star/Mus_musculus.GRCm39.dna.primary_assembly/"
GENEPRED = "/dell/circompara2/test_circompara/analysis/dbs/indexes/Mus_musculus.GRCm39.107.genePred.wgn"

The rest of the settings were left commented out, as in the default file.

egaffo commented 1 year ago

How about the meta.csv? Did you check that the fastq file is not empty? Also, check whether any errors were reported in the read_statistics/fastqc_stats/*_fastqc.log and read_statistics/fastqc_stats/*_fastqc.err files. Which version of circompara2 are you using, and is it a custom installation or the Docker container? Did it work with the test data?
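The checks suggested above can be scripted. The following is only a sketch, assuming the samples/<sample>/read_statistics/fastqc_stats layout visible in the log posted earlier in this thread; the function name find_fastqc_problems is hypothetical and not part of circompara2. It lists the non-empty *_fastqc.err files and the fastqc_data.txt files that lack a "Sequence length" line, which is the line get_stringtie_rawcounts.R greps for according to the error message.

```python
from pathlib import Path


def find_fastqc_problems(project_dir):
    """Return (nonempty_err_files, data_files_without_length) under project_dir.

    Assumes the directory layout seen in the log of this thread:
    samples/<sample>/read_statistics/fastqc_stats/
    """
    stats_dirs = Path(project_dir).glob("samples/*/read_statistics/fastqc_stats")
    err_files, bad_data = [], []
    for stats in sorted(stats_dirs):
        # A non-empty .err file means FastQC wrote something to stderr.
        for err in sorted(stats.glob("*_fastqc.err")):
            if err.stat().st_size > 0:
                err_files.append(err)
        # A fastqc_data.txt without "Sequence length" is likely the
        # "No reads in ..." placeholder rather than a real FastQC report.
        for data in sorted(stats.glob("*_fastqc/fastqc_data.txt")):
            if "Sequence length" not in data.read_text(errors="replace"):
                bad_data.append(data)
    return err_files, bad_data


if __name__ == "__main__":
    errs, bad = find_fastqc_problems(".")
    for path in errs:
        print("stderr output in:", path)
    for path in bad:
        print("no 'Sequence length' line in:", path)
```

Run it from the project directory (the one that contains samples/) to see which samples need a closer look.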

hafizmtalha commented 1 year ago

meta.csv

file,sample,condition
/dell/muscle/sample6_1.fastq.gz,S6,WT
/dell/muscle/sample6_2.fastq.gz,S6,WT
/dell/muscle/sample7_1.fastq.gz,S7,WT
/dell/muscle/sample7_2.fastq.gz,S7,WT
/dell/muscle/sample8_1.fastq.gz,S8,WT
/dell/muscle/sample8_2.fastq.gz,S8,WT
/dell/muscle/sample9_1.fastq.gz,S9,WT
/dell/muscle/sample9_2.fastq.gz,S9,WT
/dell/muscle/sample10_1.fastq.gz,S10,WT
/dell/muscle/sample10_2.fastq.gz,S10,WT

The file is not empty, and the mappers worked fine on it.

read_statistics/fastqc_stats/*_fastqc.log is empty, nothing in it.
read_statistics/fastqc_stats/*_fastqc.err has an error:

uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Midline '@GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGF' didn't start with '+'
    at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:172)
    at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:77)

samples/SRR9302709/read_statistics/fastqc_stats/SRR9302709_1.fastq_fastqc.err

I cloned this repository and then installed it. The test run was successful.

egaffo commented 1 year ago

The error says that your fastq file is not well formatted, which is why FastQC fails. Check the consistency of your input files: they must be properly formatted as FASTQ.
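One way to check this is to walk through every record of the file, not just its start and end. The sketch below is an assumption-laden illustration, not the check FastQC itself performs: it assumes plain four-line FASTQ records (header starting with '@', sequence, separator starting with '+', quality string of the same length as the sequence), and the function name first_fastq_error is made up for this example. It is stricter than some tools, since it also flags blank lines.

```python
import gzip


def first_fastq_error(path):
    """Return (line_number, message) for the first malformed record, or None.

    Assumes four-line FASTQ records; multi-line FASTQ is not supported.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        line_no = 0  # number of lines consumed before the current record
        while True:
            record = [handle.readline() for _ in range(4)]
            if record[0] == "":
                return None  # clean end of file, no malformed record found
            header, seq, plus, qual = (x.rstrip("\r\n") for x in record)
            if not header.startswith("@"):
                return (line_no + 1, "header does not start with '@'")
            if not plus.startswith("+"):
                return (line_no + 3, "third line does not start with '+'")
            if len(seq) != len(qual):
                return (line_no + 4, "sequence and quality lengths differ")
            line_no += 4


if __name__ == "__main__":
    import sys

    print(first_fastq_error(sys.argv[1]))
```

The reported line number points into the decompressed file, so the surrounding lines can be inspected directly. Note that a truncated .gz file would make gzip raise an exception instead of returning a result, which is itself a sign of a damaged input.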

hafizmtalha commented 1 year ago

That's quite strange, because I checked the head and tail of both files for this sample and they look fine. Is there any way I could skip the preprocessing steps for all the other samples and run only this sample up to this same point?

egaffo commented 1 year ago

You can remove that sample from the meta.csv and run circompara2 in the same project directory. Circompara2 will just skip that sample without redoing the tasks already completed in your previous run. I suggest you make a new project directory with only the "corrupt" file and run your tests there. Then, when everything is OK, either merge the two project results "by hand", or add the fixed input file back to the meta.csv and run circompara2 again to let it update the final result files.
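Removing the sample can of course be done in a text editor; for larger tables, a small script may be handier. This is a minimal sketch, assuming the file,sample,condition header shown in the meta.csv posted above; the function name drop_sample and the output file name are hypothetical.

```python
import csv


def drop_sample(meta_in, meta_out, sample):
    """Copy meta_in to meta_out, skipping rows whose 'sample' column equals sample.

    Returns the number of rows kept. Assumes a header row with a 'sample' column.
    """
    with open(meta_in, newline="") as src, open(meta_out, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(
            dst, fieldnames=reader.fieldnames, lineterminator="\n"
        )
        writer.writeheader()
        kept = 0
        for row in reader:
            if row["sample"] != sample:
                writer.writerow(row)
                kept += 1
    return kept


if __name__ == "__main__":
    # Hypothetical usage: write a copy of meta.csv without sample S9.
    print(drop_sample("meta.csv", "meta_without_S9.csv", "S9"), "rows kept")
```

Writing to a new file keeps the original meta.csv intact, so the sample can be added back later once the input file is fixed.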

hafizmtalha commented 1 year ago

Thanks for the help. I will try that.

wer7894562 commented 7 months ago

Hi, have you solved this problem?