Open desmodus1984 opened 3 months ago
The files for --b1
and --b2
should each be a single line with ,
separated file paths: https://github.com/Xinglab/rmats-turbo/tree/v4.3.0?tab=readme-ov-file#starting-with-bam-files
It looks like your files have 1 file per line. Here's a similar issue: https://github.com/Xinglab/rmats-turbo/issues/191
Hi, I did fix the txt files, and I ran rmats again, and now I have this weird error:
rmats.py --b1 ctrl.H.txt --b2 ctrl.Low.txt --gtf ../GCF_011952255.1_Bvos_JDL3184-5_v1.1_genomic.gtf -t paired --readLength 50 --nthread 12 --od rmats-test --tmp temp gtf: 2.459548234939575 There are 12973 distinct gene ID in the gtf file There are 27041 distinct transcript ID in the gtf file There are 7807 one-transcript genes in the gtf file There are 236878 exons in the gtf file There are 859 one-exon transcripts in the gtf file There are 850 one-transcript genes with only one exon in the transcript Average number of transcripts per gene is 2.084406 Average number of exons per transcript is 8.759957 Average number of exons per transcript excluding one-exon tx is 9.014552 Average number of gene per geneGroup is 17.933534 statistic: 0.0020248889923095703
read outcome totals across all BAMs USED: 179222103 NOT_PAIRED: 64236921 NOT_NH_1: 111524948 NOT_EXPECTED_CIGAR: 1536457 NOT_EXPECTED_READ_LENGTH: 137936793 NOT_EXPECTED_STRAND: 0 EXON_NOT_MATCHED_TO_ANNOTATION: 13366380 JUNCTION_NOT_MATCHED_TO_ANNOTATION: 603563 CLIPPED: 29666879 total: 538094044 outcomes by BAM written to: temp/2024-08-12-19_17_37_342811_read_outcomes_by_bam.txt
novel: 151.39687323570251
The splicing graph and candidate read have been saved into temp/2024-08-12-19_17_37342811*.rmats
save: 2.6668777465820312
Traceback (most recent call last):
File "/home/juaguila/miniconda3/envs/rmats/bin/rmats.py", line 979, in
Could you suggest what can I do to check the file or perhaps fix it? Those files are the output of STAR, I'd prefer not to reprocess the files, unless you absolute thing that using the fastq is better.
Thanks;
It looks like you're using the same --tmp temp
as the previous run. That directory is used for some intermediate files and the error is happening when it reads one of those files from the run with the newline separated bams. You could either remove the files in temp
or use a new directory for --tmp
Hi,
I would like to analyze the RNA-seq that I have, but the design was kinda awkward; We have low and high-elevation individuals, that were then tested for temperature response -cold, heat, and normal temp, but each treatment had individuals from both elevations. Can I reuse the bam files created with rmats to then reanalyze them?
Any suggestion you could say about my code:
rmats.py --b1 ctrl.H.txt --b2 ctrl.Low.txt --gtf ../GCF_011952255.1_Bvos_JDL3184-5_v1.1_genomic.gtf -t paired --readLength 50 --nthread 12 --od rmats-test --tmp temp
Lastly, can I use the \ symbol in the text files? The file names are very large, and there is like 30 samples for each group. So it is a huge line to read, and I was hoping to use the "\" to edit in case any name was misstyped.
Thanks;
Can I reuse the bam files created with rmats to then reanalyze them?
If you ran with fastq files then rMATS would put the bam files from STAR in the --tmp
folder numbered based on the order in your --s1
and --s2
files: https://github.com/Xinglab/rmats-turbo/blob/v4.3.0/rmats.py#L57
can I use the \ symbol in the text files?
rMATS doesn't handle \
for splitting a line. The code expects the --b1
and --b2
files to be a single line with ,
separated file paths
Hi,
I am interested in assessing differential splicing between two populations. I got PE-50 stranded mRNA libraries, which were mapped with STAR, and I thought on using the output bam files to run rmats. As stated I created the txt files with the bam files;
ctrl.Low.txt
123.V00796.L1Aligned.sortedByCoord.out.bam 45.V00001Aligned.sortedByCoord.out.bam 5.V00304.L1Aligned.sortedByCoord.out.bam 66.V01425Aligned.sortedByCoord.out.bam 68.V0356.R1Aligned.sortedByCoord.out.bam 86.V00750.L1Aligned.sortedByCoord.out.bam
my txt files are ctrl.H.txt and ctrl.Low.txt, and when I ran rmats:
rmats.py --b1 ctrl.H.txt --b2 ctrl.Low.txt --gtf ../GCF_011952255.1_Bvos_JDL3184-5_v1.1_genomic.gtf -t paired --readLength 50 --nthread 12 --od rmats-test --tmp temp
I got an error:
gtf: 3.5072505474090576 There are 12973 distinct gene ID in the gtf file There are 27041 distinct transcript ID in the gtf file There are 7807 one-transcript genes in the gtf file There are 236878 exons in the gtf file There are 859 one-exon transcripts in the gtf file There are 850 one-transcript genes with only one exon in the transcript Average number of transcripts per gene is 2.084406 Average number of exons per transcript is 8.759957 Average number of exons per transcript excluding one-exon tx is 9.014552 Average number of gene per geneGroup is 17.933534 statistic: 0.0022149085998535156 Fail to open 101.V01347Aligned.sortedByCoord.out.bam 120.V02191.L1Aligned.sortedByCoord.out.bam 20.V02055.L1Aligned.sortedByCoord.out.bam 44.V01947.L1Aligned.sortedByCoord.out.bam 61.V01712.L1Aligned.sortedByCoord.out.bam 89.V01926.R1Aligned.sortedByCoord.out.bam: BamReader::Open: could not open file: 101.V01347Aligned.sortedByCoord.out.bam 120.V02191.L1Aligned.sortedByCoord.out.bam 20.V02055.L1Aligned.sortedByCoord.out.bam 44.V01947.L1Aligned.sortedByCoord.out.bam 61.V01712.L1Aligned.sortedByCoord.out.bam 89.V01926.R1Aligned.sortedByCoord.out.bam BgzfStream::Open: could not open BGZF stream: BamFile::Open: could not open file handle for 101.V01347Aligned.sortedByCoord.out.bam 120.V02191.L1Aligned.sortedByCoord.out.bam 20.V02055.L1Aligned.sortedByCoord.out.bam 44.V01947.L1Aligned.sortedByCoord.out.bam 61.V01712.L1Aligned.sortedByCoord.out.bam 89.V01926.R1Aligned.sortedByCoord.out.bam Fail to open 123.V00796.L1Aligned.sortedByCoord.out.bam 45.V00001Aligned.sortedByCoord.out.bam 5.V00304.L1Aligned.sortedByCoord.out.bam 66.V01425Aligned.sortedByCoord.out.bam 68.V0356.R1Aligned.sortedByCoord.out.bam 86.V00750.L1Aligned.sortedByCoord.out.bam: BamReader::Open: could not open file: 123.V00796.L1Aligned.sortedByCoord.out.bam 45.V00001Aligned.sortedByCoord.out.bam 5.V00304.L1Aligned.sortedByCoord.out.bam 66.V01425Aligned.sortedByCoord.out.bam 68.V0356.R1Aligned.sortedByCoord.out.bam 86.V00750.L1Aligned.sortedByCoord.out.bam BgzfStream::Open: could not open BGZF stream: BamFile::Open: could not open file handle for 123.V00796.L1Aligned.sortedByCoord.out.bam 45.V00001Aligned.sortedByCoord.out.bam 5.V00304.L1Aligned.sortedByCoord.out.bam 66.V01425Aligned.sortedByCoord.out.bam 68.V0356.R1Aligned.sortedByCoord.out.bam 86.V00750.L1Aligned.sortedByCoord.out.bam
read outcome totals across all BAMs USED: 0 NOT_PAIRED: 0 NOT_NH_1: 0 NOT_EXPECTED_CIGAR: 0 NOT_EXPECTED_READ_LENGTH: 0 NOT_EXPECTED_STRAND: 0 EXON_NOT_MATCHED_TO_ANNOTATION: 0 JUNCTION_NOT_MATCHED_TO_ANNOTATION: 0 CLIPPED: 0 total: 0 outcomes by BAM written to: temp/2024-08-08-17_49_03_391496_read_outcomes_by_bam.txt
novel: 0.0009527206420898438 The splicing graph and candidate read have been saved into temp/2024-08-08-17_49_03391496*.rmats save: 0.0002033710479736328 Traceback (most recent call last): File "/home/juaguila/miniconda3/envs/rmats/bin/rmats.py", line 979, in
main()
File "/home/juaguila/miniconda3/envs/rmats/bin/rmats.py", line 945, in main
run_pipe(args)
File "rmatspipeline/rmatspipeline.pyx", line 4006, in rmats.rmatspipeline.run_pipe
File "rmatspipeline/rmatspipeline.pyx", line 3869, in rmats.rmatspipeline.split_sg_files_by_bam
File "rmatspipeline/rmatspipeline.pyx", line 3877, in rmats.rmatspipeline.split_sg_files_by_bam
ValueError: invalid literal for int() with base 10: '120.V02191.L1Aligned.sortedByCoord.out.bam'
I literally created a list and then copied and pasted the names to create the txt files. I don't understand why is the error.
Thank;