Xinglab / rmats-turbo

Other
219 stars 53 forks source link

Is Single end RNAseq can use this version ? #343

Open Portulaca666 opened 10 months ago

Portulaca666 commented 10 months ago

OK . Thanks for great version turbo! My RNA-seq reads is from this Project —— PRJNA454873 I am not sure the single-end data can use your new turbo version .

Please help me !

EricKutschera commented 10 months ago

You can run rMATS with -t single. Here's the description from the README: https://github.com/Xinglab/rmats-turbo/tree/v4.2.0#all-arguments

-t {paired,single}    Type of read used in the analysis: either "paired" for
                      paired-end data or "single" for single-end data.
                      Default: paired

See this post for a possible solution if some samples are paired-end and others are single-end: https://github.com/Xinglab/rmats-turbo/issues/180#issuecomment-1048064842

Portulaca666 commented 10 months ago

Hi ! I try to run in this way .this is my code `s1_path=/home/xjhuang_pkuhpc/gpfs1/help/LQQ/clean/trimed_fastq/ss1.fastq.csv

s2_path=/home/xjhuang_pkuhpc/gpfs1/help/LQQ/clean/trimed_fastq/ss2.fastq.csv

source /appsnew/source/rmats_turbo_v4.2.0.sh python /appsnew/usr/python/Miniconda/miniconda3-py311_23.5.2/envs/rmats_turbo_v4_2_0/rmats.py -t single --s1 ${s1_path} --s2 ${s2_path} --gtf /home/xjhuang_pkuhpc/lustre2/genome/NCBI/hg38/hg38.ncbiRefSeq.gtf --bi /home/xjhuang_pkuhpc/lustre2/genome/STAR/hg38 --readLength 50 --nthread 18 --od /home/xjhuang_pkuhpc/gpfs1/help/LQQ/rmats/output --tmp /home/xjhuang_pkuhpc/gpfs1/help/LQQ/rmats/tmp_output `

this is the output :

I have no idea why not any result in several important files . gtf: 95.98217487335205 There are 59251 distinct gene ID in the gtf file There are 207289 distinct transcript ID in the gtf file There are 35878 one-transcript genes in the gtf file There are 2214932 exons in the gtf file There are 23283 one-exon transcripts in the gtf file There are 18592 one-transcript genes with only one exon in the transcript Average number of transcripts per gene is 3.498489 Average number of exons per transcript is 10.685237 Average number of exons per transcript excluding one-exon tx is 11.910747 Average number of gene per geneGroup is 1144.576891 statistic: 0.03255939483642578

read outcome totals across all BAMs USED: 12033 NOT_PAIRED: 0 NOT_NH_1: 420853115 NOT_EXPECTED_CIGAR: 4252131 NOT_EXPECTED_READ_LENGTH: 349347110 NOT_EXPECTED_STRAND: 0 EXON_NOT_MATCHED_TO_ANNOTATION: 1720 JUNCTION_NOT_MATCHED_TO_ANNOTATION: 320 CLIPPED: 0 total: 774466429 outcomes by BAM written to: /home/xjhuang_pkuhpc/gpfs1/help/LQQ/rmats/tmp_output/2023-11-17-22_02_32_539617_read_outcomes_by_bam.txt

novel: 288.21173310279846 The splicing graph and candidate read have been saved into /home/xjhuang_pkuhpc/gpfs1/help/LQQ/rmats/tmp_output/2023-11-17-22_02_32539617*.rmats save: 0.0213625431060791 loadsg: 0.00998997688293457

========== Done processing each gene from dictionary to compile AS events Found 44611 exon skipping events Found 5997 exon MX events Found 30223 alt SS events There are 15143 alt 3 SS events and 15080 alt 5 SS events. Found 6363 RI events

ase: 1.552682876586914 count: 1.160879373550415 Processing count files. Done processing count files.

Portulaca666 commented 10 months ago

When I see each output .I found the USED reads is so little image What should I do ?

EricKutschera commented 10 months ago

Almost all of the alignments are either NOT_NH_1 or NOT_EXPECTED_READ_LENGTH:

NOT_NH_1: 420853115 (54%)
NOT_EXPECTED_READ_LENGTH: 349347110 (45%)
total: 774466429

The command you posted had: --readLength 50. Maybe the reads are not actually length 50. If the reads are not expected to be the same length you can use --variable-read-length: https://github.com/Xinglab/rmats-turbo/issues/83#issuecomment-766845772