Xinglab / rmats-turbo

Other
217 stars 53 forks source link

AS events of a single sample #418

Open biochristmas opened 1 month ago

biochristmas commented 1 month ago

Hi, If I only need to obtain the AS count for a single sample (experimental group or control group), I execute this command: rmats.py --b1 b2.txt --gtf isoquant_gffcompare.gtf -t paired --bi ./STAR_binary_index --readLength 150 --nthread 30 --od ./test1 --libType fr-unstranded --statoff --tmp ./test1/tmp_output. The AS count in the resulting summary.txt file represents the number of alternative splicing events for that sample? summary

EricKutschera commented 1 month ago

The TotalEvents columns in the summary file are the total number of events in the MATS output files: https://github.com/Xinglab/rmats-turbo/issues/257#issuecomment-1419471666

Since you only have --b1 without --b2 the events are not filtered and could be from the --gtf without supporting reads: https://groups.google.com/g/rmats-user-group/c/SVlOJs4CwVk/m/IIzvDAXrBgAJ

You could get an AS event count by filtering on the read count (IJC_SAMPLE_1, SJC_SAMPLE_1) and PSI (IncLevel1) columns: https://github.com/Xinglab/rmats-turbo/issues/183#issuecomment-1054318031

biochristmas commented 1 month ago

Hi, @EricKutschera I will filter based on Average PSI (IncLevel) within 0.05 and 0.95. I have three replicates, as an example (first row of the image), Average PSI (IncLevel) = (NA+1.0+0.0)/3. However, there are NA values present. How should I handle this。 SE MATS JC

EricKutschera commented 1 month ago

You can try different methods for handling NA. One option is to filter out events that have any NA value in IncLevel1. It looks like only a few rows in the screenshot have NA IncLevel1. You could also keep events with only 1 NA value in IncLevel1 and average the remaining values. In the example you mentioned you could use (1 + 0)/2=0.5 as the average PSI

biochristmas commented 1 month ago

Hi, @EricKutschera I have another question. IncLevel1 = SJC_SAMPLE_1 / (SJC_SAMPLE_1 + IJC_SAMPLE_2). For example, in the second row of the image, the value for the third replicate should be 0.06: 2/(2+31)=0.06. However, IncLevel1 = 0.037. SE MATS JC_2

EricKutschera commented 1 month ago

IncLevel1 is (IJC_SAMPLE_1/IncFormLen) / ((IJC_SAMPLE_1/IncFormLen) + (SJC_SAMPLE_1/SkipFormLen)) : https://github.com/Xinglab/rmats-turbo/issues/349

Your screenshot doesn't show the IncFormLen and SkipFormLen columns for that event, but based on the coordinates and --readLength 150 they should be 252 and 149. Then IncLevel=(2/252)/((2/252)+(31/149)) = 0.0367

biochristmas commented 1 month ago

@EricKutschera Thank you for your prompt response. The issue has been resolved. Best regards and good luck!

EricKutschera commented 1 month ago

Yes rMATS can detect events just from the GTF file. See this comment which has a link to an empty bam file: https://github.com/Xinglab/rmats-turbo/issues/79#issuecomment-764704152

You could find the set of annotated events by running rmats with a GTF file and with a mostly empty input

Based on the screenshot I think rMATS will only detect 2 RI events. See this post for details about the intron detection code: https://github.com/Xinglab/rmats-turbo/issues/17