arefeen / TAPAS


APA site reported in the results file #11

Open JamalEH opened 4 years ago

JamalEH commented 4 years ago

Hi dear, Could you please explain what the APA columns in the differential expression file and the decision file represent?

I checked the coordinates reported in those columns using Ensembl, and they correspond to either the start or the end of the last exon of the gene/isoform.

Do these sites represent the binding sites of the polyadenylation machinery?

I really need your help. Thank you so much in advance! Kind regards, Jamal.

arefeen commented 4 years ago

As the name suggests, these are the APA sites where the differential expression occurs. As in my previous reply, I suggest you read the manuscript of the tool carefully to understand the differential expression and shortening/lengthening analyses.

JamalEH commented 4 years ago

Hi dear, Thank you for your reply!

Of course, I have read the paper. It was more technical, explaining the principles behind the calculations of the different analyses, and most of the rest is benchmarking against other tools.

I'm asking a biological question instead. Briefly, I'm interested in identifying possible binding factors upstream of the APA sites. Which of the two files, "differential expression" or "shortening/lengthening", would you advise me to use for this analysis? I was planning to use the shortening/lengthening file, but I just want to be sure by hearing that from you.

Thank you!

arefeen commented 4 years ago

Hi, If you want to see the effect of binding factors, you can use both (step by step). Both analyses will tell you something about the behavior of APA sites. If you want to find all possible binding factor sites, I suggest you use our other tool (DeepPASTA: deep neural network based polyadenylation site analysis).

Thanks, Ashraful

JamalEH commented 4 years ago

OK great, I have performed both analyses and can clearly see that my experiment is causing significant changes in the usage of APA sites: 500 APA sites were differentially expressed between my conditions.

Thank you for the tool! I will give it a try and let you know. I hope that it takes its inputs from the TAPAS analysis, or something similar.

Best regards, Jamal.

JamalEH commented 4 years ago

Dear Ashraful, I hope you are fine.

I have a question regarding the log2 fold change column in the shortening/lengthening output file. Does this column correspond to the rc(i,j) ratio discussed in the manuscript? Since it represents a difference between the distal APA site (j) and the proximal APA site (i), an rc ratio > 1 means lengthening while < 1 means shortening. Is this interpretation right? Does the sign of the rc value mean something?

What I did is the following, using the shortening/lengthening output file: log2 fold change < 1, I call this a shortening event; log2 fold change > 1 means a lengthening event. Is that correct?

Another thing I observed at the APA detection step is the following: in my control condition I have 13567 detected APA sites in one of the replicates, while in my silencing condition the total number of detected APA sites is just 7071 (this is true for all the replicates of this silencing condition). Does this mean something to you? Could it be due to the silencing experiment?

Thank you so much in advance! Kind regards, Jamal.

MinhuaSu commented 2 years ago

Hi JamalEH, I have the same questions. Did you figure it out? I read the part of the manuscript that is relevant to this question:

"Consider a pair of APA sites i and j where at least one APA site is differentially expressed and APA site i precedes APA site j on the genome. Denote the mean abundance of i and j in samples A and B as $e_{i,A}$, $e_{j,A}$, $e_{i,B}$ and $e_{j,B}$, respectively. We can use the following [Equation 4] to calculate the relative change value for the APA site pair:

$$rc_{i,j} = \log_2\left(\frac{e_{j,B}}{e_{j,A}}\right) - \log_2\left(\frac{e_{i,B}}{e_{i,A}}\right) \qquad (4)$$

Similar to Bahn et al. (2015), if $|rc_{i,j}| \geq 1.0$, then the APA site pair (i, j) is considered as giving rise to a shortening/lengthening event. TAPAS outputs all genes that contain APA site pairs with shortening/lengthening events."

From what I understood, j is the distal APA site and i is the proximal APA site. If B is -c2, A is -c1, and log2 fold change > 1, then the B samples have more reads at the long (distal) site than the A samples, relative to the short (proximal) site. Let me know if you already figured it out. Thanks! Minhua
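
To make the arithmetic of Equation 4 (as quoted above) concrete, here is a minimal Python sketch. It is not TAPAS code: the function names, the labels, and the example abundances are made up for illustration. It only assumes the formula and the |rc| ≥ 1.0 cutoff from the quoted passage, and it deliberately reports which site gains in B rather than "shortening" or "lengthening", because that mapping depends on which site is distal (and therefore on the strand) and has not been confirmed in this thread.

```python
import math


def relative_change(e_i_a, e_j_a, e_i_b, e_j_b):
    """Relative change for an APA site pair (i, j), following the quoted Equation 4.

    Arguments are mean abundances of sites i and j in samples A and B.
    All four values must be positive; zero abundances are not handled here.
    """
    return math.log2(e_j_b / e_j_a) - math.log2(e_i_b / e_i_a)


def describe(rc, threshold=1.0):
    """Label a pair using the |rc| >= threshold rule from the quoted passage.

    The labels are hypothetical and only state which site gains in B
    relative to the other; they are not TAPAS output values.
    """
    if abs(rc) < threshold:
        return "below threshold"
    if rc > 0:
        return "relative gain at j in B"
    return "relative gain at i in B"


# Made-up abundances: site j goes from 50 to 200 while site i stays at 100.
rc_up = relative_change(e_i_a=100, e_j_a=50, e_i_b=100, e_j_b=200)
print(rc_up, describe(rc_up))

# Made-up abundances: site i goes from 40 to 160 while site j stays at 80.
rc_down = relative_change(e_i_a=40, e_j_a=80, e_i_b=160, e_j_b=80)
print(rc_down, describe(rc_down))
```

Under this reading, the sign of rc carries the direction and the absolute value carries the size of the change, so a cutoff applied to the log2 value would be `rc >= 1` versus `rc <= -1`, not `> 1` versus `< 1`. Whether the log2 fold change column in the output file is exactly this rc value is still the open question here.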