arefeen / TAPAS


Differential APA analysis - shortening/lengthening event analysis #7

Open SMM02 opened 4 years ago

SMM02 commented 4 years ago

Hi, I have been using your tool, and the differential analysis works. All my files are in one directory, and the following command runs successfully:

./Diff_APA_site_analysis -C1 file1.txt -C2 file2.txt -a hg38RefFlat.txt -type d -o output_d.txt

However, the run fails when I change -type d to -type s:

./Diff_APA_site_analysis -C1 file1.txt -C2 file2.txt -a hg38RefFlat.txt -type s -o output_s.txt

I get the following error:

No input file with name diff_result_for_sl.txt
rm: cannot remove 'diff_result_for_sl.txt': No such file or directory

Is there something I need to change? Thanks!

SMM02 commented 4 years ago

Also, I should mention that after running with -type s, I get two output files, run_sl.Rout and finding_diff_APA_site.Rout. Their contents are as follows:

run_sl.Rout:

source("diff_analysis_for_sl.R")
log2FoldChangeCalculator()
Error in log2FoldChangeCalculator() : object 'meanCond2' not found
Execution halted

finding_diff_APA_site.Rout:

source("diff_analysis.R")
locfit 1.5-9.1 2013-03-22
Warning: namespace ‘DESeq’ is not available and has been replaced
by .GlobalEnv when processing object ‘scvBiasCorrectionFits’

Best, Salwa.

arefeen commented 4 years ago

Hi Salwa,

Thanks a lot for using our tool. I have reviewed the issue. Can you please try running the shortening/lengthening analysis with multiple replicates for each condition (C1 and C2)? Also, try placing the "scvBiasCorrectionFits.rda" file (provided in the Differential_APA_Site_Analysis folder) in the same directory while doing the analysis.

Thanks, Ashraful

SMM02 commented 4 years ago

Hi Ashraful,

Did you mean I need to have multiple replicates of each of the conditions for it to work? I do have the "scvBiasCorrectionFits.rda" file in the same directory.

Thanks!

Best, Salwa.

rbatorsky commented 4 years ago

Hello, I'm working with Salwa, who opened this issue, and we appreciate your help.

I'm running the two-step pipeline with the sample data.

First:

./APA_sites_detection -ref refFlat_sf.txt -cov coverage_read_chrY.txt -l 76 -o expression_with_cp_chrY.txt

To test both the differential analysis and the shortening/lengthening analysis, I made a "mock" C2 file by editing the first two lines of the output to artificially raise the last two columns, which are "abundance of those APA sites" and "read count of those APA sites".

Old:

AKAP17A chrY    +   1671408 0.18241 168
ASMT    chrY    +   1711969 0.0967742   6

New:

AKAP17A chrY    +   1671408 18.241  16800
ASMT    chrY    +   1711969 9.67742 6000
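The manual edit above can also be scripted. The following is a minimal sketch, assuming each record has six whitespace-separated columns (gene, chromosome, strand, APA site positions, abundances, read counts) with comma-separated values when a gene has several sites. The function name `scale_record` is hypothetical and is not part of TAPAS.

```python
def scale_record(line, abundance_factor, count_factor):
    """Scale the abundance and read-count columns of one APA record.

    Assumes six whitespace-separated columns; the last two may hold
    comma-separated values (one per APA site).
    """
    gene, chrom, strand, sites, abundances, counts = line.split()
    new_abundances = ",".join(
        f"{float(a) * abundance_factor:.6g}" for a in abundances.split(",")
    )
    new_counts = ",".join(
        str(round(int(c) * count_factor)) for c in counts.split(",")
    )
    # Write the record back tab-separated.
    return "\t".join([gene, chrom, strand, sites, new_abundances, new_counts])


if __name__ == "__main__":
    # Reproduce the two edited lines from the mock C2 file.
    print(scale_record("AKAP17A chrY + 1671408 0.18241 168", 100, 100))
    print(scale_record("ASMT chrY + 1711969 0.0967742 6", 100, 1000))
```

Note that in the mock file the ASMT read count was raised by a factor of 1000 while its abundance was raised by a factor of 100, so the two factors are kept as separate parameters.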

I then copy these files into the "Differential_APA_Site_Analysis" folder and run the differential analysis.

./Diff_APA_site_analysis -C1 expression_with_cp_chrY.txt -C2 expression_with_cp_chrY_mockc2.txt -a refFlat_sf.txt -type d -o output_d.txt

I get the expected output:

chrY    AKAP17A +   1671408 6.64386 0   0   Y
chrY    ASMT    +   1711969 9.96578 0   0   Y
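As a sanity check on these numbers: the fifth column agrees with the log2 ratio of the mock read counts to the original read counts (16800/168 = 100 and 6000/6 = 1000). The sketch below only verifies that arithmetic. It is an observation about this test data, not a description of how TAPAS computes the value internally, and `log2_fold_change` is a hypothetical helper.

```python
import math


def log2_fold_change(value_c1, value_c2):
    """Return log2(value_c2 / value_c1) for two positive values."""
    return math.log2(value_c2 / value_c1)


if __name__ == "__main__":
    # Read counts taken from the original and mock C2 lines.
    read_counts = {"AKAP17A": (168, 16800), "ASMT": (6, 6000)}
    for gene, (c1, c2) in read_counts.items():
        print(gene, round(log2_fold_change(c1, c2), 5))
```

For ASMT, the abundance ratio (9.67742/0.0967742 = 100) would give about 6.64 rather than 9.97, so for this test data the reported value tracks the read-count column rather than the abundance column.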

However, when I now try the shortening/lengthening analysis (scvBiasCorrectionFits.rda is in the folder too):

./Diff_APA_site_analysis -C1 expression_with_cp_chrY.txt -C2 expression_with_cp_chrY_mockc2.txt -a refFlat_sf.txt -type s -o output_s.txt

Gives the stdout:

No input file with name diff_result_for_sl.txt
rm: cannot remove `diff_result_for_sl.txt': No such file or directory

There are several output files:

-rw-rw---- 1 rbator01 rbator01 1.5K Mar 20 11:48 diff_result.txt
-rw-rw---- 1 rbator01 rbator01 1.1K Mar 20 11:48 finding_diff_APA_site.Rout
-rw-rw---- 1 rbator01 rbator01  951 Mar 20 11:48 run_sl.Rout
-rw-rw---- 1 rbator01 rbator01    0 Mar 20 11:48 output_s.txt

The diff_result.txt starts:

chrY    AKAP17A +   1671408 0.01    6.64385618977472
chrY    ASMT    +   1711969 0.001   9.96578428466209

finding_diff_APA_site.Rout contains:

locfit 1.5-9.1   2013-03-22
Warning: namespace ‘DESeq’ is not available and has been replaced
by .GlobalEnv when processing object ‘scvBiasCorrectionFits’
> mainFunction()
> 
> proc.time()
   user  system elapsed 
  0.277   0.035   0.340 

The run_sl.Rout contains:

> source("diff_analysis_for_sl.R")
> log2FoldChangeCalculator()
Error in log2FoldChangeCalculator() : object 'meanCond2' not found
Execution halted
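Since the wrapper's own stdout only reports the missing diff_result_for_sl.txt, the underlying R error is easy to overlook. A minimal sketch for pulling error lines out of the .Rout files, assuming they are plain-text R transcripts like the ones shown above; `find_r_errors` is a hypothetical helper, not something shipped with TAPAS.

```python
from pathlib import Path


def find_r_errors(text):
    """Return (line_number, line) pairs for R error lines in a transcript."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("Error") or stripped == "Execution halted":
            hits.append((number, stripped))
    return hits


if __name__ == "__main__":
    # File names as produced in the run described above.
    for name in ("run_sl.Rout", "finding_diff_APA_site.Rout"):
        path = Path(name)
        if path.exists():
            for number, line in find_r_errors(path.read_text(errors="replace")):
                print(f"{name}:{number}: {line}")
```

Warnings such as the DESeq namespace message are deliberately not matched, since the differential (-type d) run completed despite that warning.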

Then I tried your suggestion by just duplicating the same files twice:

./Diff_APA_site_analysis -C1 expression_with_cp_chrY.txt,expression_with_cp_chrY.txt -C2 expression_with_cp_chrY_mockc2.txt,expression_with_cp_chrY_mockc2.txt -a refFlat_sf.txt -type s -o output_s.txt
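With more replicates, the comma-separated lists become tedious to type. The sketch below assembles the same argument list from Python lists. It uses only the flags that appear in the commands in this thread (-C1, -C2, -a, -type, -o); `build_command` is a hypothetical helper, and the sketch builds the arguments without running the tool.

```python
def build_command(c1_files, c2_files, annotation, analysis_type, output):
    """Build the Diff_APA_site_analysis argument list.

    Replicates of each condition are joined with commas, as in the
    command above. analysis_type is "d" or "s".
    """
    if analysis_type not in ("d", "s"):
        raise ValueError("analysis_type must be 'd' or 's'")
    return [
        "./Diff_APA_site_analysis",
        "-C1", ",".join(c1_files),
        "-C2", ",".join(c2_files),
        "-a", annotation,
        "-type", analysis_type,
        "-o", output,
    ]


if __name__ == "__main__":
    cmd = build_command(
        ["expression_with_cp_chrY.txt"] * 2,
        ["expression_with_cp_chrY_mockc2.txt"] * 2,
        "refFlat_sf.txt",
        "s",
        "output_s.txt",
    )
    print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from inside the Differential_APA_Site_Analysis folder.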

I no longer get the stdout errors, but output_s.txt is empty.

Can you help me understand what is going on? Are multiple replicates needed? I'm also curious if you have a suggestion to make some simple test data for this step. For example, how can I modify the sample file from step 1 to get some results for the shortening/lengthening analysis?

Any insight would be greatly appreciated. Thank you again for your help and for the useful tool!

Best, Rebecca

arefeen commented 4 years ago

Hi Rebecca, thanks a lot for using our tool (TAPAS).

From your analysis we can see that the tool needs replicates of each condition to produce output for the shortening/lengthening analysis.

Why is your shortening/lengthening analysis not producing any result? To understand this, you need to know what shortening/lengthening (sl) analysis means for alternative polyadenylation sites. Please read the manuscript of the tool (or others) for the details. For your convenience, here is an informal definition: a shortening/lengthening event occurs when an APA site of a gene is shortened or lengthened. Because sl involves more than a change in the abundance of an APA site, the tool does not generate any output for your mock data.

Thanks, Ashraful

rbatorsky commented 4 years ago

Thank you for your advice; it does make sense to use multiple replicates.

I figured out how to make a test file. In addition to the changes above, I further modified the expression_with_cp_chrY_mockc2.txt file to shift expression from one APA site to another:

The original:

CD24 chrY - 21152525,21152985 0.10379,0.0807426 189,110
CD99 chrY + 2604478,2609344 0.00488122,0.210421 14,105

The modified:

CD24 chrY - 21152525 0.2 300
CD99 chrY + 2604478 0.2 125
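The edit above collapses a two-site record to a single APA site with hand-picked abundance and read-count values. A minimal sketch of that edit, assuming six whitespace-separated columns with comma-separated per-site values; `keep_single_site` is a hypothetical helper, and the new abundance and count are supplied by the caller rather than derived from the data.

```python
def keep_single_site(line, site_index, new_abundance, new_count):
    """Reduce a multi-site APA record to one site with the given values.

    site_index selects which of the comma-separated positions to keep
    (0 for the first position listed).
    """
    gene, chrom, strand, sites, _abundances, _counts = line.split()
    kept_site = sites.split(",")[site_index]
    return "\t".join(
        [gene, chrom, strand, kept_site, f"{new_abundance:g}", str(new_count)]
    )


if __name__ == "__main__":
    # Reproduce the two modified lines from the mock C2 file.
    print(keep_single_site(
        "CD24 chrY - 21152525,21152985 0.10379,0.0807426 189,110", 0, 0.2, 300))
    print(keep_single_site(
        "CD99 chrY + 2604478,2609344 0.00488122,0.210421 14,105", 0, 0.2, 125))
```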

Then I ran like this:

./Diff_APA_site_analysis -C1 expression_with_cp_chrY.txt,expression_with_cp_chrY.txt -C2 expression_with_cp_chrY_mockc2.txt,expression_with_cp_chrY_mockc2.txt -a refFlat_sf.txt -type s -o test_short_long_output.txt

And I got the expected output in test_short_long_output.txt:

chrY    CD24    -   21152985    21152525    53.9549
chrY    CD99    +   2604478 2609344 -56.3797

Best, Rebecca