bio-it-station / DoTA

Delta profile of Transcription factor and Alternative splicing

Criteria of positive sample & negative sample #2

Open zjin1126 opened 5 years ago

zjin1126 commented 5 years ago

Proposal

Develop an appropriate method to split the data into positive & negative groups.

Solutions

Four methods will be tested.

Evaluation

  1. Number of events remaining after splitting.
  2. Plot the distribution of data points in the two groups (PSI, Z_PSI).
  3. Histogram of the tissues remaining in each group after splitting.
  4. Estimate the percentage of spliced, retained, and discarded events after filtering with each method.
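
Evaluation points 1 and 4 amount to counting events per group and expressing each count as a share of the total. A minimal sketch of that bookkeeping follows; the function name `split_summary` and the label strings are hypothetical, not taken from the repository.

```python
from collections import Counter


def split_summary(labels):
    """Return {label: (count, percent)} for a list of per-event labels.

    `labels` holds one string per event, e.g. "positive", "negative",
    or "discarded" (label names are assumptions for illustration).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing to summarise
    return {
        label: (n, round(100.0 * n / total, 2))
        for label, n in counts.items()
    }


# Toy input: four events, two of which were discarded by the filter.
labels = ["positive", "negative", "discarded", "discarded"]
summary = split_summary(labels)
for label, (n, pct) in summary.items():
    print(f"{label}: {n} ({pct} %)")
```

Reporting the discarded share alongside the positive and negative shares makes it easy to compare how strict each splitting method is.
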
chtsai0105 commented 5 years ago

±1 S.D. (link to code)
Genes: 7437
Events: 23332
Positive counts: 12069 (14.21 %)
Negative counts: 11263 (13.26 %)
Discarded counts: 61618 (72.53 %)
(two images)
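
For readers without access to the linked code, here is a rough sketch of what a ±1 S.D. split could look like. It assumes the Z-score is computed per gene across tissues, that events at or above +1 S.D. are positive and events at or below -1 S.D. are negative, and that genes with a PSI range below 0.2 are skipped; the function name, label strings, and tissue names are hypothetical, and the actual implementation may differ.

```python
from statistics import mean, pstdev


def split_by_sd(psi_by_tissue, k=1.0, min_range=0.2):
    """Label each tissue's PSI for one gene as positive/negative/discarded.

    Assumptions (not confirmed by the thread): Z-scores are taken across
    the tissues of a single gene, and the thresholds are inclusive.
    Returns None when the gene fails the PSI range filter.
    """
    values = list(psi_by_tissue.values())
    if max(values) - min(values) < min_range:
        return None  # gene filtered out before splitting
    mu, sd = mean(values), pstdev(values)
    labels = {}
    for tissue, psi in psi_by_tissue.items():
        z = (psi - mu) / sd  # sd > 0 because the range is >= min_range > 0
        if z >= k:
            labels[tissue] = "positive"
        elif z <= -k:
            labels[tissue] = "negative"
        else:
            labels[tissue] = "discarded"
    return labels


# Toy PSI values for one gene (made-up numbers for illustration only).
psi = {"brain": 0.9, "liver": 0.5, "heart": 0.5, "lung": 0.1}
sd_labels = split_by_sd(psi)
print(sd_labels)
```

With a threshold of one standard deviation, most tissues of a typical gene fall in the middle band, which is consistent with the large discarded share reported above.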

jiahsinhuang commented 5 years ago

Is the x-axis above the result of (1) selecting genes with a PSI range > 0.2 and (2) then filtering by ±1 S.D.? And about 1300 genes are left with only one tissue, right?

chtsai0105 commented 5 years ago

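Yes, except that the filter is PSI range >= 0.2. The CDF shows that close to 80% of genes are left with events in only 4 or fewer tissues, so the criterion may need to be loosened a bit.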

zjin1126 commented 5 years ago

Quantile (20%) (link to code)
Genes: 7437
Events: 50747
Positive counts: 26923 (31.69 %)
Negative counts: 23824 (28.04 %)
Discarded counts: 34203 (40.26 %)
Images: psi_scatterplot, psi_data_points_stat
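
As an illustration only, a quantile-based split might be sketched as below. It assumes the 20% cut-offs are taken per gene across tissues, with the bottom 20% labelled negative and the top 20% labelled positive; the linked code may define the quantiles differently (for example over all events). The function name, label strings, and tissue names are hypothetical.

```python
from statistics import quantiles


def split_by_quantile(psi_by_tissue, n=5):
    """Label tissues by PSI quantile for one gene.

    With n=5 the first and last cut points are the 20th and 80th
    percentiles. Inclusive thresholds are an assumption.
    """
    values = list(psi_by_tissue.values())
    cuts = quantiles(values, n=n, method="inclusive")
    low, high = cuts[0], cuts[-1]
    labels = {}
    for tissue, psi in psi_by_tissue.items():
        if psi >= high:
            labels[tissue] = "positive"
        elif psi <= low:
            labels[tissue] = "negative"
        else:
            labels[tissue] = "discarded"
    return labels


# Toy PSI values for one gene (made-up numbers for illustration only).
psi = {"t1": 0.1, "t2": 0.2, "t3": 0.3, "t4": 0.4, "t5": 0.5, "t6": 0.6}
q_labels = split_by_quantile(psi)
print(q_labels)
```

Because the cut-offs are rank-based, this kind of split keeps a roughly fixed share of events per gene regardless of how spread out the PSI values are.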

zjin1126 commented 5 years ago

Delta Z > 2 (link to code)
Genes: 7437
Events: 74776
Discarded counts: 10174 (11.98 %)
Images: psi_scatterplot, psi_data_points_stat

zjin1126 commented 5 years ago

Delta Z >= 2.5 (link to code)
Genes: 7018
Events: 55430
Discarded counts: 29520 (34.75 %)
Images: psi_scatterplot, psi_data_points_stat