datascience-mobi-2023 / topic05_team03

Verify correlation calculation #7

Closed aluisascosta closed 1 year ago

aluisascosta commented 1 year ago

https://github.com/datascience-mobi-2023/topic05_team03/blob/0c421cff35df7bb9418d791b851e2fd71fb32c64/correlation_drug_geneknockout.Rmd#L180

aluisascosta commented 1 year ago

@artankry Can you tell me more about the issue here? I am struggling to understand the aim of the code chunk and I also cannot reproduce it because I am missing the value highly_correlated_drugs.

artankry commented 1 year ago

Hey Ana,

The issue is that the correlations between drug sensitivity and gene knockout are very low. By the way, all the correlations (for example, those between the prism scores and the prism.achilles scores in lines 110 to 152) are that low.

You can find highly_correlated_drugs one chunk above line 180. It contains the drugs that had a high correlation with at least one gene knockout.

In general I designed the functions for the correlation matrix as follows:

  1. Iterate through the columns of the two datasets
  2. Select the rows with NAs
  3. Remove the same rows with NAs in both columns
  4. Calculate correlation with spearman
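
The steps above can be sketched for a single pair of columns as follows. This is only a minimal illustration in base R, not the code from the Rmd: the vectors `drug_score` and `knockout_score` and their values are made up for the example, and the two vectors are assumed to be aligned on the same cell lines.

```r
# Toy stand-ins for two score columns measured on the same cell lines
# (hypothetical values, not taken from the project data).
drug_score     <- c(0.2, NA, -1.3, 0.8, -0.4, 1.1)
knockout_score <- c(-0.5, 0.3, NA, -0.9, 0.1, -0.2)

# Find the rows with an NA in either column and keep only the complete pairs.
complete_rows <- !is.na(drug_score) & !is.na(knockout_score)

# Spearman correlation on the remaining pairs.
rho_manual <- cor(drug_score[complete_rows], knockout_score[complete_rows],
                  method = "spearman")

# Base R can do the same filtering internally via the `use` argument.
rho_builtin <- cor(drug_score, knockout_score,
                   method = "spearman", use = "pairwise.complete.obs")
```

Both calls should give the same value, so the manual NA handling could probably be replaced by `use = "pairwise.complete.obs"` if the columns are already aligned row by row.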

Respectfully yours,
Artan

On 05.07.2023 at 16:35, Ana Luisa S. Costa wrote:

@artankry Can you tell me more about the issue here? I am struggling to understand the aim of the code chunk and I also cannot reproduce it because I am missing the value highly_correlated_drugs.

aluisascosta commented 1 year ago

Hi @artankry,

On points 2 and 3, I would suggest that if you have only a small number of NAs (e.g. more than 2), you could replace them with the mean. But if all values are NAs, you can simply remove it.
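
A rough sketch of that suggestion in base R is shown below. The data frame `scores`, its gene names, and its values are invented for illustration, and the helper `impute_mean` is hypothetical, not something from the project.

```r
# Toy score table: rows are cell lines, columns are genes (hypothetical values).
scores <- data.frame(
  geneA = c(0.5, NA, 1.5, 1.0),
  geneB = rep(NA_real_, 4),
  geneC = c(-0.2, 0.4, 0.1, -0.3)
)

# Drop columns that contain no observed value at all.
all_na <- vapply(scores, function(col) all(is.na(col)), logical(1))
scores <- scores[, !all_na, drop = FALSE]

# Hypothetical helper: replace NAs with the mean of the observed values.
impute_mean <- function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}
scores_imputed <- as.data.frame(lapply(scores, impute_mean))
```

One thing to keep in mind is that mean imputation adds identical values to a column, which creates ties in the ranks used by a Spearman correlation, so it may be worth comparing the result against simply dropping the incomplete rows.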

You have a chunk that took me forever to run, and I am guessing you are running the correlation on more than you think you are. If this is alright with you, you can keep doing it, but just in case, check that you are selecting just gene X and gene X in the two different data frames, since in the end you should get just one single correlation value per gene.

Finally, maybe the low correlation values are normal. As long as they seem more or less consistent across all genes, this should be alright. How are you handling treatments that have more than one target?
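
To illustrate the difference between one value per gene and every gene against every gene, here is a small sketch in base R. The data frames `drug_df` and `knockout_df`, the gene names, and the random values are all assumptions for the example; it also assumes that the rows of both data frames are the same cell lines in the same order.

```r
# Toy data frames with cell lines in rows and genes in columns
# (hypothetical names and random values; the gene order deliberately differs).
set.seed(1)
drug_df     <- data.frame(geneA = rnorm(8), geneB = rnorm(8), geneC = rnorm(8))
knockout_df <- data.frame(geneC = rnorm(8), geneA = rnorm(8), geneD = rnorm(8))

# Only genes present in both data frames can be compared.
shared_genes <- intersect(colnames(drug_df), colnames(knockout_df))

# One Spearman correlation per gene: gene X in drug_df against gene X in knockout_df.
per_gene_rho <- vapply(shared_genes, function(gene) {
  cor(drug_df[[gene]], knockout_df[[gene]],
      method = "spearman", use = "pairwise.complete.obs")
}, numeric(1))

# For comparison: correlating the full data frames gives every gene against every gene.
all_pairs <- cor(drug_df, knockout_df,
                 method = "spearman", use = "pairwise.complete.obs")
```

Here `per_gene_rho` has one entry per shared gene, while `all_pairs` has one entry per combination of columns. With thousands of genes, the second form grows quadratically, which could explain a very long run time.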

artankry commented 1 year ago

Hey Ana, I am sorry for not replying. I did not see the email. I do not include the drugs with multiple targets in the analysis, as we have no data on how multiple simultaneous gene knockouts affect the cell.
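
Excluding multi-target drugs could look roughly like the sketch below. The table `treatments`, its column names, and the comma-separated format of the target annotation are assumptions made for this example; the actual annotation in the project may be structured differently.

```r
# Toy treatment annotation (hypothetical drug names; comma-separated targets assumed).
treatments <- data.frame(
  drug   = c("drug1", "drug2", "drug3", "drug4"),
  target = c("EGFR", "EGFR, ERBB2", NA, "BRAF"),
  stringsAsFactors = FALSE
)

# Count the targets listed for each drug; drugs without an annotation get 0.
split_targets <- strsplit(treatments$target, ",\\s*")
n_targets <- vapply(split_targets, function(t) sum(!is.na(t)), integer(1))

# Keep only the drugs with exactly one annotated target.
single_target <- treatments[n_targets == 1, ]
```

In this toy table only `drug1` and `drug4` remain, since `drug2` lists two targets and `drug3` has no annotation.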