Combine Datasets together

ThomasMZheng / Proteomics-BQC

All of the work done while a part of the Richards Lab will be here


Combine Datasets together #5

Closed: ThomasMZheng closed this issue 1 year ago

ThomasMZheng commented 1 year ago

Look into a naive analysis of the datasets to see whether I can show that they are significantly different from one another.
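A minimal sketch of one such naive test, assuming a single protein's values from each dataset are available as numeric vectors. The vectors below are synthetic placeholders (not the Nov or Jun data), and the two tests shown, the two-sample Kolmogorov-Smirnov test (ks.test) and the Wilcoxon rank-sum test (wilcox.test), are illustrative choices rather than the method used in this project.

```r
# Synthetic placeholder data for one protein: NOT the real Nov/Jun measurements.
set.seed(1)
nov <- rlnorm(200, meanlog = 7.0, sdlog = 0.5)  # hypothetical "Nov" values
jun <- rlnorm(200, meanlog = 7.4, sdlog = 0.5)  # hypothetical "Jun" values, shifted upward

# Two-sample Kolmogorov-Smirnov test: could both samples come from the same distribution?
ks <- ks.test(nov, jun)

# Wilcoxon rank-sum test: is one dataset systematically shifted relative to the other?
wx <- wilcox.test(nov, jun)

cat(sprintf("KS D = %.3f, p = %.3g\n", ks$statistic, ks$p.value))
cat(sprintf("Wilcoxon W = %.0f, p = %.3g\n", wx$statistic, wx$p.value))
```

Running this for every protein would give one p-value per protein, so a multiple-testing correction such as p.adjust(p, method = "BH") would be needed before calling any single protein significantly different.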

Also, look into the range of each protein, find outliers, and maybe even create histograms of individual proteins by dataset/outcome, etc.
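A rough sketch of the range and outlier step, assuming a wide table with one row per sample, one column per protein, and a dataset label. The table, its column names, and the 1.5 * IQR outlier rule are assumptions for illustration, not the project's actual data layout or method.

```r
# Hypothetical wide table: one row per sample, one column per protein, plus a
# 'dataset' label. All values are synthetic placeholders.
set.seed(2)
df <- data.frame(
  dataset = rep(c("Nov", "Jun"), each = 50),
  IL6ST   = c(rlnorm(99, meanlog = 8, sdlog = 0.4), 60000),  # last value is an injected outlier
  IL17RD  = rlnorm(100, meanlog = 6, sdlog = 0.5)
)
proteins <- setdiff(names(df), "dataset")

# Minimum and maximum of each protein (rows = proteins).
ranges <- t(sapply(df[proteins], range))
colnames(ranges) <- c("min", "max")
print(ranges)

# Flag outliers with the 1.5 * IQR rule (similar to the rule boxplot() uses for its whiskers).
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr
}
outliers <- sapply(df[proteins], is_outlier)  # logical matrix: samples x proteins
print(colSums(outliers))
```

Per-dataset histograms could then be drawn with hist() on each subset, for example hist(log10(df$IL6ST[df$dataset == "Nov"])).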

ThomasMZheng commented 1 year ago

It looks like this will take more work than anticipated. At a naive glance, the datasets look different from each other when you compare their respective buffer and QC samples, although the actual human samples overlap decently. Hopefully they will look more similar after normalization. I will need to normalize them anyway, but because of the large outliers I will most likely need to normalize by the 95th percentile rather than the max value.
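A minimal sketch of the difference between max scaling and 95th-percentile scaling, on synthetic data with one large outlier. normalize_q95 is a hypothetical helper name, quantile() uses R's default (type 7) definition, and whether to normalize each dataset, or each protein within a dataset, separately is left open.

```r
# Hypothetical helper: divide a vector by its 95th percentile instead of its max,
# so a handful of extreme values cannot dominate the scaling.
normalize_q95 <- function(x, p = 0.95) {
  x / quantile(x, p, names = FALSE, na.rm = TRUE)
}

# Synthetic placeholder values: 99 "typical" values plus one large outlier.
x <- c(seq(100, 1000, length.out = 99), 50000)

by_max <- x / max(x)        # outlier sets the scale; typical values end up <= 0.02
by_q95 <- normalize_q95(x)  # typical values span roughly 0.1 to 1.04

summary(by_max[1:99])
summary(by_q95[1:99])
```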

ThomasMZheng commented 1 year ago

IL6ST Nov Jun Comparison Density

Note: 93 values were removed (mostly the buffer results, which washed out the remaining density plots).

xlim(100,10000)

70 values were between 0 and 100 (most likely the buffer values) and 23 values were above 10000.
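A small sketch of checking how many values a given x-axis window will drop, assuming ggplot2's xlim(lower, upper) removes points strictly outside the limits (ggplot2 reports these in a "Removed ... rows" warning). count_outside is a hypothetical helper and the vector is synthetic, not the IL6ST data.

```r
# Hypothetical helper: count values below and above an x-axis window, i.e. the
# values that xlim(lower, upper) would drop before the density is computed.
count_outside <- function(x, lower, upper) {
  c(below = sum(x < lower, na.rm = TRUE),
    above = sum(x > upper, na.rm = TRUE))
}

# Synthetic placeholder data.
x <- c(5, 40, 90, 150, 800, 2500, 9000, 12000, 30000)
count_outside(x, 100, 10000)
```

If the goal is only to zoom, coord_cartesian(xlim = c(100, 10000)) changes the view without removing data, so the density would still be estimated from all values.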

IL6ST Nov Jun Comparison Density Scaled

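Adding "y = ..scaled.." to the aes() call in ggplot rescales each density curve so that its peak is 1, which makes the graph more comprehensible (in newer ggplot2 versions the same mapping is written y = after_stat(scaled)).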
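A minimal sketch of what the scaled variable computes, written with base R's density() rather than ggplot2: each curve is divided by its own maximum, so every curve peaks at 1 regardless of sample size or spread. scaled_density is a hypothetical helper, and ggplot2's default evaluation grid may differ slightly from density()'s.

```r
# Hypothetical helper: kernel density estimate divided by its own maximum.
scaled_density <- function(x, ...) {
  d <- density(x, ...)
  data.frame(x = d$x, scaled = d$y / max(d$y))
}

# Synthetic placeholders: a narrow (tall) and a wide (low) distribution.
set.seed(3)
a <- rnorm(300, mean = 0, sd = 1)
b <- rnorm(300, mean = 0, sd = 5)

raw_peaks    <- c(narrow = max(density(a)$y), wide = max(density(b)$y))
scaled_peaks <- c(narrow = max(scaled_density(a)$scaled),
                  wide   = max(scaled_density(b)$scaled))
print(raw_peaks)     # the narrow curve is several times taller
print(scaled_peaks)  # both peak at 1
```

The trade-off is that scaling discards the relative heights of the curves, so it is most informative when the distributions being compared have similar shapes.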

Only the 23 values above 10000 were omitted.

IL6ST Nov Jun Comparison Density Scaled

Added alpha (transparency) so the overlapping curves are easier to see.

ThomasMZheng commented 1 year ago

IL17RD Nov Jun Comparison Density

28 values larger than 7500 were removed; in fact, we could limit the axis to 6500 and only lose 8 more values.

xlim(0,7500)

Here is the scaled density plot of the same protein

IL17RD Nov Jun Comparison Density Scaled

Same as above, with alpha added.

IL17RD Nov Jun Comparison Density Scaled

ThomasMZheng commented 1 year ago

When looking at other proteins, we notice that the distributions do not have the same shape across datasets (for example, SIGLEC12). This makes the scaled density less informative, and it also means we need to reconsider whether a simple normalization is effective, since dividing by a constant rescales a distribution but cannot change its shape.

SIGLEC12 Nov Jun Comparison Density Raw
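A minimal sketch of why a simple rescaling cannot fix a shape mismatch, assuming the normalization divides each dataset by a positive constant such as its 95th percentile. Skewness, a shape statistic, is unchanged by that division. The two synthetic samples (right-skewed vs symmetric) are placeholders, not SIGLEC12 data.

```r
# Sample skewness (moment-based): a shape statistic that is unchanged when the
# data are multiplied or divided by a positive constant.
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Synthetic placeholders with different shapes: right-skewed vs symmetric.
set.seed(4)
a <- rlnorm(500, meanlog = 6, sdlog = 0.8)
b <- rnorm(500, mean = 600, sd = 100)

# A simple normalization: divide each sample by its own 95th percentile.
a_n <- a / quantile(a, 0.95, names = FALSE)
b_n <- b / quantile(b, 0.95, names = FALSE)

# Skewness is identical before and after, so the shape mismatch survives.
rbind(raw        = c(a = skewness(a),   b = skewness(b)),
      normalized = c(a = skewness(a_n), b = skewness(b_n)))
```

Only a non-linear transformation (for example a log transform, or a rank-based method such as quantile normalization) can change the shape of a distribution; whether either is appropriate for these datasets is an open question.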

ThomasMZheng commented 1 year ago

I have not updated this in a while. I had a meeting with Shidong and Lena from SomaLogic, and they said that the two datasets were normalized differently, which skews all of the data.

In their current state, the two datasets cannot be merged, but SomaLogic is working on it right now.

-----------------------###-----------------------

Meanwhile, I finally compiled the full VAP-BQC-Phenotype Map, complete with days since onset, sex, and case.
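A rough sketch of attaching phenotype information to measured samples with a left join in base R. The tables, the sample_id key, and the column names are hypothetical stand-ins for the actual VAP-BQC-Phenotype Map; listing unmatched samples is a cheap sanity check before combining the datasets.

```r
# Hypothetical tables and column names, for illustration only.
samples <- data.frame(
  sample_id = c("S01", "S02", "S03"),
  dataset   = c("Nov", "Jun", "Jun")
)
phenotype <- data.frame(
  sample_id        = c("S01", "S02"),
  days_since_onset = c(4, 11),
  sex              = c("F", "M"),
  case             = c(TRUE, FALSE)
)

# Left join: keep every measured sample, attach phenotype columns where they exist.
mapped <- merge(samples, phenotype, by = "sample_id", all.x = TRUE)

# Samples with no matching phenotype row (NA in the joined columns).
unmatched <- mapped$sample_id[is.na(mapped$sex)]
print(mapped)
print(unmatched)
```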

Now I just need to wait until Lena and Shidong get back to me.