Correlating technical replicates

yaaminiv commented 7 years ago

As @emmats suggested, I regressed my technical replicates against each other for each transition to see if some transitions were messier than others. You can see my work in my lab notebook entry.

There are definitely some transitions with lower adjusted R-squared values than others. My first instinct is to establish some sort of R-squared cutoff, remove transitions below this cutoff, and then remake my NMDS plot. While I'm going through each transition, I can also see if there are certain outliers or leverage points that could be influencing the R-squared values (for those close to the cutoff).

Any suggestions for what that cutoff should be?
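
For anyone following along, the cutoff idea can be sketched roughly as below. This is a minimal illustration in Python, not the script from the notebook: the function names, the dictionary layout, and the peak areas are all made up, and the 0.85 cutoff is only an example value.

```python
from statistics import mean


def adjusted_r_squared(x, y):
    """Adjusted R-squared for a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or sst == 0:
        raise ValueError("replicate values must vary across samples")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / sst
    # One predictor, so the adjustment factor is (n - 1) / (n - 2).
    return 1 - (1 - r2) * (n - 1) / (n - 2)


def transitions_passing(replicates, cutoff):
    """Names of transitions whose adjusted R-squared is at least `cutoff`."""
    return [name for name, (rep1, rep2) in replicates.items()
            if adjusted_r_squared(rep1, rep2) >= cutoff]


# Hypothetical peak areas: {transition: (replicate 1, replicate 2)}
example = {
    "tight": ([1.0, 2.0, 3.0, 4.0, 5.0], [1.1, 1.9, 3.2, 3.9, 5.1]),
    "noisy": ([1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 1.0, 4.0, 2.0, 3.5]),
}
print(transitions_passing(example, cutoff=0.85))  # ['tight']
```

Only the transitions returned by `transitions_passing` would be carried into the NMDS.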

emmats commented 7 years ago

What is the range? I would go pretty high with the cut-off. Your replicates should be right on top of each other. Maybe have @laurahspencer run the same script and figure out what her range of R2 values is? Off the cuff, I would say the cut-off should be at least 0.85.

yaaminiv commented 7 years ago

The range is .2 to .9 (there are examples of each in my notebook), with the majority being above 0.6.

yaaminiv commented 7 years ago

Some examples for context. Peak area from the first batch of technical replicates on the x-axis, peak area from the second batch of technical replicates on the y-axis. Points are labelled with the oyster sample ID.

[image: bad-R]

[image: good-R]

sr320 commented 7 years ago

To me, it should be some defined range around a line that has a slope of 1, and thus be based on replicates and not proteins.
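
One way to read "a defined range around a line with a slope of 1" is a tolerance band around y = x. The sketch below is only an illustration in Python: the relative tolerance (`rel_tol`), the function name, and the peak areas are assumptions, not anything agreed on in this thread.

```python
def fraction_within_band(rep1, rep2, rel_tol):
    """Fraction of points whose rep2 value is within rel_tol * rep1 of the y = x line."""
    if len(rep1) != len(rep2) or not rep1:
        raise ValueError("need two non-empty series of equal length")
    inside = sum(1 for a, b in zip(rep1, rep2) if abs(b - a) <= rel_tol * abs(a))
    return inside / len(rep1)


# Hypothetical peak areas for one transition across four samples.
rep1 = [100.0, 200.0, 300.0, 400.0]
rep2 = [105.0, 190.0, 390.0, 410.0]

# With a 10% band, three of the four samples sit close to y = x.
print(fraction_within_band(rep1, rep2, rel_tol=0.1))  # 0.75
```

A transition could then be kept or dropped based on how many of its samples fall inside the band, which depends on the replicates themselves rather than on the fit of a regression.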

laurahspencer commented 7 years ago

I did a quick work-up using Yaamini's script. Summary data for R^2:

Mean: 0.8636
Min^: 0.6507
Max: 0.9679
Median: 0.9016

^One peptide from Superoxide Dismutase had an awful R^2 for 2 transitions (<0.1), which were outliers.

NOTE: This wasn't using the full data set. I have 17 samples with 3 reps, and 3 samples with 4 reps; only the first 2 reps run are represented here, which likely skews things a bit (didn't want to dig too deep into modifying the code).

yaaminiv commented 7 years ago

@emmats Maybe I can start with a 0.65 R-squared cutoff. If that doesn't improve anything, work up to a 0.85 cutoff?

@sr320 can you elaborate on your suggestion? From what I understand, I would plot an x = y line in addition to a linear regression, and then consolidate the two somehow?

sr320 commented 7 years ago

Let's discuss in class.

emmats commented 7 years ago

I think the 0.6 cut-off sounds safe. But, of course, @sr320 makes the final call.

This is pretty informative for me. I've never done this before.

yaaminiv commented 7 years ago

@emmats Here's my plan:

Normalize all my values by TIC to reduce any external variation (the plots I've made so far are not normalized...what are your thoughts on this step?)
Use a 0.6 cutoff and discard any transitions with an adjusted R-squared value below this. Remake an NMDS plot and examine clustering

AT THE SAME TIME...

Plot an x = y line on each plot, as well as a 95% confidence interval. @sr320 and I discussed the value of this during class. Since our technical replicates should have the same protein abundances, we expect the best fit model to be a 1:1 ratio.
Discard transitions with fewer than 95% of the points (43 points) within the confidence interval. Remake an NMDS plot and examine clustering

Thoughts?

emmats commented 7 years ago

I think that sounds good. I don't think you need to normalize by TICs. If your TICs vary widely between technical replicates, then you have other problems.

yaaminiv commented 7 years ago

@emmats @sr320 Notebook

I went through the first part of my plan and used R-squared cutoffs to eliminate transitions and remake NMDS plots. I used a combination of three cutoffs (0.6, 0.7 and 0.8) and normalized/nonnormalized data. I found normalizing made my plots look a little better. Overall this helped a bit, but the technical replication still doesn't look fantastic.

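0.6, normalized:

[image: 0.6-normalized-NMDS] https://raw.githubusercontent.com/RobertsLab/project-oyster-oa/master/analyses/DNR_SRM_20170902/2017-10-10-Troubleshooting/2017-10-10-Transition-Replicate-Correlations/2017-10-13-NMDS-TechnicalReplication-Normalized-Cutoff1.jpeg

[image: 0.6-normalized-distances] https://raw.githubusercontent.com/RobertsLab/project-oyster-oa/master/analyses/DNR_SRM_20170902/2017-10-10-Troubleshooting/2017-10-10-Transition-Replicate-Correlations/2017-10-13-NMDS-TechnicalReplication-Ordination-Distances-Normalized-Cutoff1.jpeg

0.7, normalized:

[image: 0.7-normalized-NMDS] https://raw.githubusercontent.com/RobertsLab/project-oyster-oa/master/analyses/DNR_SRM_20170902/2017-10-10-Troubleshooting/2017-10-10-Transition-Replicate-Correlations/2017-10-13-NMDS-TechnicalReplication-Normalized-Cutoff2.jpeg

[image: 0.7-normalized-distances] https://raw.githubusercontent.com/RobertsLab/project-oyster-oa/master/analyses/DNR_SRM_20170902/2017-10-10-Troubleshooting/2017-10-10-Transition-Replicate-Correlations/2017-10-13-NMDS-TechnicalReplication-Ordination-Distances-Normalized-Cutoff2.jpeg

0.8, normalized:

[image: 0.8-normalized-NMDS] https://raw.githubusercontent.com/RobertsLab/project-oyster-oa/master/analyses/DNR_SRM_20170902/2017-10-10-Troubleshooting/2017-10-10-Transition-Replicate-Correlations/2017-10-13-NMDS-TechnicalReplication-Normalized-Cutoff3.jpeg

[image: 0.8-normalized-distances] https://raw.githubusercontent.com/RobertsLab/project-oyster-oa/master/analyses/DNR_SRM_20170902/2017-10-10-Troubleshooting/2017-10-10-Transition-Replicate-Correlations/2017-10-13-NMDS-TechnicalReplication-Ordination-Distances-Normalized-Cutoff3.jpeg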

I'll try the second part soon, but it may take me a bit longer since making a confidence interval around a line in a for loop is a bit more tedious. Any thoughts about these results?

sr320 commented 7 years ago

I would not necessarily expect an R2 threshold to improve reps, e.g. you could have an R2 of 1 with a slope nowhere near 1.
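
A quick numeric check of that point, as a sketch in Python with made-up peak areas: the second replicate is an exact linear function of the first, so R-squared is 1, yet the slope is 0.5 and the replicates clearly do not agree. The helper name is hypothetical.

```python
from statistics import mean


def slope_and_r_squared(x, y):
    """Least-squares slope and R-squared for a regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    # For simple linear regression, R-squared equals the squared correlation.
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Hypothetical replicates where rep2 = 0.5 * rep1 + 10 exactly.
rep1 = [10.0, 20.0, 30.0, 40.0]
rep2 = [15.0, 20.0, 25.0, 30.0]

slope, r2 = slope_and_r_squared(rep1, rep2)
print(slope, r2)  # 0.5 1.0
```

So an R-squared cutoff on its own would keep this transition even though the two replicates disagree for every sample except the one at 20.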


emmats commented 7 years ago

I'm still pretty suspicious of these data. It just doesn't make sense that the technical replicates don't look the same.

sr320 commented 7 years ago

@yaaminiv can you please provide a csv with the respective technical replicates in adjacent columns?

yaaminiv commented 7 years ago

@sr320 csv

I just tried playing with slopes and confidence intervals. I'm going to try one more thing on that front and then write it up in a lab nb post/possibly post a new issue

yaaminiv commented 7 years ago

Notebook

I was following @sr320's suggestion to look at slopes and plot a 95% confidence interval around an x = y line. Ran into some issues doing that (more details in my nb), so I can only really plot an x = y line and a prediction line (same intercept as regression, but a slope of 1) along with my data.

[image: example]

Any suggestions for how to move forward? One issue is that the regressions have large intercepts, so an x = y line is far removed from the data. Another is that creating a confidence interval around an x = y line or prediction line is essentially impossible with my skill set, because neither of those has any error (so plotting a CI would just put the upper and lower bounds directly on top of the original line). I could look at the slope of the original regression and, if it falls outside some cutoff (1 ± some undetermined error value), remove the transition and remake an NMDS?

Thoughts? (esp from @emmats since you think this data is suspicious?) I'm stumped, and the only thing I think may work now might be rerunning samples (but I don't know how possible that is)...
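
One possible way to put a number on "1 ± some error value" is to use the uncertainty of the fitted slope itself and ask whether its confidence interval contains 1. The Python sketch below is only an illustration under stated assumptions: it uses a normal approximation for the interval (a t-based interval would be slightly wider, especially with few samples), and the function names and peak areas are made up.

```python
from math import sqrt
from statistics import NormalDist, mean


def slope_with_interval(x, y, level=0.95):
    """Least-squares slope of y on x with an approximate confidence interval."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must vary")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = sqrt(sse / (n - 2) / sxx)
    # Normal quantile used as an approximation to the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, slope - z * std_err, slope + z * std_err


def slope_consistent_with_one(x, y, level=0.95):
    """True when the approximate interval for the slope contains 1."""
    _, low, high = slope_with_interval(x, y, level)
    return low <= 1.0 <= high


# Hypothetical replicate pairs: one near 1:1, one with a slope near 0.5.
rep1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
agree = [1.2, 1.9, 3.1, 4.0, 4.8, 6.1]
disagree = [2.6, 2.9, 3.6, 3.9, 4.6, 4.9]

print(slope_consistent_with_one(rep1, agree))     # True
print(slope_consistent_with_one(rep1, disagree))  # False
```

Because the interval comes from the regression of the data, it has a real width, unlike a fixed x = y line. It does not address the large intercepts, which would need a separate check.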

yaaminiv commented 7 years ago

There are also transitions that have poor R squared values but slopes close to 1. What should I do about those?

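[image: choyp_psa 1 1 m 27259 yfqiayplpk y4 confint] https://user-images.githubusercontent.com/22335838/31961321-3d222c32-b8af-11e7-9735-19268b06278b.jpeg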

sr320 commented 7 years ago

Provide the new data sheet I mentioned above and I can provide feedback.

yaaminiv commented 7 years ago

@sr320 The one with tech reps in adjacent columns? I linked you to that!

sr320 commented 7 years ago

Sorry - I need it in just two columns with the sample IDs in a column...

yaaminiv commented 7 years ago

@sr320 I think I'm confused...so sample IDs in one column, transitions in another column?

sr320 commented 7 years ago

Col1-transition | Col2-sampleID | Col3-rep1 | Col4-rep2

sr320 commented 7 years ago

Column1 and Column2 could be switched....

yaaminiv commented 7 years ago

normalized or not normalized?

sr320 commented 7 years ago

How about both...

yaaminiv commented 7 years ago

Normalized Not normalized

sr320 commented 7 years ago

Use this data to start making graphs - simply average reps.

http://d.pr/f/OtdTD

This is just the normalized data with a coefficient of variation less than 20.
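
The filter-then-average step might look something like the following Python sketch. It is not the procedure used to build the file above: it assumes the CV is computed per transition and sample across the technical replicates, that "20" means 20%, and that the data sit in a dictionary keyed by (transition, sample ID). All names and values are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (sample SD / mean), expressed as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean of zero: CV is undefined")
    return 100 * stdev(values) / m


def average_if_consistent(rows, max_cv):
    """Average the replicates of each row whose CV is below max_cv; drop the rest."""
    kept = {}
    for key, reps in rows.items():
        if cv_percent(reps) < max_cv:
            kept[key] = mean(reps)
    return kept


# Hypothetical normalized peak areas: {(transition, sample ID): (rep1, rep2)}
rows = {
    ("T1", "S1"): (100.0, 110.0),  # CV of about 6.7%, kept
    ("T1", "S2"): (100.0, 160.0),  # CV of about 32.6%, dropped
}
print(average_if_consistent(rows, max_cv=20))  # {('T1', 'S1'): 105.0}
```

Tightening the threshold to 10 only requires changing `max_cv`.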

yaaminiv commented 7 years ago

Notebook

Used CV filtering to redo NMDS/ANOSIM analyses. Slight improvement in technical replication, ANOSIM/NMDS indicates no significant clustering pattern.

Going to filter data with CV ≤ 10 and repeat. Will also look at expression of individual proteins by making boxplots, etc. Interested in your thoughts @emmats.

yaaminiv commented 7 years ago

Seeing how we've answered my original question, I'm going to continue the current conversation in #35.

RobertsLab / project-oyster-oa

Correlating technical replicates #18