For the interpolations we have two options, or rather, we can work on two levels:
Also, Alex Washburn (a recent lab visitor) is working on models to be able to interpolate at the OTU level, which in the end would allow us to interpolate at the community level. The only "issues" he mentioned regarding using this are that all the code is currently written in MATLAB and that it still needs some work.
I will ask Will/Justin about the code as I don't know about it.
Great idea, this could be a good dataset to point him at to see if the techniques are valuable before recoding/integrating?
Thanks for getting this started, Yoshiki. I'm going to assign you to this issue.
The beta-diversity plots were added in 31fcc30863adba3f16c9caab5388a55299870e67
Interpolation
I have been working first on solving the interpolation/resampling issue presented here; this in turn will make this data set compatible with the frequency-clustering method and will hopefully produce a script that I have seen a couple of users show interest in.
I have opted for OTU-level resampling. To do so, the general steps that I have sketched are very similar to what's done with multiple rarefactions (a rough sketch follows below):
This processing is very similar to what others do with microarray data, though the use of "multiple rarefactions" here is not something I have seen an equivalent of in microarray data analysis.
I would really appreciate it if people could make some suggestions regarding this. I have some work done, though I still have a good part to go (specifically, the add/remove-samples step turned out to be kind of problematic).
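Since the concrete steps aren't listed here, a minimal sketch of what per-OTU resampling onto an evenly spaced time grid could look like is below (Python, with placeholder names; linear interpolation stands in for whatever method is ultimately chosen):

```python
# Minimal sketch (not the actual implementation) of per-OTU resampling onto an
# evenly spaced time grid; function and argument names are placeholders.
import numpy as np
from scipy.interpolate import interp1d

def interpolate_otu_table(otu_counts, sample_times, period):
    """otu_counts   : (n_otus, n_samples) array of abundances
    sample_times : (n_samples,) collection times from the mapping file
    period       : target sampling period, in the same units as sample_times"""
    even_times = np.arange(sample_times.min(), sample_times.max() + period, period)
    even_times = even_times[even_times <= sample_times.max()]  # stay inside the observed range
    resampled = np.empty((otu_counts.shape[0], even_times.size))
    for i, abundances in enumerate(otu_counts):
        # one interpolant per OTU; 'linear' could be swapped for other kinds
        f = interp1d(sample_times, abundances, kind='linear')
        resampled[i] = f(even_times)
    return even_times, resampled
```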
Clustering
I will try to summarize what the general steps are in this clustering procedure; if you would like a more in-depth explanation of the algorithm, please see section II of this file (http://cl.ly/3J3m1B1Z0a2T). A rough stand-in sketch is also included after the notes below.
Take as input a rarefied and evenly time-spaced OTU table.
The output is a list where each element is a group of OTUs that turned out to be very similar according to their frequency characteristics. What this means is that you are able to find time-lagged related OTUs in the same group, or even non-time-lagged related OTUs in the same group.
This same algorithm can be used with PCoA plots instead of OTU tables, though the general limitation is an even sampling depth, hence the first section of this document.
† Sampling period: time between samples in the mapping file.
†† Ideally the final algorithm will not be limited to using a specific interpolation method and should be capable of using distinct methods.
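For illustration only, here is a rough stand-in for the kind of grouping described above; it is NOT the algorithm from the linked file, just an FFT-plus-hierarchical-clustering sketch (invented names) that shares the property of tolerating time-lagged but otherwise similar OTUs:

```python
# Rough stand-in sketch: group OTUs by the shape of their magnitude spectra,
# which is largely insensitive to time shifts, so time-lagged but otherwise
# similar OTUs get similar features. Names and thresholds are placeholders.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_otus_by_spectrum(even_otu_table, max_dist=0.5):
    """even_otu_table: (n_otus, n_timepoints) rarefied, evenly time-spaced abundances."""
    spectra = np.abs(np.fft.rfft(even_otu_table, axis=1))
    # normalize so clustering compares spectral shape, not total abundance
    norms = np.linalg.norm(spectra, axis=1, keepdims=True)
    spectra = spectra / np.where(norms == 0, 1, norms)
    tree = linkage(spectra, method='average', metric='euclidean')
    labels = fcluster(tree, t=max_dist, criterion='distance')
    groups = {}
    for otu_index, label in enumerate(labels):
        groups.setdefault(label, []).append(otu_index)
    return list(groups.values())  # each element is a group of similar OTUs
```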
+1
This looks great -- thanks, Yoshiki!
Rob
Yes, this sounds really interesting. What is the issue that you're having with adding/removing samples?
One of the key problems that I didn't really consider at first glance, and that @antgonza pointed out, is that we currently don't know what an ideal sampling period for this type of data is. This matters a lot, because the result of this data imputation procedure (as it is) would be very biased and, at the end of the day, not really helpful. Another thing that was not being considered (and that would also be really helpful) was taking into account similar treatments and their development over time, so that the missing points for subjects within the same treatment could be imputed; I think this would be an awesome thing to integrate (a rough sketch of the idea is at the end of this comment).
Additionally, the previously outlined steps have changed, and this now does not seem like something that will be ready in time, as I know the intention here is to start writing up the manuscript sometime soon.
Nonetheless, we also agreed that this dataset could be useful for method testing once we have this in place, which is out of the scope of this paper.
I will still proceed with the clustering, although not with all the subjects.
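For what it's worth, here is a purely hypothetical sketch (invented names, not in the code base) of the treatment-informed imputation idea mentioned above, filling a subject's missing time points from the mean of the other subjects under the same treatment:

```python
# Hypothetical illustration only: fill a subject's missing time points from the
# per-time-point mean of the other subjects that received the same treatment.
import numpy as np

def impute_from_treatment(abundance, treatments):
    """abundance  : (n_subjects, n_timepoints) array with np.nan where samples are missing
    treatments : length-n_subjects list of treatment labels"""
    imputed = abundance.copy()
    for treatment in set(treatments):
        members = [i for i, t in enumerate(treatments) if t == treatment]
        # mean over this treatment group at each time point, ignoring missing values
        group_mean = np.nanmean(abundance[members], axis=0)
        for i in members:
            missing = np.isnan(imputed[i])
            imputed[i, missing] = group_mean[missing]
    return imputed
```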
Could you subsample from the Moving Pictures data set to identify a useful sampling period (even if it's not ideal)?
Yes, that does sound like a good idea; this was also pointed out by @antgonza.
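One way that subsampling experiment could be set up (a sketch only, with assumed names, not existing code): keep every k-th sample from a densely sampled trajectory, interpolate the dropped samples back, and compare against the held-out truth for several candidate periods. The period just before the error starts climbing would be a defensible, if not ideal, sampling period.

```python
# Sketch only: estimate how much information is lost at each candidate
# subsampling period by interpolating dropped samples and checking the error.
import numpy as np
from scipy.interpolate import interp1d

def interpolation_error_by_period(times, values, candidate_periods):
    """times, values     : 1D arrays for one densely sampled trajectory (e.g. one OTU)
    candidate_periods : iterable of subsampling factors (> 1; keep every k-th sample)"""
    errors = {}
    for k in candidate_periods:
        # always keep the last sample so every held-out point is inside the kept range
        kept = np.union1d(np.arange(0, times.size, k), [times.size - 1])
        held_out = np.setdiff1d(np.arange(times.size), kept)
        f = interp1d(times[kept], values[kept], kind='linear')
        # mean absolute error on the samples that were dropped
        errors[k] = np.mean(np.abs(f(times[held_out]) - values[held_out]))
    return errors
```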
I've added a couple files to Issue_14. @antgonza and I are going to work on the Google doc and hopefully we will have it ready by tomorrow.
Thanks! Really excited to work on this.
Sorry, we sent out this document to the bioinfo list but I'm not sure if everyone in this project gets that info.
An explanation of the volatility analysis can be found here.
@ElDeveloper has not been getting interesting results with the approaches described here, and is going to focus on #31.
Assess similarities between different time trajectories across body-sites or even across individuals.
The idea here is to be able to find signals where the x-axis is time and the y/z-axes are PC1/PC2. This same procedure could be used at the OTU level with time and abundance. To address this, the interpolation issue may need to be solved first, so that the signals are denser.
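As a rough illustration of what assessing similarity between such trajectories could mean in practice (hypothetical helper, not part of the project code), one option is a lag-tolerant correlation between two PC1-versus-time signals on a common, evenly spaced grid, which is another reason the interpolation above matters:

```python
# Rough illustration: lag-tolerant correlation between two time trajectories,
# e.g. PC1 over time for two body sites sampled on the same even time grid.
import numpy as np

def trajectory_similarity(a, b, max_lag=5):
    """a, b: equal-length 1D arrays, e.g. PC1 over time for two body sites."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    best = -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[lag:], b[:b.size - lag]
        else:
            x, y = a[:a.size + lag], b[-lag:]
        # Pearson correlation of the overlapping portion at this lag
        best = max(best, np.corrcoef(x, y)[0, 1])
    return best
```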