caporaso-lab / student-microbiome-project

Central repository for data and analysis tools for the StudentMicrobiomeProject.

Highlight intrapersonal variation #33

Open floresg opened 11 years ago

floresg commented 11 years ago

I have a request for some code. Basically, I want some way to highlight intrapersonal variation through time. I know the within-versus-between UniFrac distances show that variability within an individual is less than variability between people, but they do not speak to the variation within a person. So, what I want to do is determine the number of OTUs shared between any two time points for an individual, and then continue for any 3 time points, any 4 time points, and so on. It is kind of like the issue you deleted the other day.

For example, let's look at the forehead of one individual and ask: how many OTUs are shared between the week 0 and week 1 samples? How many between week 0 and week 2? How many between week 0 and week 3? And so on for each two-week comparison. We would then take the average (or median) for the individual to get the average (or median) number of OTUs shared between two samples. You would do this for each person. Next, we look at OTUs shared among 3 weeks' worth of samples: week 0, week 1, and week 2? Week 0, week 1, and week 3? And so on. Then average for each individual and move on to 4 weeks' worth of samples. In the end I want to be able to say, "On average, an individual shared XX OTUs across any two time points, highlighting the intrapersonal temporal variability of the human microbiome." Or something like that.

The reason I want the number of OTUs and not the UniFrac distance is that UniFrac is not easily interpreted by people outside of microbial ecology, and if we are aiming for Nature or Science this is much easier to understand.

Another aspect of this analysis I want to incorporate is doing it with various OTU filtering strategies. I know if we use the whole OTU table we will have an overall low fraction of OTUs shared between any two time points because of the long tail. So, for each individual I want to do this for the top 100 OTUs (or a fixed percentage for each body habitat, e.g. top 5%): not the top 100 across individuals, the top 100 for each individual. The way I envision this happening is taking the OTU time series tables for each body habitat and generating per-individual OTU tables, ranking each OTU table, removing low-abundance OTUs (those outside the top 100, or per another filtering strategy), and then determining the OTUs shared at the various time intervals. In the end, we should have some kind of inverse logarithmic response where more OTUs are shared when fewer weeks of samples are compared. Let me know if this makes sense and if you think it is worthwhile. Thanks.
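
To make the request concrete, here is a minimal sketch of the calculation described above for a single individual and body habitat. It is not existing project code. The function names, the dict-of-dicts table layout, the choice to rank OTUs by total count summed over all time points, and the rule that an OTU is "present" when its count is above zero are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def top_otus(table, n=100):
    """Return the IDs of the n most abundant OTUs for one individual.

    `table` maps time point -> {OTU ID: count}. Ranking by the count
    summed over all time points is an assumption; other rankings
    (e.g. mean relative abundance) would slot in here.
    """
    totals = {}
    for counts in table.values():
        for otu, count in counts.items():
            totals[otu] = totals.get(otu, 0) + count
    # Sort by descending total, breaking ties by OTU ID for determinism.
    ranked = sorted(totals, key=lambda otu: (-totals[otu], otu))
    return set(ranked[:n])


def mean_shared_otus(table, k, n=100):
    """Mean number of OTUs shared across every combination of k time points."""
    keep = top_otus(table, n)
    # Presence set per time point, restricted to the retained OTUs.
    # Treating count > 0 as "present" is an assumption.
    present = {
        t: {otu for otu, count in counts.items() if count > 0 and otu in keep}
        for t, counts in table.items()
    }
    shared = [
        len(set.intersection(*(present[t] for t in combo)))
        for combo in combinations(sorted(present), k)
    ]
    if not shared:
        raise ValueError("k exceeds the number of time points")
    return mean(shared)


# Hypothetical toy data: three weekly forehead samples from one person.
example = {
    0: {"otu_a": 10, "otu_b": 5, "otu_c": 0, "otu_d": 1},
    1: {"otu_a": 8, "otu_b": 0, "otu_c": 3, "otu_d": 1},
    2: {"otu_a": 7, "otu_b": 2, "otu_c": 1, "otu_d": 0},
}

for k in (2, 3):
    print(k, mean_shared_otus(example, k, n=3))
```

Running this for each individual and each k, then plotting the per-individual means against k, would give the curve described above. Swapping `mean` for `statistics.median` gives the median variant, and passing a different `n` (or replacing `top_otus`) covers the other filtering strategies.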

rob-knight commented 11 years ago

Given the issues surrounding determination of OTUs using next-gen sequencing data, which are well-known to reviewers, and the success of papers based on UniFrac in Nature/Science in my experience, I think this is a very risky approach...

Rob

(In reply to floresg's message of Apr 1, 2013, 10:35 AM, which repeats the request in the first comment.)

floresg commented 11 years ago

I have added some results related to this issue; see this link:

https://github.com/gregcaporaso/student-microbiome-project/tree/master/analysis-results/TimeSeriesResults/OTUs_Shared_over_time

We can discuss this at our meeting tomorrow.