naarkhoo opened this issue 8 years ago
This is a great platform to ask this type of question! The Zackular data are a subset of the Baxter dataset.
May I ask if you have tried to pool these two datasets, or to use one as an external validation set to test the performance of the predictor model?
The Baxter dataset includes the Zackular dataset. Since the Zackular dataset was something of a random subset of the larger dataset and contained about 20% of the samples, we opted for leave-one-out cross-validation. We felt that training on the remaining 80% of the data and testing on that single random subset containing 20% of the data would have negatively biased the evaluation of the model.
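To illustrate the difference from a single 80/20 split, here is a minimal leave-one-out cross-validation sketch in plain Python. The thread does not say which classifier the predictor model uses, so `nearest_centroid_predict` is a hypothetical stand-in, and the toy feature vectors and labels are made up. The point is only the loop: every sample is held out exactly once and predicted by a model trained on all the other samples.

```python
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """HYPOTHETICAL stand-in classifier (not the model from the paper):
    assign x to the class whose mean feature vector is closest in
    squared Euclidean distance."""
    centroids = {}
    for label in set(train_y):
        rows = [row for row, lbl in zip(train_X, train_y) if lbl == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return min(centroids, key=lambda lbl: sq_dist(centroids[lbl]))


def loocv_accuracy(X, y, predict=nearest_centroid_predict):
    """Leave-one-out CV: hold out each sample once, train on the other
    n - 1 samples, and return the fraction of held-out samples that
    were predicted correctly."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Made-up toy data: two well-separated groups of 2-feature samples.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
y = ["normal", "normal", "normal", "cancer", "cancer", "cancer"]
print(loocv_accuracy(X, y))  # prints 1.0
```

With n samples this fits n models, which costs more than a single split, but every sample contributes to the performance estimate instead of only one fixed 20% subset.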
I have made a PCoA of these two datasets to try to explain the source of the variation. May I ask whether those samples were resequenced, or whether the two datasets come from two different sequencing runs?
I ask because I don't know which samples in the Baxter dataset are the Zackular samples; I was hoping the PCoA plot could help me see this overlap.
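For reference, a PCoA like the one described can be sketched in dependency-free Python. The thread does not say which distance the PCoA was built on; Bray-Curtis is assumed here, the abundance vectors are made up, and the eigen-decomposition uses the cyclic Jacobi method only so that no numerical library is needed.

```python
import math


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def jacobi_eigen(S, tol=1e-12, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V); column j of V is the eigenvector for eigenvalue j."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def pcoa(D, k=2):
    """Classical PCoA: Gower double-centering of -D^2/2, then scale the
    top-k eigenvectors by the square root of their eigenvalues."""
    n = len(D)
    A = [[-0.5 * D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in A]
    col = [sum(A[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    B = [[A[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]
    vals, V = jacobi_eigen(B)
    order = sorted(range(n), key=lambda j: vals[j], reverse=True)[:k]
    # Negative eigenvalues (possible with non-Euclidean distances such as
    # Bray-Curtis) are clipped to 0.
    return [[V[i][j] * math.sqrt(max(vals[j], 0.0)) for j in order] for i in range(n)]


# Made-up abundance profiles: samples 0/1 and 2/3 form two similar pairs.
samples = [[10, 0, 5], [9, 1, 5], [0, 10, 2], [1, 9, 3]]
D = [[bray_curtis(a, b) for b in samples] for a in samples]
coords = pcoa(D)
print(coords)
```

The sign of each axis is arbitrary, so only relative positions are meaningful. On real data with hundreds of samples a dedicated tool would be much faster; this version only shows the steps: distance matrix, double-centering, eigen-decomposition, and scaling.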
Everything was resequenced.
The overlapping samples in the Baxter dataset come from the same patients' stool samples, but they were different aliquots of those stool samples, so they were re-extracted and resequenced.
It is an interesting dataset in its own right and shows which predictive taxa are potentially sensitive and may change.
If you want the Zackular data on its own, you can download the data and metadata from here: http://mothur.org/MicrobiomeBiomarkerCRC/
I am not sure if this is the right platform to ask this question, but I was wondering how comparable the data from your lab (Zackular et al. 2014) are to this cohort. Thanks again for making your papers and code open.