sachasuca opened this issue 5 years ago.
Hi @sachasuca - the general QIIME2 forum might be a good place to ask that question, since it is an applied question rather than strictly code-based.
I would run SourceTracker a few ways:

1. Treat each individual source sample as its own source, then sum the results on a per-source basis.
2. Use all data available from your sources after QC filtering.
3. Randomly subsample every source down to the smallest source's sample count. In this case, that means drawing a random set of 42 samples from each source.
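Strategies 1 and 3 involve a little bookkeeping outside SourceTracker itself. The sketch below shows one way to do it in plain Python. It assumes the post-QC sample IDs and the per-sample mixing proportions for the sink have already been read into dicts. `balanced_subsample` and `sum_per_source` are hypothetical helpers written for illustration and are not part of sourcetracker2.

```python
import random
from collections import defaultdict


def balanced_subsample(samples_by_source, seed=None):
    """Strategy 3: draw the same number of samples from every source.

    samples_by_source maps a source environment name to the list of its
    sample IDs that survived QC filtering (hypothetical input layout).
    Each source is randomly subsampled, without replacement, down to the
    size of the smallest source.
    """
    rng = random.Random(seed)  # seeded so a given draw can be reproduced
    n_min = min(len(ids) for ids in samples_by_source.values())
    return {
        source: sorted(rng.sample(list(ids), n_min))
        for source, ids in samples_by_source.items()
    }


def sum_per_source(proportions, environment_of):
    """Strategy 1: collapse per-sample proportions to source environments.

    proportions maps each source *sample* ID (used as its own source in the
    SourceTracker run) to its estimated contribution to one sink.
    environment_of maps a sample ID to its source environment. Keys without
    a mapping (e.g. "Unknown") are kept under their own name.
    """
    totals = defaultdict(float)
    for sample_id, p in proportions.items():
        totals[environment_of.get(sample_id, sample_id)] += p
    return dict(totals)
```

For example, with sources holding 50, 42 and 47 samples, `balanced_subsample` returns 42 IDs per source. Repeating the draw with several seeds and comparing the resulting proportions would give a rough idea of how much the estimates depend on which samples happened to be kept.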
Is it appropriate to use SourceTracker with unbalanced or missing data? I have 5 sources for my 1 sink. I collected n = 52 samples for each of the 6 sample types (52 × 6 = 312). However, some samples were discarded because they had fewer than 1,000 sequences after filtering, which is the minimum we judged necessary (based on rarefaction curves) to be confident the community was adequately sampled. Consequently, I have a total of n = 289 samples (n = 52 for the sink and n = 42 to 50 for each source). Of these, 42 are "complete sets" (i.e., for each SampleID, data are available for all 5 sources and the 1 sink).
I ran sourcetracker2 on all 289 samples and am now wondering whether the algorithm is sensitive to this imbalance and missing data. Would you recommend running it only on the complete sets?
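Building the "complete sets" subset amounts to applying the same 1,000-sequence cutoff and then keeping only the SampleIDs that still have a passing sample for every sample type. The following is a minimal sketch that assumes a hypothetical metadata layout of `(SampleID, sample type, sequence count)` rows. `complete_sets` is an illustrative helper, not a sourcetracker2 function.

```python
from collections import defaultdict

MIN_SEQS = 1000  # cutoff from the post: samples below this were discarded


def complete_sets(records, sample_types, min_seqs=MIN_SEQS):
    """Return the SampleIDs that have a passing sample for every type.

    records: iterable of (set_id, sample_type, n_sequences) tuples, one per
    sequenced sample (hypothetical layout of the metadata table).
    sample_types: every expected type, e.g. the 5 sources plus the sink.
    """
    present = defaultdict(set)
    for set_id, sample_type, n_seqs in records:
        if n_seqs >= min_seqs:  # keep only samples that passed QC
            present[set_id].add(sample_type)
    required = set(sample_types)
    # A set is complete only if all required types survived filtering.
    return sorted(sid for sid, types in present.items() if required <= types)
```

Running SourceTracker once on the complete sets and once on all 289 samples, then comparing the sink proportions, would be one direct way to check whether the imbalance changes the results.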