unbalanced data - Githubissues

caporaso-lab / sourcetracker2

SourceTracker2

BSD 3-Clause "New" or "Revised" License


Is it appropriate to use SourceTracker with unbalanced or missing data? I have 5 sources for my 1 sink. I collected n=52 samples for each sample type (52*6=312); however, some samples were discarded because they had <1000 sequences after filtering (the minimum number needed, based on rarefaction plots, to be confident that we had adequately sampled the community). Consequently, I have a total of n=289 samples (n=52 for the sink and n=42-50 for each source) and a total of n=42 "complete sets" (i.e., SampleIDs for which data are available for all 5 sources and the 1 sink).

I ran sourcetracker2 on all n=289 samples. I am now wondering whether the algorithm is sensitive to this unbalanced and missing data. Would you recommend running it only on the "complete sets"?
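
For concreteness, here is a minimal sketch of how the "complete sets" subset could be identified before a second run. It uses plain Python only and does not call any SourceTracker2 function. The record layout (pairs of a set ID and a sample type) and the labels `sink` and `source1` to `source5` are hypothetical placeholders, not names from the actual mapping file.

```python
from collections import defaultdict

# Hypothetical sample-type labels: one sink and five sources.
REQUIRED_TYPES = {"sink", "source1", "source2", "source3", "source4", "source5"}


def complete_sets(records, required_types=REQUIRED_TYPES):
    """Return the sorted IDs whose retained samples cover every required type.

    `records` is an iterable of (set_id, sample_type) pairs, one per sample
    that was kept after the <1000-sequence filter.
    """
    types_by_id = defaultdict(set)
    for set_id, sample_type in records:
        types_by_id[set_id].add(sample_type)
    # A set is complete when its observed types include all required types.
    return sorted(i for i, types in types_by_id.items() if required_types <= types)


# Toy data: set "A" is complete; set "B" lost its source3 sample in filtering.
records = [("A", t) for t in REQUIRED_TYPES]
records += [("B", t) for t in REQUIRED_TYPES if t != "source3"]
print(complete_sets(records))  # ['A']
```

The resulting IDs could then be used to subset the feature table and mapping file, so that the output of the n=289 run can be compared with a run restricted to the complete sets.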


unbalanced data #116