caporaso-lab / sourcetracker2

SourceTracker2
BSD 3-Clause "New" or "Revised" License

Paradoxical results #128

Closed wangpeng407 closed 4 years ago

wangpeng407 commented 4 years ago

Assume A1, A2, A3, ... (A) are the source samples and B is the sink sample.

SourceTracker results showed that ~90% of the B microbiota is attributed to contamination from the A community.

Specifically, OTU_1 in A contributed about 8% to B.

However, the relative abundance of OTU_1 is ~6% in B but only ~1% in A, far lower than in B.

So, is this result biologically reasonable?

Hoping for your reply. Thanks~

johnchase commented 4 years ago

Hi @wangpengnovos,

Without seeing the data and the parameters used in running sourcetracker it is impossible to say what could be causing the result you are seeing.

Are you able to share the data you are running, and the exact command?

wangpeng407 commented 4 years ago

@johnchase

Appreciate your reply.

I have shared the data; all the ST results are in data.zip.

For a brief introduction, please see NOTE.docx.

Thanks a lot.

lkursell commented 4 years ago

@wangpengnovos - thanks for the question.

A few comments for you here, as these points have come up before.

1) SourceTracker cannot tell you whether you are making a biologically relevant comparison. Mixing proportions, in general, rely on assumptions of physical proximity to be "meaningful"; the original SourceTracker paper, framed in terms of physical surface contamination, is a good example of this. If you compare sources and sinks from opposite sides of the world, then you are using ST more as a "similarity metric" than as a mixing / "source tracking" tool.
2) I looked at the data you provided. The most abundant features in your Sinks (GA/GB/GC), which have single-digit abundances, are at less than 1% abundance in the Source community (GY), and the most dominant GY features are minor members of the Sinks.
3) Given this, I would have expected to see more "Unknown" community contribution. However, it looks like most of the features in your Sources are also found in your Sinks, which means a "mixture" is possible. The Unknown is mainly driven by features in the Sinks that are not found in the Sources, of which you have very few.
4) I would encourage you to consider both rarefaction depth and whether each feature is "trustworthy" in your samples. There appear to be many very minor members in both Sinks and Sources. Relatedly, you've only included 1 source (GY).
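As a rough way to check how much room there is for an "Unknown" contribution, one could measure the fraction of sink reads that fall in features absent from the source. The sketch below is only an illustrative diagnostic, not what SourceTracker computes internally; the function name `unshared_sink_fraction` and the toy counts are hypothetical and are not taken from the data in this thread.

```python
def unshared_sink_fraction(sink_counts, source_counts):
    """Return the fraction of sink reads in features with zero count in the source.

    Both arguments are dicts mapping a feature ID to its read count.
    This is a simple overlap diagnostic, not SourceTracker's mixing model.
    """
    total = sum(sink_counts.values())
    if total == 0:
        raise ValueError("sink sample has no reads")
    unshared = sum(
        count
        for feature, count in sink_counts.items()
        if source_counts.get(feature, 0) == 0
    )
    return unshared / total


# Hypothetical toy counts (assumed for illustration only).
sink = {"OTU_1": 60, "OTU_2": 30, "OTU_3": 10}
source = {"OTU_1": 10, "OTU_2": 900, "OTU_4": 90}

# Only OTU_3 is missing from the source, so 10 of the 100 sink reads are unshared.
fraction = unshared_sink_fraction(sink, source)
print(f"Fraction of sink reads in features absent from the source: {fraction:.2f}")
```

A small value here would be consistent with a low "Unknown" estimate even when the abundance profiles of sink and source look quite different.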

Side note: you are using the original R SourceTracker, which this repo does not maintain. Given that your question is about biology and application rather than code or a bug report, I will close this issue. I would suggest you post your question to the QIIME2 forum (https://forum.qiime2.org) to seek help from the community there.