cozygene / FEAST

Fast expectation maximization for microbial source tracking

Changes in contribution based on number of sources #12

Open BakDK opened 4 years ago

BakDK commented 4 years ago

Hi

I have just installed FEAST and run a few analyses. At first glance the results looked good, but then I started varying the number of sources to see how sensitive the analysis is. This changed the outcome considerably; some of the changes were not surprising, while others puzzle me a bit.

First, I had 3 sinks, each with 3 potential sources assigned to it. In addition, there were 4 sources with no ID assigned.

In the output, the sources with no ID had "NA" assigned to them, so they did not contribute to any sink. First question: is there a way to add samples that can act as a source for every sink?

An example: I have a big pile of soil that I subsample and distribute into different pots. Here, the soil can be a source for each of the pots. Can I add the same sample 3 times (as I have 3 sinks) and give each copy a different ID and a unique sample name?
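The duplication described here can be sketched as follows. This is only an illustration of the bookkeeping, not something confirmed by the FEAST authors: the column names `SampleID`, `Env`, `SourceSink`, and `id` are assumed to match the metadata layout, and the helper `replicate_shared_sources` and the sample name `bulk_soil` are hypothetical. The corresponding rows of the count table would need to be duplicated under the same new sample names.

```python
def replicate_shared_sources(shared_sources, sink_ids):
    """Return one metadata row per (shared source, sink) pair.

    shared_sources: list of (sample_name, environment) tuples.
    sink_ids: the id values of the sinks the sources should be paired with.
    Column names are assumptions about the metadata layout.
    """
    rows = []
    for name, env in shared_sources:
        for sink_id in sink_ids:
            rows.append({
                "SampleID": f"{name}_sink{sink_id}",  # unique name per copy
                "Env": env,
                "SourceSink": "Source",
                "id": sink_id,                        # pairs the copy with one sink
                "original_sample": name,              # which counts to copy
            })
    return rows


# Hypothetical example: one bulk soil sample shared by three pots (sinks 1-3).
rows = replicate_shared_sources([("bulk_soil", "soil")], sink_ids=[1, 2, 3])
for row in rows:
    print(row)
```

Each copy keeps a pointer to the original sample, so the same counts can be reused for every sink.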

Secondly, I then tried assigning an ID to the 4 sources that did not have one in the first place. They now all had the same ID, so 2 of the 3 sinks had none of the formerly "no-ID" sources assigned (but kept their original potential sources!).

This changed the contributions of the three original potential sources (the ones I started with) quite dramatically for all 3 sinks. In the first scenario (4 sources without ID), one source contributed 58%, while in the second scenario (no sources without ID, as they were all assigned to another sink), the same source contributed ~100% to the sink.

So even though the 4 sources without ID had NA assigned to them in the output file in the first scenario, were they still used in the calculations? I am sure it is just the underlying math that I have a hard time understanding, so could you elaborate a bit on this?

Thanks

kruehling commented 1 year ago

Hello,

I'm wondering if you ever got this resolved? Clearly it was a while ago, but if you came to any conclusion I'd be interested to hear about it. I have a somewhat similar issue: I've run a slightly larger dataset with fecal samples as sources and water samples as sinks. I noticed that one sewage sample was getting high source values. Skeptical, I reran the code without that sample, and a nearly identical source value is now attributed to a goose fecal sample, which was not a major source when the sewage sample in question was included. This difference seems particularly curious because there are other sewage samples that appear to have a bacterial composition more similar to the one that was left out, yet the program seemingly selects a goose sample to fill the void left by the removed sewage sample. I am also assuming this can be attributed to some math within the function that is lost on me, but I would love any help thinking about this issue. Thanks!
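To make the behaviour described above concrete, here is a toy sketch of how jointly estimated source proportions can shift when one candidate source is removed. It is a plain expectation-maximization loop over fixed source profiles, not FEAST's actual implementation (in particular, it has no unknown-source component), and all profiles, counts, and names (`estimate_shares`, `sewage_1`, `goose`, `cattle`) are made up for illustration.

```python
def estimate_shares(sink_counts, sources, n_iter=500):
    """Toy EM for the mixing proportions of a fixed set of known sources.

    sink_counts: read counts per taxon in the sink.
    sources: dict mapping source name -> relative abundance per taxon.
    """
    names = list(sources)
    shares = {name: 1.0 / len(names) for name in names}  # uniform start
    for _ in range(n_iter):
        expected = {name: 0.0 for name in names}
        for taxon, count in enumerate(sink_counts):
            if count == 0:
                continue
            # E-step: split this taxon's reads among the sources in proportion
            # to (current share) x (abundance of the taxon in that source).
            weights = {n: shares[n] * sources[n][taxon] for n in names}
            denom = sum(weights.values())
            if denom == 0:
                continue  # no candidate source contains this taxon
            for n in names:
                expected[n] += count * weights[n] / denom
        total = sum(expected.values())
        if total == 0:
            break
        # M-step: renormalise so the shares always sum to 1.
        shares = {n: expected[n] / total for n in names}
    return shares


# Hypothetical profiles over five taxa; the sink is built as
# 70% sewage_1 + 30% cattle, so goose is not a true contributor.
sources = {
    "sewage_1": [0.5, 0.3, 0.2, 0.0, 0.0],
    "goose":    [0.3, 0.1, 0.1, 0.5, 0.0],
    "cattle":   [0.0, 0.0, 0.1, 0.0, 0.9],
}
sink = [35, 21, 17, 0, 27]

with_sewage = estimate_shares(sink, sources)
without_sewage = estimate_shares(
    sink, {n: p for n, p in sources.items() if n != "sewage_1"}
)
print({n: round(s, 3) for n, s in with_sewage.items()})
print({n: round(s, 3) for n, s in without_sewage.items()})
```

In this made-up setup, goose gets essentially no share while `sewage_1` is a candidate, but takes roughly two thirds once it is removed. The shares must sum to 1 and the reads for the first two taxa have to be explained by something, so they go to whichever remaining source contains those taxa. Whether the same mechanism explains the real runs above would need checking against the actual data.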

BakDK commented 1 year ago

Hi. I never got it resolved and moved to another tool. So I am sorry, but I cannot help you with this.