caporaso-lab / sourcetracker2

SourceTracker2
BSD 3-Clause "New" or "Revised" License

`sourcetracker2 gibbs` generating empty output tables #110

Open nick-youngblut opened 5 years ago

nick-youngblut commented 5 years ago

I'm running sourcetracker2, version 2.0.1, on my data with the following command:

sourcetracker2 gibbs \
  --jobs 1 \
  -i counts.biom \
  -m samples.txt \
  -o tmp/ST_test/

... and the output mixing_proportions.txt and mixing_proportions_stds.txt tables only have a header. There is no error or warning. It would help to at least have a warning if the output table is empty. Any idea on what's going wrong?
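Until such a warning exists, a post-run check along these lines could flag a header-only table. This is only a sketch: the helper names are hypothetical, the header used in the demo is made up for illustration, and the only thing taken from the report above is that an empty result still contains a header line.

```python
import io
import warnings


def count_data_rows(table_file):
    """Return the number of non-blank lines that follow the header line."""
    lines = [line for line in table_file if line.strip()]
    return max(len(lines) - 1, 0)


def warn_if_empty(table_file, name="mixing_proportions.txt"):
    """Warn when the table holds a header but no sample rows (hypothetical helper)."""
    n_rows = count_data_rows(table_file)
    if n_rows == 0:
        warnings.warn(f"{name} contains a header but no sample rows")
    return n_rows


# Made-up header-only table standing in for the real output file.
header_only = io.StringIO("SampleID\tsource_a\tsource_b\n")
n_rows = warn_if_empty(header_only)
```

In practice the file handle would come from something like `open("tmp/ST_test/mixing_proportions.txt")`, using the output directory from the command above.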

johnchase commented 5 years ago

Hi @nick-youngblut, is it possible to share your data, or a subset of the data that reproduces the problem? I agree that a warning letting the user know what is going on may be appropriate here, although I am unsure what the data that creates this issue looks like.

nick-youngblut commented 5 years ago

After some checking, it appears that when I took a random subset of my full samples table to create a smaller test dataset, the random subset happened not to include any 'sink' samples. It appears that SourceTracker doesn't check for the presence of both 'source' and 'sink' samples in the samples table. Of course, formatting the input properly is the user's responsibility, but when this happens it is somewhat hard to debug, since SourceTracker still generates an output (just an empty table).
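A pre-flight check on the samples table can catch this before a run. The sketch below is not part of SourceTracker. It assumes a tab-separated samples table with a column named `SourceSink` holding the values `source` and `sink`; adjust those names if your table uses different ones. The function names are hypothetical.

```python
import csv
import io
import warnings
from collections import Counter


def count_source_sink(mapping_file, column="SourceSink"):
    """Count how many samples carry each label in the source/sink column."""
    reader = csv.DictReader(mapping_file, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in samples table")
    # Short rows yield None for missing cells, so fall back to "".
    return Counter((row[column] or "").strip() for row in reader)


def check_source_sink(counts, source="source", sink="sink"):
    """Warn about, and return, the labels that no sample carries."""
    missing = [label for label in (source, sink) if counts.get(label, 0) == 0]
    if missing:
        warnings.warn("samples table has no samples labeled: " + ", ".join(missing))
    return missing


# Made-up subset that, like the one described above, contains no sink samples.
subset = io.StringIO("#SampleID\tSourceSink\ns1\tsource\ns2\tsource\n")
missing = check_source_sink(count_source_sink(subset))
```

For a real run, `subset` would be replaced by an open handle on `samples.txt`.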

nick-youngblut commented 5 years ago

Just as an FYI, I've been assessing how removing rare features from the count table affects the classifications. It appears that if a sample no longer contains any features with a count > 0 (i.e., sample total count = 0), then the resulting mixing proportions for that sample are 'nan'. A user may wonder why they are getting 'nan' values for all mixing proportions of a sample. Adding a warning when the count table is imported into SourceTracker may help prevent this confusion.
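The kind of import-time warning suggested here could look like the sketch below. It assumes the per-sample counts are already available as a mapping from sample ID to a list of feature counts; loading them from the BIOM file is left out, and the helper names and sample IDs are hypothetical.

```python
import warnings


def find_empty_samples(counts_by_sample):
    """Return the IDs of samples whose feature counts sum to zero."""
    return [sid for sid, counts in counts_by_sample.items() if sum(counts) == 0]


def warn_on_empty_samples(counts_by_sample):
    """Warn once, listing every sample that has a total count of 0."""
    empty = find_empty_samples(counts_by_sample)
    if empty:
        warnings.warn(
            "samples with a total count of 0 (mixing proportions may be nan): "
            + ", ".join(empty)
        )
    return empty


# Made-up table after rare-feature filtering: sink2 has lost all of its counts.
table = {"sink1": [12, 0, 3], "sink2": [0, 0, 0], "src1": [5, 9, 0]}
empty = warn_on_empty_samples(table)
```

Dropping the listed samples before the run, or relaxing the rare-feature filter, would avoid the all-'nan' rows.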