caporaso-lab / sourcetracker2

SourceTracker2
BSD 3-Clause "New" or "Revised" License

Leave one out error #121

Closed by byroncrump 5 years ago

byroncrump commented 5 years ago

Sourcetracker2 version 2.0.1 is working well for me except for the Leave One Out option to assess my source groupings. I use the same map and .biom files I use for standard sourcetracker2 gibbs runs, but I get an error that ends with the lines:

File "mtrand.pyx", line 1144, in mtrand.RandomState.choice
ValueError: probabilities are not non-negative

My command line is:

sourcetracker2 gibbs -i table-with-taxonomy_iseries_freq5.biom -m map_7sources.txt --loo -o loo_results

johnchase commented 5 years ago

Hi @byroncrump, thanks for posting this. Will you respond with the operating system that you are using and the full traceback of the error? Also, if possible, can you share a small example of the data that is causing this issue?

byroncrump commented 5 years ago

I'm using a Unix operating system. I ran this using 10 CPUs on a system with 512 GB of RAM.

(st2) [crumpb@yukon1 sourcetracker]$ sourcetracker2 gibbs -i table-with-taxonomy_iseries_freq5.biom -m map_7sources.txt --loo -o loo_results/
Traceback (most recent call last):
  File "/home/pi/crumpb/.conda/envs/st2/bin/sourcetracker2", line 11, in <module>
    sys.exit(cli())
  File "/home/pi/crumpb/.conda/envs/st2/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/pi/crumpb/.conda/envs/st2/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/pi/crumpb/.conda/envs/st2/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/pi/crumpb/.conda/envs/st2/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/pi/crumpb/.conda/envs/st2/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/pi/crumpb/.conda/envs/st2/lib/python3.5/site-packages/sourcetracker/_cli/gibbs.py", line 214, in gibbs
    f(sample)
  File "/home/pi/crumpb/.conda/envs/st2/lib/python3.5/site-packages/sourcetracker/_sourcetracker.py", line 864, in _cli_loo_runner
    burnin, delay)
  File "/home/pi/crumpb/.conda/envs/st2/lib/python3.5/site-packages/sourcetracker/_sourcetracker.py", line 592, in gibbs_sampler
    new_e_idx = np.random.choice(source_indices, p=jp / jp_sum)
  File "mtrand.pyx", line 1144, in mtrand.RandomState.choice
ValueError: probabilities are not non-negative

johnchase commented 5 years ago

@byroncrump I was able to reproduce the error you are receiving using the PyPI distribution of sourcetracker; however, I was not able to reproduce it with the current code on GitHub. We've done a fair amount of maintenance that is not currently on PyPI, and we will probably be releasing a new PyPI distribution in the near future.

In the meantime my recommendation would be to use the development version of sourcetracker2.

Here are the commands I used to have the job run successfully:

$ conda create -n st2 -c biocore python=3.5 numpy scipy scikit-bio biom-format h5py hdf5 seaborn
$ conda activate st2
$ pip install https://github.com/biota/sourcetracker2/archive/master.zip
$ sourcetracker2 gibbs -i table.txt -m map.txt -o loo/ --loo --source_rarefaction_depth 1700

If this does not fix the issue on your end, please let us know.

byroncrump commented 5 years ago

Thank you so much for your help with this. I followed your instructions except that:

  1. My system would not overwrite the old "st2" so I changed it to "st3"
  2. My system required me to use the command "source activate st3" instead of "conda activate st3"

I looked at the results and noticed that it does not provide sample-by-sample assessments of how well samples fit with the source grouping. Can that information be exported?

johnchase commented 5 years ago

@byroncrump I am going to close this issue as the original issue appears to have been fixed.

In terms of the conda environments, conda will not overwrite an existing environment, so you would have to either delete the st2 environment or name the new one something else, as you did. The difference between "source activate" and "conda activate" comes from using different versions of conda.

In terms of the output of sourcetracker, I'm not sure I understand the issue. Will you post a new issue with example data, showing the output that you are receiving and the output that you would expect?

byroncrump commented 5 years ago

OK, I'll start a new issue. Thank you!

NeginValizadegan commented 3 years ago

Hi all,

I have the same issue with the leave-one-out strategy: I get the same error reported above. I can't follow the instructions here because I am using a cluster for speed. Do we know why this error occurs?

Thanks
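
The thread does not establish a root cause. What the traceback does show is that the last SourceTracker2 frame passes p=jp / jp_sum to np.random.choice, and NumPy raises "probabilities are not non-negative" when any entry of p is below zero. One plausible way for that to happen is a negative entry in jp, but that is an assumption, not something confirmed here. As a minimal sketch of the kind of check involved, the hypothetical helper below (normalize_weights is an illustrative name, not part of SourceTracker2 or NumPy) normalizes a weight vector and reports which entries would make it an invalid probability vector:

```python
import math


def normalize_weights(weights):
    """Scale weights so they sum to 1, or raise ValueError naming bad entries.

    Hypothetical diagnostic helper for illustration only; it is not part of
    SourceTracker2 or NumPy.
    """
    weights = [float(w) for w in weights]
    # Negative, NaN, or infinite entries cannot be normalized into valid
    # probabilities, so report their positions instead of sampling.
    bad = [i for i, w in enumerate(weights) if not math.isfinite(w) or w < 0]
    if bad:
        raise ValueError("invalid weights at indices {}".format(bad))
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights sum to zero; nothing to sample from")
    return [w / total for w in weights]


if __name__ == "__main__":
    print(normalize_weights([2.0, 1.0, 1.0]))
    try:
        normalize_weights([2.0, -0.5, 1.0])
    except ValueError as err:
        print("rejected:", err)
```

Running a check like this on the values that feed the sampler would show whether a negative or non-finite weight is present and at which position, which is the information a bug report would need.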