caporaso-lab / sourcetracker2

SourceTracker2
BSD 3-Clause "New" or "Revised" License

KeyError: 'SourceSink' #109

Closed: rebelwebster closed this issue 5 years ago

rebelwebster commented 5 years ago

Hi

I am a novice at this, so please be nice!! I've run the test data without a problem, so I am sure the issue is the formatting of my files. I don't have a BIOM output, so my OTU table is in tab-delimited format, which I assume is OK since that format appears in your sample files.

I run the following command:

sourcetracker2 gibbs -i OTU_Table_ST.txt -m metadata.txt -o OUTPUT/

And the traceback is:

  File "/home/ampere/rlayton/miniconda3/envs/st2/bin/sourcetracker2", line 11, in <module>
    sys.exit(cli())
  File "/home/ampere/rlayton/miniconda3/envs/st2/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/ampere/rlayton/miniconda3/envs/st2/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/ampere/rlayton/miniconda3/envs/st2/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ampere/rlayton/miniconda3/envs/st2/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ampere/rlayton/miniconda3/envs/st2/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/ampere/rlayton/miniconda3/envs/st2/lib/python3.5/site-packages/sourcetracker/_cli/gibbs.py", line 163, in gibbs
    source_value=source_column_value, sink_value=sink_column_value)
  File "/home/ampere/rlayton/miniconda3/envs/st2/lib/python3.5/site-packages/sourcetracker/_sourcetracker.py", line 654, in sinks_and_sources
    if md[column_header] == sink_value:
KeyError: 'SourceSink'

Thanks

johnchase commented 5 years ago

Hi @rebelwebster,

Without seeing the actual data, I have a guess as to what is going on. The gibbs command requires a column indicating whether each sample is a source or a sink. If you don't specify the name of that column in the command, it defaults to SourceSink.

If you don't have a column named SourceSink, you can either change the header in the metadata file or specify the name of the column that describes the sources and sinks. For example, if your column is named SourcesAndSinks:

sourcetracker2 gibbs -i OTU_Table_ST.txt -m metadata.txt -o OUTPUT --source_sink_column SourcesAndSinks
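The lookup that failed in the traceback, `md[column_header]`, needs the column name to match exactly. A minimal pre-flight sketch using only Python's standard library can confirm this before running gibbs. It assumes a tab-separated metadata file whose first row is the header, and that sources and sinks are labelled `source` and `sink` (these defaults are assumptions chosen for illustration). `check_source_sink_column` is a hypothetical helper, not part of sourcetracker2. It also reports headers that differ from the expected name only by case, surrounding whitespace, or a byte-order mark, since those are hard to spot by eye.

```python
import csv
import io


def check_source_sink_column(text, column="SourceSink",
                             source_value="source", sink_value="sink"):
    """Return a list of problems found in tab-separated metadata text.

    Hypothetical helper: assumes the first row is the header and that
    sources/sinks are labelled with `source_value` and `sink_value`.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["metadata is empty"]
    header = rows[0]
    if column not in header:
        # Exact match is required; list headers that differ only by a
        # byte-order mark, surrounding whitespace, or letter case.
        near = [h for h in header
                if h.replace("\ufeff", "").strip().lower() == column.lower()]
        return ["column %r not found; near matches: %r" % (column, near)]
    idx = header.index(column)
    # Collect the labels actually used in that column (skip short rows).
    values = {row[idx] for row in rows[1:] if len(row) > idx}
    problems = []
    for expected in (source_value, sink_value):
        if expected not in values:
            problems.append("no rows labelled %r in %r" % (expected, column))
    return problems
```

For a real file, read it as text and pass the contents, e.g. `check_source_sink_column(open("metadata.txt").read(), column="SourcesAndSinks")`. An empty list means the column and both labels were found.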

Hopefully this will solve the issue, but if not, could you post a small example of your data? Thanks

rebelwebster commented 5 years ago

Hi @johnchase

Thanks for the response.

Hmm, strange. Attached is a clipped version of my OTU table and the mapping file (metadata.txt). I think both are in order...

Attachments: metadata.txt, sample_OTU.txt

Thanks

R

johnchase commented 5 years ago

Thanks for including a sample of your data. This is odd; I am not able to reproduce the error you are seeing. What version of pandas are you using? (You can check by entering pip list at the command line.)

I did have to remove one row of data from the sample_OTU.txt file because it extended past the column headers; otherwise, as far as I can tell, my command was identical to yours.

sourcetracker2 gibbs -i test_data.txt -m metadata.txt -o OUTPUT/

Attachment: test_data.txt
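A row that extends past the column headers can be caught with a quick scan before running the tool. The following is a minimal sketch that assumes a tab-separated table whose header is the first line (pass `skip` to ignore any leading comment lines). `find_ragged_rows` is a hypothetical helper, not part of sourcetracker2 or pandas. It reports 1-based line numbers, and a trailing tab counts as an extra, empty field.

```python
def find_ragged_rows(text, skip=0):
    """Return (line_number, n_fields, n_header_fields) for each row whose
    tab-separated field count differs from the header's.

    Hypothetical helper: the header is the first line after `skip` lines.
    """
    lines = text.splitlines()[skip:]
    if not lines:
        return []
    n_header = len(lines[0].split("\t"))
    ragged = []
    # lines[1] is file line skip + 2 (1-based), hence start=2 plus skip.
    for offset, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # ignore blank lines
        n = len(line.split("\t"))
        if n != n_header:
            ragged.append((offset + skip, n, n_header))
    return ragged
```

For example, `find_ragged_rows(open("sample_OTU.txt").read())` returns an empty list when every row has as many fields as the header.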

johnchase commented 5 years ago

I should have asked this initially, but what version of sourcetracker2 are you using?

rebelwebster commented 5 years ago

I tried running this again with the same files and got the same error. Then, crudely, I downloaded the example files and copied and pasted my data into them. When I re-uploaded those files and ran them again, everything worked.

I am sure it is user error and I have made a mistake somewhere along the line but at least it is working now!

Thanks for your time :)
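The thread never pinned down what differed between the original files and the example files. One plausible explanation, not confirmed here, is invisible formatting left by a spreadsheet or text editor: a UTF-8 byte-order mark, Windows (CRLF) or bare-CR line endings, non-breaking spaces, or stray whitespace in a header cell. Some of these can make a header cell differ from the expected name even though it looks identical on screen. The sketch below inspects a file's raw bytes for these. `describe_hidden_formatting` is a hypothetical helper, not part of sourcetracker2, and it assumes the file is UTF-8 (or ASCII) with a tab-separated header on the first line.

```python
def describe_hidden_formatting(raw):
    """List invisible formatting found in the raw bytes of a text file.

    Hypothetical helper; assumes UTF-8/ASCII with a tab-separated header.
    """
    findings = []
    if raw.startswith(b"\xef\xbb\xbf"):
        findings.append("UTF-8 byte-order mark at start of file")
    crlf = raw.count(b"\r\n")
    # Every CRLF contains one CR, so the remainder are bare CRs.
    lone_cr = raw.count(b"\r") - crlf
    if crlf:
        findings.append("%d Windows (CRLF) line endings" % crlf)
    if lone_cr:
        findings.append("%d bare CR line endings" % lone_cr)
    if b"\xc2\xa0" in raw:
        findings.append("non-breaking spaces (U+00A0)")
    # "utf-8-sig" drops a leading byte-order mark while decoding.
    text = raw.decode("utf-8-sig", errors="replace")
    lines = text.splitlines()
    header = lines[0] if lines else ""
    for cell in header.split("\t"):
        if cell != cell.strip():
            findings.append(
                "header cell %r has leading/trailing whitespace" % cell)
    return findings
```

Usage: `with open("metadata.txt", "rb") as fh: print(describe_hidden_formatting(fh.read()))`. Reading in binary mode matters here, because text mode would translate line endings before they could be counted.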