Open johannesbjork opened 4 years ago
Hi @johannesbjork,
Thank you for reporting this! The standalone CLI is the only tutorial I did not make and it seems that was an oversight on my part.
The error occurs because the sample IDs are formatted as floats, so pandas loads them as floats while biom loads them as strings. As a result, no sample IDs match between the table and the metadata, which triggers the gemelli error you reported.
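A minimal sketch of that mismatch, assuming a small made-up metadata table (not the real IBD example file) and a plain list standing in for the string IDs a BIOM table would report. The `sample_id` column name is hypothetical.

```python
import io

import pandas as pd

# Hypothetical metadata with float-formatted sample ids (illustration only).
tsv = "sample_id\thost_subject_id\ttimepoint\n1.0\tA\t1\n2.0\tA\t2\n"

# Default parsing infers a float index from labels such as "1.0".
inferred = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0)

# Stand-in for the sample ids of a BIOM table, which are strings.
table_ids = ["1.0", "2.0"]

# Floats never compare equal to strings, so the overlap is empty.
shared_inferred = set(inferred.index) & set(table_ids)

# Forcing the id column to str keeps the labels as text, so they match.
as_text = pd.read_csv(
    io.StringIO(tsv), sep="\t", dtype={"sample_id": str}
).set_index("sample_id")
shared_text = set(as_text.index) & set(table_ids)

print(len(shared_inferred), len(shared_text))
```

Comparing the two overlaps like this before running an analysis is a quick way to check whether the IDs were parsed as numbers.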
I just fixed this in the tables here (fixed-IBD-example.zip) by adding a string ('s') to the sample names.
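One way such a rename could be done for the metadata file is sketched below. This is an assumption about the approach, not the script used to produce the fixed files: the helper `prefix_sample_ids` and the example data are hypothetical, and the 's' is assumed to be added as a prefix. The same renaming would also have to be applied to the sample IDs in the BIOM table so that both sides still agree.

```python
import csv
import io


def prefix_sample_ids(tsv_text, prefix="s"):
    """Return TSV text with `prefix` prepended to every id in the first column."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)  # the header row is left unchanged
    for row in body:
        writer.writerow([prefix + row[0]] + row[1:])
    return out.getvalue()


# Hypothetical metadata with float-formatted ids (illustration only).
example = "sample_id\ttimepoint\n1.0\t1\n2.0\t2\n"
renamed = prefix_sample_ids(example)
print(renamed)
```

With a non-numeric character in every ID, a parser can no longer interpret the labels as floats.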
I will put in a PR for this fix and a standalone tutorial (issue #35).
The following command runs fine:
mkdir standalone-results
gemelli \
--in-biom fixed-IBD-example/table.biom \
--sample-metadata-file fixed-IBD-example/metadata.tsv \
--individual-id-column 'host_subject_id' \
--state-column-1 'timepoint' \
--output-dir standalone-results
But to save runtime (since this is an example), you could also remove singletons with the --min-feature-count flag:
gemelli \
--in-biom fixed-IBD-example/table.biom \
--sample-metadata-file fixed-IBD-example/metadata.tsv \
--individual-id-column 'host_subject_id' \
--state-column-1 'timepoint' \
--min-feature-count 1 \
--output-dir standalone-results
This also brings up a good point that a tutorial with R integration would be nice. I have added that to issue #35.
Thank you again for letting me know, and please tell me if this does not solve the problem for you.
Running the stand-alone version of gemelli on the example data used in the tutorial, I get the error
ValueError: No more features left. Check to make sure that the sample names between sample-metadata and table are consistent
As I'm not a Python person, I filter the example data in R. Having made sure that samples match between the feature table and the metadata (and having filtered out the rare stuff), I run gemelli and get the following error