caporaso-lab / sourcetracker2

SourceTracker2
BSD 3-Clause "New" or "Revised" License

Create SourceTracker2 api #31

Closed johnchase closed 8 years ago

johnchase commented 8 years ago

It would be really handy to have a SourceTracker2 API, something that could be run in a notebook or easily integrated into a larger analysis pipeline. I'm thinking it would look something like the following:

def sourcetracker(source_ids, sink_ids, table, std=False, cluster=None, loo=False,
                  alpha1=0.001, alpha2=0.0, beta1=0.0, draws=0.0,
                  burnin=0.0, delay=0.0):
    '''doc string

    Parameters
    ----------
    source_ids : list
        Sample IDs that should be used as the sources; must correspond to IDs in the OTU table
    sink_ids : list
        Source IDs...
    table : pd.DataFrame
        Pandas DataFrame where columns are observations and rows are samples
    cluster : ipyparallel.Client
        User-created parallel client object

    The rest of the parameters should correspond to the command-line call

    Returns
    -------
    mixing_proportions : pd.DataFrame
        A pandas DataFrame of the mixing proportions
    '''
    # a bunch of code...
    return mixing_proportions
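To make the expected inputs concrete, here is a minimal sketch of how a samples-by-observations table might be checked and split by ID before any sampling happens. The helper name `split_sources_and_sinks`, the sample IDs, and the counts are all made up for illustration; nothing here is existing SourceTracker2 code.

```python
import pandas as pd


def split_sources_and_sinks(table, source_ids, sink_ids):
    """Hypothetical helper: check the IDs, then split the table by row label.

    `table` is assumed to have samples as rows and observations as columns,
    as described in the proposed docstring.
    """
    source_ids, sink_ids = list(source_ids), list(sink_ids)
    # Fail early with a readable message if any requested ID is absent.
    missing = (set(source_ids) | set(sink_ids)) - set(table.index)
    if missing:
        raise KeyError(f"IDs not found in table: {sorted(missing)}")
    return table.loc[source_ids], table.loc[sink_ids]


# Toy data, invented for this example only.
otu_table = pd.DataFrame(
    [[10, 0, 5], [0, 8, 2], [4, 4, 4]],
    index=["soil", "gut", "unknown"],
    columns=["otu1", "otu2", "otu3"],
)
sources, sinks = split_sources_and_sinks(otu_table, ["soil", "gut"], ["unknown"])
```

Validating IDs up front like this would let the function report a bad sample ID before any work is handed to the cluster.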

The overall workflow could look something like this:

$ ipcluster start -n 4

from sourcetracker2 import sourcetracker
import ipyparallel as ipp
c = ipp.Client()

mix_proportions_df = sourcetracker(source_ids, sink_ids, otu_table, cluster=c)

wdwvt1 commented 8 years ago

PR #32 addresses this, though it needs to be cleaned up significantly.

gregcaporaso commented 8 years ago

We should decide, as a group, which functions we would like to be part of the public, stable API. We should then ensure that we have valid numpydoc for all of those functions. We should err on the side of a minimal API, as it's easy to add new things to the API but very hard to remove them. I agree that _gibbs is almost there. You guys should work with this in the alpha release that I create to decide if it's what we want in the public API (which maybe we can try for in a beta release). I think this should possibly be the only function that's part of the public API, in which case maybe it should be renamed sourcetracker and take a mode parameter where you can pass gibbs (similar to skbio.diversity.beta_diversity and its metric parameter).
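As a rough illustration of the mode-parameter idea, a single public entry point could dispatch to a private implementation by name. This is only a sketch: the `_MODES` registry, the stub `_gibbs` body, and the argument names are assumptions made for the example, not the actual SourceTracker2 code.

```python
def _gibbs(sources, sinks, **kwargs):
    """Stub standing in for the real Gibbs sampler (hypothetical body)."""
    # The real function would return mixing proportions; this placeholder
    # only reports which backend ran and what it was given.
    return {
        "mode": "gibbs",
        "n_sources": len(sources),
        "n_sinks": len(sinks),
        "kwargs": kwargs,
    }


# Hypothetical registry mapping mode names to private implementations.
_MODES = {"gibbs": _gibbs}


def sourcetracker(sources, sinks, mode="gibbs", **kwargs):
    """Single public entry point that dispatches on `mode`."""
    try:
        backend = _MODES[mode]
    except KeyError:
        raise ValueError(
            f"Unknown mode {mode!r}; expected one of {sorted(_MODES)}"
        ) from None
    return backend(sources, sinks, **kwargs)
```

With this shape, adding another method later would mean adding one private function and one registry entry, so the public API stays at a single name.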

gregcaporaso commented 8 years ago

@wdwvt1 and I discussed also making the sourcetracker result comparison function (#57) accessible through the public API. Then the initial public API would be these imports:

from sourcetracker import sourcetracker, compare

gregcaporaso commented 8 years ago

Added in #71.