Closed gregcaporaso closed 7 years ago
@wdwvt1, @johnchase, @lkursell this is ready for review/merge whenever you guys are ready (I know you're working toward a deadline, so no rush).
@gregcaporaso - this looks very useful!
Two real comments:
I think the `_absolute_difference` function should sum the differences at each index, so rather than returning a per-source difference, it returns a total difference. In the test you have, I think it should be:

`obs_m = [.5, .25, .25]`, `exp_m = [.1, .8, .1]`, `abs_diff = .4 + .55 + .15 = 1.1`
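A minimal sketch of that summed version (the per-sink `DataFrame` layout and return type are assumptions here, not necessarily what the PR implements):

```python
import pandas as pd

def _absolute_difference(observed, expected):
    """Total absolute difference in mixing proportions, per sink.

    observed, expected : pd.DataFrame
        Rows are sinks, columns are sources, values are mixing proportions.
    Returns a pd.Series with one summed difference per sink.
    """
    # Sum |observed - expected| across sources (columns) for each sink (row).
    return (observed - expected).abs().sum(axis=1)

obs = pd.DataFrame([[0.5, 0.25, 0.25]], index=['sink1'],
                   columns=['src1', 'src2', 'src3'])
exp = pd.DataFrame([[0.1, 0.8, 0.1]], index=['sink1'],
                   columns=['src1', 'src2', 'src3'])
total = _absolute_difference(obs, exp)
print(total['sink1'])  # total ≈ .4 + .55 + .15 = 1.1
```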
There is no test for `_validate_dataframe`.
One style question: do we need to document the private functions? Although users won't be accessing them directly, developers will (or we'll want to in the future).
This partially addresses #57 and #31.
@wdwvt1, @johnchase, @lkursell this is an API to compare sourcetracker results. This could be used for the current benchmark and optimization projects as a way to determine how similar a pair of sourcetracker results are. I'm looking for input on the API before I spend much more time on it.
This defines two functions, `compare_sinks` and `compare_sink_metrics`, that become part of the public API as of the merge of this pull request (we'll commit to alpha-level stability of these in the next release; we're free to change them until then). `compare_sinks` would take two `pd.DataFrame` objects containing observed and expected mixing proportions, and a metric to use for comparing the mixing proportions. It would return a `pd.DataFrame` containing metric-specific data on the similarity/difference of the mixing proportions. `compare_sink_metrics` is a simple helper function that returns a list of the available metrics. This would be necessary for a QIIME 2 plugin, so interfaces can determine what the available choices are for the `metric` parameter of `compare_sinks`.
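A rough sketch of how this API could hang together: only the names `compare_sinks` and `compare_sink_metrics` come from this PR; the metric registry, signatures, and return layout below are my assumptions for illustration.

```python
import pandas as pd

def _absolute_difference(observed, expected):
    # Hypothetical metric: summed absolute difference per sink.
    return (observed - expected).abs().sum(axis=1)

# Hypothetical registry mapping metric names to comparison functions.
_METRICS = {'absolute_difference': _absolute_difference}

def compare_sink_metrics():
    """Return the names of the available comparison metrics."""
    return sorted(_METRICS)

def compare_sinks(observed, expected, metric):
    """Compare observed vs. expected mixing proportions with `metric`.

    Returns a pd.DataFrame of metric-specific data, indexed by sink.
    """
    if metric not in _METRICS:
        raise ValueError('Unknown metric: %s' % metric)
    return _METRICS[metric](observed, expected).to_frame(name=metric)

obs = pd.DataFrame([[0.5, 0.25, 0.25]], index=['sink1'],
                   columns=['a', 'b', 'c'])
exp = pd.DataFrame([[0.1, 0.8, 0.1]], index=['sink1'],
                   columns=['a', 'b', 'c'])
print(compare_sink_metrics())  # ['absolute_difference']
result = compare_sinks(obs, exp, 'absolute_difference')
print(result)
```

A registry like `_METRICS` keeps `compare_sink_metrics` and `compare_sinks` automatically in sync, which is the property an interface (e.g. a QIIME 2 plugin) needs when populating choices for the `metric` parameter.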