caporaso-lab / sourcetracker2

SourceTracker2
BSD 3-Clause "New" or "Revised" License

Unexpected behavior with null values in api call #46

Closed johnchase closed 8 years ago

johnchase commented 8 years ago

The private API call _gibbs behaves unexpectedly when null values are present in the source table: instead of raising an error, it silently returns a result.

For example:

import numpy as np
import pandas as pd

sink = pd.DataFrame([[1, 2], [1, 0]], index=['x', 'y'])
# Source 'z' consists entirely of null values.
source = pd.DataFrame([[1, 2], [np.nan, np.nan]], index=['v', 'z'])

mix, std = _gibbs(source, sink)
mix

    v   z   Unknown
x   1.0 0.0 0.0
y   1.0 0.0 0.0

The simplest solution is probably to raise an error if null values are present in the source or sink tables.
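
A check along these lines could look roughly like the sketch below. This is only an illustration of the idea, not the fix that was merged: the helper name check_no_nulls and the error message are hypothetical, and where such a check would be called inside SourceTracker2 is not specified here.

```python
import numpy as np
import pandas as pd


def check_no_nulls(table, name):
    """Raise ValueError if `table` contains nulls (hypothetical helper)."""
    # Boolean Series: True for every row that holds at least one null.
    has_null = table.isnull().any(axis=1)
    null_rows = has_null[has_null].index.tolist()
    if null_rows:
        raise ValueError(
            f"{name} table contains null values in rows: {null_rows}"
        )


sink = pd.DataFrame([[1, 2], [1, 0]], index=['x', 'y'])
source = pd.DataFrame([[1, 2], [np.nan, np.nan]], index=['v', 'z'])

check_no_nulls(sink, 'sink')  # no nulls, so nothing is raised

try:
    check_no_nulls(source, 'source')
except ValueError as err:
    message = str(err)
    print(message)
```

Running both tables through the check before sampling would turn the silent result above into an explicit failure that names the offending rows.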

wdwvt1 commented 8 years ago

Now fixing this as part of my pd_internals branch.