ArnaoutLab / diversity

Partitioned frequency- and similarity-sensitive diversity in Python
MIT License

Pivot table refactor #45

Closed Elliot-D-Hill closed 1 year ago

Elliot-D-Hill commented 2 years ago

A pivot table is not a diversity calculation, so it feels strange to have a custom pivot table function in the package. Proposal: use pandas' pivot_table instead, and cast the species column to a categorical dtype whose categories follow the column order of the similarity matrix. This has the added benefit of removing the need for unique_correspondence (and other supporting code). See example replacement code below:

counts["subcommunity"] = counts["subcommunity"].astype('category')
species_ordering = pd.CategoricalDtype(categories=similarity.columns)
counts["species"] = counts["species"].astype(species_ordering)
table = pd.pivot_table(counts, columns="subcommunity", index="species", values="count", fill_value=0)
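For anyone who wants to try the proposal in isolation, here is a minimal sketch on toy data. The `similarity` and `counts` frames below are made up for illustration, and the explicit `aggfunc="sum"`, `observed=False`, and the trailing `reindex` are my own additions (not part of the proposal above); they are there so that species with no counts still show up as zero rows in the similarity matrix's order, whatever the pandas defaults happen to be.

```python
import numpy as np
import pandas as pd

# Hypothetical toy inputs: a 3-species similarity matrix and long-format counts.
# Species "b" never appears in the counts on purpose.
species = ["a", "b", "c"]
similarity = pd.DataFrame(np.eye(3), index=species, columns=species)
counts = pd.DataFrame(
    {
        "subcommunity": ["s1", "s1", "s2"],
        "species": ["c", "a", "a"],
        "count": [2, 1, 4],
    }
)

# Categorical dtype whose category order follows the similarity matrix columns.
species_ordering = pd.CategoricalDtype(categories=similarity.columns)
counts["species"] = counts["species"].astype(species_ordering)

table = counts.pivot_table(
    index="species",
    columns="subcommunity",
    values="count",
    aggfunc="sum",    # assumption: duplicate rows should be summed, not averaged
    fill_value=0,
    observed=False,   # keep unobserved categories where pandas allows it
).reindex(index=similarity.columns, fill_value=0)  # guarantee every species row, in order

print(table)
```

The `reindex` is a belt-and-braces step: if the categorical pivot already returns every species in the right order it changes nothing, and otherwise it fills the missing rows with zeros.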

sharedArray pseudo code

shared_counts[:,:] = counts.pivot_table(...).to_numpy() # maybe don't need to_numpy()
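A rough, runnable version of the pseudocode above, assuming the shared array is backed by the standard library's `multiprocessing.shared_memory` (the issue does not say which shared-array mechanism is meant, so treat that choice as a placeholder). The `table` frame here is a hypothetical stand-in for the pivot table result.

```python
from multiprocessing import shared_memory

import numpy as np
import pandas as pd

# Hypothetical stand-in for the species-by-subcommunity pivot table.
table = pd.DataFrame(
    [[1, 4], [0, 0], [2, 0]],
    index=["a", "b", "c"],
    columns=["s1", "s2"],
)
values = table.to_numpy(dtype=np.float64)

# Allocate a shared block large enough for the dense table.
shm = shared_memory.SharedMemory(create=True, size=values.nbytes)

# View the shared buffer as a NumPy array and copy the table into it in place.
shared_counts = np.ndarray(values.shape, dtype=values.dtype, buffer=shm.buf)
shared_counts[:, :] = values

# Take a private copy to show the data round-trips through shared memory.
round_trip = shared_counts.copy()

# Drop the view before closing, otherwise close() complains about exported buffers.
del shared_counts
shm.close()
shm.unlink()

print(round_trip)
```

In real use, worker processes would attach to the block by `shm.name` rather than copying it back, and only the creating process would call `unlink()`.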