aleixalcacer / archetypes

Scikit-learn compatible package for archetypal analysis.
https://archetypes.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Potential bug or is it possible? (identical alphas outcome) #10

Closed: josefheidler closed this issue 2 years ago

josefheidler commented 2 years ago

Hello, with the attached files I get two rows with identical alphas, where one archetype's alpha equals 1 (rows 8 and 52 in results.csv). Is that possible?

```python
import pandas as pd
import archetypes as arch

data = pd.read_csv("input.csv", header=None).to_numpy()

n_archetypes = 4
n_init = 10
max_iter = 10000
aa = arch.AA(n_archetypes=n_archetypes, n_init=n_init, max_iter=max_iter)
aa_trans = aa.fit_transform(data)
pd.DataFrame(aa_trans).to_csv("results.csv")
```

Thanks!

input.csv results.csv

aleixalcacer commented 2 years ago

It is possible :) Two observations with the same aa_trans (alphas) are equal to each other in the "archetypal space".

In addition, the alphas of these two observations, i.e. (0, 0, 1, 0), tell you that both observations are represented by archetype 3 alone. Remember that you can get the archetypes with aa.archetypes_.
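As a minimal sketch of how to read such a row of alphas, here is a plain NumPy example. The archetype values below are made up for illustration and are not the ones from this issue; in practice they would come from aa.archetypes_.

```python
import numpy as np

# Hypothetical archetypes (4 archetypes, 2 features); in practice use aa.archetypes_.
archetypes = np.array([
    [0.0, 0.0],
    [4.0, 0.0],
    [4.0, 3.0],
    [0.0, 3.0],
])

# Two rows of alphas: a pure "archetype 3" row and a mixed row.
# Each row is non-negative and sums to 1.
alphas = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
])

# Each reconstructed observation is a convex combination of the archetypes.
reconstruction = alphas @ archetypes
print(reconstruction)
```

The first reconstructed row is exactly the third archetype, while the second is the midpoint between archetypes 1 and 2.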

You can also use arch.simplex() to plot the alphas of your data to interpret them. See https://archetypes.readthedocs.io/en/latest/getting_started/examples/aa.html

josefheidler commented 2 years ago

So it is possible to have two rows with the same alpha values [0, 0, 1, 0] but different input variables? I will test your package against a package from R to see whether the numbers are the same.

Thank you!

aleixalcacer commented 2 years ago

Yes, it is. By definition (see the original paper), alphas @ archetypes is an approximation of your dataset, so you lose information in the transformation.

In your case, the projection onto the convex hull determined by the archetypes is the same for both observations. If you are interested in this topic, I can send you some papers to read.
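To illustrate how two different observations can end up with identical alphas, here is a small sketch in plain NumPy. It is not how the archetypes package computes alphas internally: the archetypes, the two points, and the helper names (project_to_simplex, fit_alphas) are all hypothetical, and the solver is a simple projected gradient descent chosen only because it is easy to read.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def fit_alphas(x, archetypes, n_steps=2000):
    """Minimize ||x - alpha @ archetypes||^2 over the simplex (projected gradient)."""
    n_arch = archetypes.shape[0]
    alpha = np.full(n_arch, 1.0 / n_arch)
    # Step size 1/L, where L is the largest eigenvalue of the Hessian 2 * Z @ Z.T.
    step = 1.0 / (2.0 * np.linalg.norm(archetypes @ archetypes.T, 2))
    for _ in range(n_steps):
        grad = 2.0 * (alpha @ archetypes - x) @ archetypes.T
        alpha = project_to_simplex(alpha - step * grad)
    return alpha

# Hypothetical archetypes: the corners of a triangle in 2D.
archetypes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Two different observations, both outside the triangle and closest to corner (1, 0).
x1 = np.array([2.0, -0.5])
x2 = np.array([3.0, -2.0])

a1 = fit_alphas(x1, archetypes)
a2 = fit_alphas(x2, archetypes)
print(a1, a2)
```

Both points lie outside the hull, and the nearest point of the hull is the same corner for each, so they receive the same alphas even though their input values differ.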

Anyway, as it is a numerical algorithm, it is very likely that the results will not be exactly the same in R (but let me know what you get).

Aleix :)