Open materialsguy opened 2 years ago
Hi @materialsguy!
This is an excellent topic. Some time ago I saw something similar in matminer, where you can call `feature_labels()` to get information about the features. I do have this as one of the TODOs in our kanban, but as of now it is not directly possible.
In practice, implementing it should be fairly straightforward, but I cannot give any timeline on this. It is possible to reverse-engineer some of the label information with the `get_location()` method, which gives the slice for a given species pair. However, it does not currently support getting the location of specific (l, n) values.
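As a rough illustration of that reverse-engineering idea, the sketch below labels every column with the species pair whose slice contains it. It is only a sketch under assumptions: `label_columns_by_pair` and `make_fake_get_location` are hypothetical helpers, and the fake locator lays the pair blocks out in a made-up order just so the example runs without the library. With a real descriptor you would pass its own `get_location` method and its feature count instead.

```python
from itertools import combinations_with_replacement


def label_columns_by_pair(species, get_location, n_features):
    """Return a list holding the species pair for every feature column.

    `get_location` is any callable mapping a species pair to a slice,
    for example the descriptor's own get_location method.
    """
    labels = [None] * n_features
    for pair in combinations_with_replacement(species, 2):
        loc = get_location(pair)
        for i in range(*loc.indices(n_features)):
            labels[i] = pair
    return labels


def make_fake_get_location(species, n_max, l_max):
    """Hypothetical stand-in for a real get_location.

    ASSUMPTION: the block order used here is invented for the demo;
    the real slices must come from the descriptor itself.
    """
    slices = {}
    start = 0
    for a, b in combinations_with_replacement(species, 2):
        # Same-species blocks keep only unordered (n, n') combinations.
        n_pairs = n_max * (n_max + 1) // 2 if a == b else n_max * n_max
        size = n_pairs * (l_max + 1)
        slices[(a, b)] = slice(start, start + size)
        start += size
    return slices.__getitem__, start


species = ["O", "Al"]
fake_get_location, n_features = make_fake_get_location(species, n_max=2, l_max=1)
labels = label_columns_by_pair(species, fake_get_location, n_features)
```

The resulting list can be zipped with per-column importances to aggregate them per species pair. It cannot say which (l, n) values a column belongs to, which matches the limitation described above.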
Thank you for the quick reply. I also think such an implementation would really help from a machine learning feature engineering and feature analysis perspective, especially when the analysis is done by someone who does not have full knowledge of the physical meaning of the feature vectors. Please let me know when you have implemented it.
I will have a look at the `get_location()` method.
Thanks.
Hello,
I'm currently analysing someone else's machine learning model, which is trained on SOAP feature vectors. The code generating the feature vectors looks something like this:
    soap = SOAP(species=species, periodic=True, rcut=2.5, nmax=8, lmax=8, average="inner", sparse=False)
    feature_vectors = soap.create(atoms, n_jobs=1)
Where `species` is a set that holds the different element names and `atoms` is a list of `Atoms` objects such as `Atoms(symbols='O18Al12', pbc=True, cell=[[4.76, 0.0, 0.0], [-2.379999999999999, 4.122280922013928, 0.0], [0.0, 0.0, 12.993]], spacegroup_kinds=...)`. The `feature_vectors` are then transformed into a rather large `pd.DataFrame` that contains 1109304 columns.

Is there a way to find out the feature names (physical meaning) of the individual values of a feature vector? For me it is currently "just" a row in a dataframe, without any column descriptions, on which the model is based. For my analysis it would be interesting to know what each column represents physically, since the analysis results in a kind of feature importance for each column.
Thank you very much.
Best regards,
Claus
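
For orientation, the column count quoted above can be sanity-checked with a few lines of arithmetic. This is a sketch under one stated assumption: that the SOAP power spectrum keeps every unordered pair of (species, n) channels for each l from 0 to lmax, giving m(m+1)/2 * (lmax+1) features with m = n_species * nmax. The function names are hypothetical, and the formula should be checked against the descriptor's own reported feature count before relying on it.

```python
def soap_feature_count(n_species, n_max, l_max):
    """Feature count under the ASSUMED formula m(m+1)/2 * (l_max+1)."""
    m = n_species * n_max  # number of (species, n) channels
    return m * (m + 1) // 2 * (l_max + 1)


def species_count_for(n_features, n_max, l_max, limit=200):
    """Find the species count that reproduces n_features, if any."""
    for n_species in range(1, limit + 1):
        if soap_feature_count(n_species, n_max, l_max) == n_features:
            return n_species
    return None


# With nmax=8 and lmax=8 as in the snippet above:
n_species = species_count_for(1109304, n_max=8, l_max=8)
print(n_species)
```

Under that assumption, 1109304 columns with nmax=8 and lmax=8 corresponds to 62 species in the `species` set, so most columns describe cross-species terms rather than a single element.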