[Feature Request]: Quick class method to get cluster count and cluster composition

Jeff-oakley commented 1 year ago

Email (Optional)

bouyang@fsu.edu

Problem

Can we have a class method to get the cluster count (in the indicator basis) and the cluster composition more easily? Or can I double-check whether the code below correctly gives how many clusters we have?

# Assuming cs is a defined cluster subspace
feature_multiplicity = [1]  # First cluster is the empty cluster
for orbit in cs.orbits:
    # One entry per correlation function in this orbit
    feature_multiplicity.extend([orbit.multiplicity] * len(orbit.bit_combos))
feature_multiplicity.append(1)  # Last cluster is 1/dielectric constant

Proposed Solution

Add two class methods? For the cluster count, maybe just use my code above, or a corrected version if my code is not correct.

For the cluster composition, a bit of work is needed to map bit_combos into a string. Is there a class attribute that maps the indices in bit_combos to specific species?

Alternatives

No response

Code of Conduct

[X] I agree to follow this project's Code of Conduct

kamronald commented 1 year ago

Hi Bin,

If you do cs.num_corr_functions, that should get what you're looking for. I'm assuming by "cluster" you mean correlation function. If you mean actual geometric clusters of sites, then cs.num_clusters is also a function.

For your second point, we do not have this feature yet. To get the relationship between bit_combos and a species (only in the indicator basis), I think the best option right now is to print out a site space:

cs.orbits[0].site_bases[0].site_space

or

from smol.cofe.space.domain import get_site_spaces
get_site_spaces(cs.structure)

The bit combo should correspond to the index of the species listed in the site_space. I think it would be a good idea to add the feature you mention.
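As a rough illustration of the mapping described above, here is a minimal sketch that turns a bit combo into a readable label. It uses plain lists as stand-ins for the site spaces: `label_bit_combo` and `pair_site_species` are hypothetical names, not part of smol, and the sketch assumes each bit is simply an index into the ordered species list of the corresponding site (indicator basis only).

```python
from typing import Sequence


def label_bit_combo(
    bit_combo: Sequence[int], site_species: Sequence[Sequence[str]]
) -> str:
    """Return a label such as 'Mn-Mn' for one bit combo.

    Assumption: bit_combo[i] indexes into site_species[i], the ordered
    species list of the i-th site in the cluster.
    """
    if len(bit_combo) != len(site_species):
        raise ValueError("one bit is needed per cluster site")
    return "-".join(species[bit] for bit, species in zip(bit_combo, site_species))


# Hypothetical pair orbit: both sites can host Mn, Ni, or a vacancy.
pair_site_species = [["Mn", "Ni", "Vacancy"], ["Mn", "Ni", "Vacancy"]]

labels = [
    label_bit_combo(combo, pair_site_species)
    for combo in [(0, 0), (0, 1), (1, 1)]
]
print(labels)  # ['Mn-Mn', 'Mn-Ni', 'Ni-Ni']
```

With a real cluster subspace, the species lists would come from the printed site spaces rather than being typed by hand.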

Jeff-oakley commented 1 year ago

Thanks Ronald. cs.num_corr_functions only gives the dimensionality of the feature matrix. I am talking about how to obtain something like: "how many Mn-Mn dimers do we have with the geometry defined in a specific orbit?" This is essentially the value in StructureWrangler.feature_matrix, but it has to be scaled up in some way. The code I provided is how I think feature_matrix can be scaled up to reflect the number of clusters for each orbit with a certain species decoration. Is that correct?

kamronald commented 1 year ago

Thanks for clarifying, I think I understand your question better now. So you want to multiply a value of your correlation vector in the feature matrix by a certain value (N) to obtain a concentration, and you are trying to obtain N? If that is so, I think your for loop should be changed to:

for orbit in cs.orbits:
    feature_multiplicity.extend([orbit.multiplicity * len(arr) for arr in orbit.bit_combos])

Your code as written would only multiply the matrix element by the orbit multiplicity. However, the composition of a cluster decoration may be degenerate, with degeneracy len(arr), so you should multiply by that degeneracy as well.

kamronald commented 1 year ago

Actually thinking about it again, @Jeff-oakley I think you were right the first time. That extra multiplicity I mentioned shows up in the orbit but not in the correlation function, I believe.
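To make the difference between the two loops concrete, here is a small sketch that runs both on stand-in data. `StandInOrbit` is a hypothetical class that only mimics the two attributes used in this thread (multiplicity and bit_combos); it is not the smol Orbit class, and the trailing entry for the 1/dielectric constant term is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StandInOrbit:
    """Hypothetical stand-in exposing only the attributes used in this thread."""

    multiplicity: int
    # One inner list of symmetrically equivalent bit combos per correlation function.
    bit_combos: List[List[Tuple[int, ...]]]


def multiplicities_per_function(orbits, include_degeneracy=False):
    """Return one scaling factor per correlation function, empty cluster first."""
    result = [1]  # empty cluster
    for orbit in orbits:
        for equivalent_combos in orbit.bit_combos:
            factor = len(equivalent_combos) if include_degeneracy else 1
            result.append(orbit.multiplicity * factor)
    return result


orbits = [
    StandInOrbit(multiplicity=1, bit_combos=[[(0,)], [(1,)]]),
    StandInOrbit(multiplicity=6, bit_combos=[[(0, 0)], [(0, 1), (1, 0)], [(1, 1)]]),
]

first_version = multiplicities_per_function(orbits)
second_version = multiplicities_per_function(orbits, include_degeneracy=True)
print(first_version)   # [1, 1, 1, 6, 6, 6]
print(second_version)  # [1, 1, 1, 6, 12, 6]
```

The two versions differ only for correlation functions with more than one equivalent bit combo, such as the mixed pair in this example.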

lbluque commented 12 months ago

Hi @Jeff-oakley and @kamronald,

Obtaining the correlation function multiplicities should be implemented in the cs.function_ordering_multiplicities property.

As you mention, the only way to obtain the total number of specific cluster occupations (such as your example Mn-Mn dimers) right now is to use a cs with an indicator basis (assuming the occupation you want is included). To do so, you simply need to pass normalized=False to cs.corr_from_structure; equivalently, if you are using a ClusterExpansionProcessor, it should already be computing the extensive value. I would double-check that I am not off by a multiplicity factor somewhere....

For the case of other basis functions, I have code that is not fully tested to obtain the transformation matrix needed to compute cluster counts from correlation vectors, but I have not had the time to clean it up and fully test it. However I would be happy to push it to a dev branch in case you are interested.
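For the indicator-basis case, the scaling discussed in this thread can be sketched as plain arithmetic. This is only an illustration under stated assumptions: it assumes a normalized correlation is an average per cluster and per primitive cell, so that a count is recovered by multiplying by the function multiplicity and the supercell size. `cluster_counts` and the numbers below are hypothetical, and if the correlations were already computed with normalized=False, the size factor may already be included (use supercell_size=1 in that case).

```python
def cluster_counts(normalized_corr, multiplicities, supercell_size):
    """Scale normalized indicator correlations to extensive cluster counts.

    Assumption: count = correlation * multiplicity * supercell size.
    """
    if len(normalized_corr) != len(multiplicities):
        raise ValueError("need one multiplicity per correlation function")
    return [
        corr * mult * supercell_size
        for corr, mult in zip(normalized_corr, multiplicities)
    ]


# Hypothetical values: empty cluster, one point function, one pair function.
counts = cluster_counts([1.0, 0.5, 0.25], [1, 1, 6], supercell_size=8)
print(counts)  # [8.0, 4.0, 12.0]
```

A fractional result after this kind of scaling would suggest that one of the factors (multiplicity or supercell size) is missing or applied twice, which is worth checking against a structure whose cluster counts are known by hand.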

Jeff-oakley commented 12 months ago

Thanks Ronald and Luis! I will look into that.

The indicator basis should be good enough for now, but it would be great to have the cluster count for other bases as well :)

Jeff-oakley commented 8 months ago

Hi Luis - I tried what you recommended. However, I obtain fractional numbers of clusters. Why do we get 0.08333333333333333 as the number of clusters? Or maybe the cluster counting is not correct? I prepared an example, attached below: github_debug.zip

CederGroupHub / smol