GEMDAT-repos / GEMDAT

Python toolkit for molecular dynamics analysis
https://gemdat.readthedocs.io
Apache License 2.0
24 stars 3 forks source link

Problem with occupancy function #339

Closed AstyLavrinenko closed 6 days ago

AstyLavrinenko commented 1 week ago

I'm trying to get working functions transitions.occupancy() and transitions.atom_locations() and getting an error: image

After fixing specie to species in the source code, I'm still getting error: image

Could you please fix that?

Besides that occupancy now seems to appear in the Structure object. Is there any additional function to extract average occupancy per each site type?

stefsmeets commented 1 week ago

Hi @AstyLavrinenko I cannot reproduce these errors. Which version of pymatgen are you using? And can you post the full traceback?

The choice to return a structure was made to have richer information available to the user. You can easily extract the average occupancy type per site type by looping over the structure.

You could for example use the composition and divide over the total number of sites:

struct = transitions.occupancy()
struct.composition
AstyLavrinenko commented 1 week ago

My pymatgen version is 2024.6.10

And full traceback:

AttributeError                            Traceback (most recent call last)
Cell In[31], line 1
----> 1 transitions.occupancy()

File \\tudelft.net\staff-umbrella\Project Data akl\gemdat_dev\GEMDAT\src\gemdat\transitions.py:255, in Transitions.occupancy(self)
    252 counts = counts / len(states)
    253 occupancies = dict(zip(unq, counts))
--> 255 species = [{site.species.name: occupancies.get(i, 0)} for i, site in enumerate(sites)]
    257 return Structure(
    258     lattice=sites.lattice,
    259     species=species,
   (...)
    262     labels=sites.labels,
    263 )

File \\tudelft.net\staff-umbrella\Project Data akl\gemdat_dev\GEMDAT\src\gemdat\transitions.py:255, in <listcomp>(.0)
    252 counts = counts / len(states)
    253 occupancies = dict(zip(unq, counts))
--> 255 species = [{site.species.name: occupancies.get(i, 0)} for i, site in enumerate(sites)]
    257 return Structure(
    258     lattice=sites.lattice,
    259     species=species,
   (...)
    262     labels=sites.labels,
    263 )

AttributeError: 'Composition' object has no attribute 'name'

Now I'm thinking that it could be because my structure to create transitions already have partial occupation as I took it from experiment image

Regarding the average occupancy, the functions you've proposed work only with one type of sites while usually I introduce at least 2-3 different site types. Maybe it worth introducing additional small function to get average occupations

stefsmeets commented 1 week ago

What would such an output look like? Feel free to open a separate issue for this.

stefsmeets commented 6 days ago

Okay, I can reproduce the error with a sites structure that has partial occupancy. Not sure if it can be 'fixed' though, because a partial site does not make sense to me.

We can either:

AstyLavrinenko commented 6 days ago

After all, we have three main things to fix:

    def average_occupancy(self):
        """Calculate average occupancy per a type of site.

        Returns
        -------
        dict[str, float]
            Return dict with the fraction of time atoms spent at a site
        """

        compositions_by_label = defaultdict(list)

        for site in self.occupancy():
            compositions_by_label[site.label].append(site.species.num_atoms)

        ret = {}

        for k, v in compositions_by_label.items():
            ret[k] = (sum(v) / len(v))

        return ret

Could you please do these fixes?

stefsmeets commented 6 days ago

Fair points, I have fixed these issues except for the first one. I think it is up to the user to clean their structure. There is no one way to make a disordered structure ordered, but the pymatgen code to do it yourself is trivial.

AstyLavrinenko commented 6 days ago

We don't need to convert the structure into an ordered one. We only need to account for all possible sites that the floating species might occupy, as this is simply a model of potential sites. In the simulation data, any inherent disorder will naturally appear through the computed occupancies and lithium densities. I think it's better for the code to automatically set the occupancy to 1 for all potential sites of the fluctuating species, rather than expecting the user to manually modify the structure. This approach minimizes user confusion and ensures consistent analysis.

stefsmeets commented 6 days ago

I see your point, I just don't see why gemdat needs to do this. This behaviour is difficult to define and to explain.

The code does not see the disorder, because it only uses the fractional coordinates and site labels. Any specie names are only used for descriptive purposes.

The behaviour you describe would sit behind some sort of flag or toggle, increasing complexity. This would then filter the input sites based on some pre-defined behaviour (floating specie or otherwise), but that is just passing it straight to pymatgen. Therefore I think it is better that the user performs this operation.