[Feature Request]: Example Notebook for MPtrj cleaning

CompRhys commented 7 months ago

Problem

It would be great if we could have an example notebook showing the MPtrj query pattern and cleaning. It is worth noting that the MP query would need to be pinned to v2021.11.10 to arrive at the same dataset as the current MPtrj, but having the notebook would enable users to recreate similar datasets for newer releases like v2023.11.1, where a large number of materials have been both added and deprecated.

Proposed Solution

The notebook should, by default, have a smoke_test version that performs the cleaning only on a smaller query.
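A minimal sketch of what such a switch could look like, assuming the notebook starts from a list of material IDs. The names `SMOKE_TEST`, `SMOKE_TEST_LIMIT`, and `select_material_ids` are hypothetical, and the IDs below are placeholders rather than a real query result.

```python
import itertools
from typing import Iterable, List

SMOKE_TEST = True      # hypothetical notebook-level switch, on by default
SMOKE_TEST_LIMIT = 5   # hypothetical cap on how many materials a smoke test touches


def select_material_ids(
    all_ids: Iterable[str],
    smoke_test: bool = SMOKE_TEST,
    limit: int = SMOKE_TEST_LIMIT,
) -> List[str]:
    """Return every ID for a full run, or only the first `limit` IDs for a smoke test."""
    if smoke_test:
        return list(itertools.islice(all_ids, limit))
    return list(all_ids)


# Placeholder IDs standing in for the result of a real Materials Project query.
example_ids = [f"mp-{i}" for i in range(1, 101)]
subset = select_material_ids(example_ids)
```

The rest of the notebook would then run unchanged on `subset`, so the full cleaning path is exercised on a small query.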

Alternatives

No response

BowenD-UCB commented 7 months ago

The original code won't work today due to:

- A syntax change in the Materials Project MPRester.
- Changes to the Materials Project thermodoc: a lot of website entries are r2SCAN now.

Instead, I will state an outline of the MPtrj parsing process:

1. Query all the existing mp_ids.

2. Query all the tasks belonging to those mp_ids with

   task_types = ['GGA Static', 'GGA Structure Optimization', 'GGA+U Static', 'GGA+U Structure Optimization']

3. For each task queried from mpr.tasks.get_data_by_id, check its calculation compatibility with the associated thermodoc entry queried from mpr.get_entry_by_material_id. This includes:
   - INCAR setting checks
   - electronic step convergence
   - ionic step energies cannot be more than 10 meV/atom lower or 1 eV/atom higher than the thermodoc entry

4. For trajectory frames that passed step 3, use pymatgen's StructureMatcher to ensure that the similarity between frames is low.
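The energy window and the frame de-duplication described above could be sketched roughly as follows. This is not the original MPtrj code: the function names are hypothetical, the window boundaries are assumed to be inclusive, and the similarity test is passed in as a callable so the sketch runs without pymatgen.

```python
from typing import Callable, List, Sequence, TypeVar

# Thresholds from the outline above, in eV/atom relative to the thermodoc entry.
MAX_BELOW_REF = 0.010  # a frame may be at most 10 meV/atom lower
MAX_ABOVE_REF = 1.0    # a frame may be at most 1 eV/atom higher

T = TypeVar("T")


def energy_in_window(frame_e_per_atom: float, ref_e_per_atom: float) -> bool:
    """True if the frame energy lies inside the window around the reference energy."""
    delta = frame_e_per_atom - ref_e_per_atom
    return -MAX_BELOW_REF <= delta <= MAX_ABOVE_REF


def drop_similar_frames(
    frames: Sequence[T], is_similar: Callable[[T, T], bool]
) -> List[T]:
    """Greedily keep a frame only if it is not similar to any frame already kept."""
    kept: List[T] = []
    for frame in frames:
        if not any(is_similar(frame, other) for other in kept):
            kept.append(frame)
    return kept


# Toy example: floats stand in for structures, a tolerance stands in for matching.
toy_frames = [1.0, 1.001, 2.0, 2.0005, 3.0]
unique = drop_similar_frames(toy_frames, lambda a, b: abs(a - b) < 0.01)
```

With pymatgen installed, `is_similar` could be `StructureMatcher().fit`, which compares two `Structure` objects. The matcher tolerances used for MPtrj are not stated in this thread, so they would need to be chosen or looked up.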

CompRhys commented 7 months ago

Understood re the API calls, but I still believe it would be great to share a programmatic example capturing the screening process (steps 3/4), particularly so that people can extend it with additional rules or recreate something similar on things like OQMD. Code is the fundamental way we ensure our work is reproducible.
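As an illustration of what "extend it with additional rules" could mean in practice, here is a small sketch in which each screening rule is a named predicate. Everything in it is hypothetical: the `screen` helper, the dictionary-based frame representation, and in particular the `max_force` rule, which is not part of the MPtrj procedure and is only there to show how an extra rule would slot in.

```python
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, float]       # hypothetical flat record for one trajectory frame
Rule = Callable[[Frame], bool]  # a rule returns True when the frame passes


def screen(
    frames: List[Frame], rules: Dict[str, Rule]
) -> Tuple[List[Frame], Dict[str, int]]:
    """Apply rules in order; count each rejected frame under the first rule it fails."""
    kept: List[Frame] = []
    rejected = {name: 0 for name in rules}
    for frame in frames:
        for name, rule in rules.items():
            if not rule(frame):
                rejected[name] += 1
                break
        else:
            kept.append(frame)
    return kept, rejected


rules: Dict[str, Rule] = {
    # Energy window from the outline, with delta_e in eV/atom vs. the reference entry.
    "energy_window": lambda f: -0.010 <= f["delta_e"] <= 1.0,
    # Hypothetical extra rule added by a user; NOT part of the MPtrj screening.
    "max_force": lambda f: f["max_force"] < 50.0,
}

frames = [
    {"delta_e": 0.2, "max_force": 1.0},
    {"delta_e": -0.5, "max_force": 1.0},
    {"delta_e": 0.1, "max_force": 80.0},
]
kept, rejected = screen(frames, rules)
```

Keeping the rejection counts per rule makes it easier to see how much each rule prunes when the same pipeline is pointed at another source such as OQMD.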

BowenD-UCB commented 7 months ago

This is further addressed here:

CederGroupHub / chgnet