PhasesResearchLab / pySIPFENN

Python toolset for Structure-Informed Property and Feature Engineering with Neural Networks. It offers unique advantages through (1) effortless extensibility, (2) optimizations for ordered, dilute, and random atomic configurations, and (3) automated model tuning.
https://pysipfenn.org

[ENH] Neat and automated transfer learning with OPTIMADE API for auto-adjusted problem-specific ML model generation on the fly #16

Closed amkrajewski closed 7 months ago

amkrajewski commented 7 months ago

As the title says, this addition to the core pySIPFENN functionality connects it to the OPTIMADE API, enabling rapid adjustment of the models to any specific dataset described by an OPTIMADE query (or multiple queries). Most of the functions are neatly hidden behind a high-level API, and the default values should work well for datasets of 100 to 10,000 data points.

You can now simply:

```python
from pysipfenn import Calculator, OPTIMADEAdjuster
c = Calculator(autoLoad=False)
c.loadModels("SIPFENN_Krajewski2022_NN30")
ma = OPTIMADEAdjuster(c, "SIPFENN_Krajewski2022_NN30", useClearML=True, device='mps') # MPS is for Apple M1 GPU

ma.fetchAndFeturize(
    'elements HAS "Hf" AND elements HAS "Mo" AND NOT elements HAS ANY "O","C","F","Cl","S"',
    parallelWorkers=4)
ma.adjust()

ma.plotStarting() # See the starting performance
ma.plotAdjusted() # See the adjusted performance
```
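
For queries over many chemical systems, the OPTIMADE filter string passed to fetchAndFeturize can be assembled programmatically rather than typed by hand. The sketch below is plain Python and is not part of pySIPFENN; `build_elements_filter` is a hypothetical helper name, and it only covers the `HAS` / `NOT ... HAS ANY` pattern used above.

```python
def build_elements_filter(required, excluded=()):
    """Compose an OPTIMADE filter string from element symbols.

    Hypothetical helper for illustration; not a pySIPFENN function.
    """
    if not required:
        raise ValueError("At least one required element is needed.")
    # One `elements HAS "X"` clause per required element.
    clauses = [f'elements HAS "{el}"' for el in required]
    if excluded:
        # A single negated HAS ANY clause rejects every listed element.
        quoted = ",".join(f'"{el}"' for el in excluded)
        clauses.append(f"NOT elements HAS ANY {quoted}")
    return " AND ".join(clauses)


query = build_elements_filter(["Hf", "Mo"], excluded=["O", "C", "F", "Cl", "S"])
print(query)
```

The resulting string reproduces the Hf-Mo query shown above and could be passed as the first argument of fetchAndFeturize.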

To perform a hyperparameter search instead, replace the ma.adjust() call with:

```python
ma.matrixHyperParameterSearch()
ma.adjust(learningRate=0.0001, optimizer='AdamW', weightDecay=1e-05, epochs=37)
```
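
To make the idea of a "matrix" search concrete, here is a minimal sketch of an exhaustive grid search over the same kinds of parameters. It is not the pySIPFENN implementation: `grid_search`, the grid values, and `toy_validation_error` are all assumptions made for illustration, with the toy scoring function standing in for an actual fine-tuning run.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in `grid`; return (best_params, best_score).

    Lower scores are treated as better (e.g., a validation error).
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space; parameter names mirror the adjust() call above.
grid = {
    "learningRate": [1e-3, 1e-4, 1e-5],
    "optimizer": ["Adam", "AdamW"],
    "weightDecay": [1e-4, 1e-5],
}


def toy_validation_error(learningRate, optimizer, weightDecay):
    # Made-up stand-in for training and validating a model.
    penalty = 0.0 if optimizer == "AdamW" else 0.1
    return abs(learningRate - 1e-4) * 1e3 + abs(weightDecay - 1e-5) * 1e4 + penalty


best_params, best_score = grid_search(toy_validation_error, grid)
print(best_params, best_score)
```

The cost grows with the product of the list lengths (12 evaluations here), which is worth keeping in mind when each evaluation is a full fine-tuning run.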

All model usage works as before with the Calculator class. To modify the adjusted model or export it for later use, use the dedicated classes in the modelExporters submodule.

amkrajewski commented 7 months ago

Notes:

  1. It is feature-complete.
  2. I'm still working on the testing suite.
  3. @rdamaral will add a neat tutorial at a future date.

codecov[bot] commented 7 months ago

Codecov Report

Attention: Patch coverage is 89.20188%, with 46 lines in your changes missing coverage. Please review.

Project coverage is 93.58%. Comparing base (74a31dd) to head (4c12c21). Report is 4 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| pysipfenn/core/modelAdjusters.py | 87.15% | 46 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
- Coverage   94.84%   93.58%   -1.27%
==========================================
  Files          17       19       +2
  Lines        1999     2432     +433
==========================================
+ Hits         1896     2276     +380
- Misses        103      156      +53
```


amkrajewski commented 7 months ago

Hi @jwsiegel2510 and @rdamaral, everything is complete and the tests are passing. It's ready to be reviewed!

amkrajewski commented 7 months ago

Hi @jwsiegel2510 and @rdamaral, I was hoping to pull it later today to align with the manuscript posting on arXiv.

ricardonpa commented 7 months ago

Hi Adam,

I've reviewed the documentation, tested the main functions, and they are working well. I also did not encounter any issues when installing this branch version in a new conda environment (Python 3.10).

Just a couple of comments:

amkrajewski commented 7 months ago

Hi @rdamaral! Thanks for the insightful comments :)

  1. The default number of epochs for fine-tuning was reduced to 20, with documentation discussing this and mentioning that on a GPU (even a laptop one) 100 may be preferred.
  2. The OQMD server is down, and JARVIS seems to have issues filtering. I will ask about that at the developer meeting tomorrow.
  3. I've added a number of assertions that should catch unexpected user inputs and display useful messages about what went wrong. The property data paths are provider-specific and cannot be inferred a priori.
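
As a rough illustration of the last point, the sketch below follows a provider-specific key path through a nested OPTIMADE-style entry and fails with a descriptive message when the path does not match. It is an assumption about the general approach, not the code in modelAdjusters.py; `resolve_target` and the mock `entry` (including its energy value) are made up for the example.

```python
def resolve_target(entry, target_path):
    """Follow `target_path` through a nested dict and return the value as float.

    Hypothetical helper; assertions give readable messages on bad input.
    """
    assert isinstance(target_path, (list, tuple)) and target_path, (
        "targetPath must be a non-empty list of keys, "
        "e.g. ['attributes', '<provider_property>']."
    )
    node = entry
    for depth, key in enumerate(target_path):
        visited = list(target_path[:depth])
        assert isinstance(node, dict), (
            f"Expected a mapping at {visited}, got {type(node).__name__}."
        )
        assert key in node, (
            f"Key {key!r} not found at {visited}; available keys: {sorted(node)}."
        )
        node = node[key]
    assert isinstance(node, (int, float)) and not isinstance(node, bool), (
        f"Value at {list(target_path)} is not numeric: {node!r}."
    )
    return float(node)


# Mock entry with a made-up value, shaped like one OPTIMADE structure record.
entry = {
    "id": "example-1",
    "attributes": {"_alexandria_formation_energy_per_atom": -0.123},
}
value = resolve_target(entry, ["attributes", "_alexandria_formation_energy_per_atom"])
print(value)
```

Because every provider names its properties differently, listing the available keys in the error message is what lets a user correct the path without inspecting the raw response by hand.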

amkrajewski commented 7 months ago

I also added new functionality that allows you to override the provider and use a custom endpoint. For example:

```python
import pysipfenn

# `c` is the Calculator instance created in the first example
ma = pysipfenn.OPTIMADEAdjuster(
    c,
    model="SIPFENN_Krajewski2022_NN30",
    endpointOverride=["https://alexandria.icams.rub.de/pbesol"],
    targetPath=['attributes', '_alexandria_formation_energy_per_atom']
)

ma.fetchAndFeturize(
    'elements HAS "Hf" AND elements HAS "Mo" AND elements HAS "Zr"',
    parallelWorkers=2
)
```
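
To check what a custom endpoint returns before pointing the adjuster at it, one can build the corresponding OPTIMADE request URL and open it in a browser. The sketch below only constructs the URL, using the `/v1/structures` path and the `filter` and `page_limit` query parameters from the OPTIMADE specification; `structures_url` is a hypothetical helper, it assumes the endpoint serves the standard versioned path, and how pySIPFENN issues its requests internally may differ.

```python
from urllib.parse import quote, urlencode


def structures_url(base_url, optimade_filter, page_limit=10):
    """Build an OPTIMADE /v1/structures request URL (no request is sent)."""
    # quote_via=quote encodes spaces as %20 and double quotes as %22.
    query = urlencode(
        {"filter": optimade_filter, "page_limit": page_limit},
        quote_via=quote,
    )
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


url = structures_url(
    "https://alexandria.icams.rub.de/pbesol",
    'elements HAS "Hf" AND elements HAS "Mo" AND elements HAS "Zr"',
)
print(url)
```

Inspecting one returned entry this way also shows which provider-specific property names are available for targetPath.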

ricardonpa commented 7 months ago

Nice. The endpointOverride input is very interesting from the user’s perspective, especially in the event of changes or new additions to OPTIMADE. 👍