Open azadoks opened 2 years ago
Some thoughts:
Add support for psp8 (and previous versions?) -- package could need a re-naming
Could be a different package? Or do you want to make a generic pseudo parser? We could add our HGH code from DFTK as well?
On that note, I wonder whether it would make sense to actually make this package a generic pseudopotential datastructure implementation (similar to the generic NormConservingPseudo
interface we have in DFTK) that also does the parsing?
Unitful for this use case I'm not so sure. I would say this is pretty internal, so better just stick to atomic units everywhere
Could be a different package? Or do you want to make a generic pseudo parser? We could add our HGH code from DFTK as well? On that note, I wonder whether it would make sense to actually make this package a generic pseudopotential datastructure implementation (similar to the generic NormConservingPseudo interface we have in DFTK) that also does the parsing?
The question was more towards the second point. I agree that if we can iron out the details for a fairly generic datastructure implementation, it could be useful to have a library for supporting various pseudo formats (like libpspio but readable).
Given how things are done generally in the community, having a PseudopotentaialsBase.jl
and then UPF.jl
and PSP.jl
could also be reasonable (but then the number of packages would be a bit bigger with the XML PAW format that PseudoDojo/Abinit uses, PSML, and HGH that's already in DFTK).
Unitful for this use case I'm not so sure. I would say this is pretty internal, so better just stick to atomic units everywhere
Yeah, this seems reasonable. I should definitely document all the units that are used.
w.r.t. datastructures, I'm a bit stuck on how best to proceed. Various formats support different kinds of information that don't always completely line up; one could massage each separately to get a more homogenous representation while giving up a more direct mapping to the file contents. On the other hand, having a separate type for each fileformat would mean better correspondence with how things are laid out in the files, but we'd need to rely much more on a well-defined interface (which wouldn't be all bad).
Does anyone have thoughts on which option might be better / another proposal entirely?
On the other hand, having a separate type for each fileformat would mean better correspondence with how things are laid out in the files, but we'd need to rely much more on a well-defined interface (which wouldn't be all bad).
Certainly the way I would go. In DFTK we currently have a mixture mainly for pragmatic reasons that it takes less time than designing the interface properly. But then it's also hard to generalise from the one example that we currently have in the code. If we aim broader here I think there is chances of success.
My suggestion would be to not worry about this top down, but rather implement a few parsers bottom-up and then try to find the common features and turn them into a function-based and types-based interface.
Happy to discuss by the way also via zoom etc. I'm still travelling this week, but next week should be better.
Over in #7 I've implemented much more thorough parsers for UPF and PSP8, both of which produce struct
s for all the myriad fields in the files. I've tried to keep the structs mostly one-to-one with the file contents because it's much easier to find errors, add new sections, etc.
The parsers themselves are fairly thoroughly tested now, but I'm trying to write tests for the functional interface for common quantities, and I'm running into really poor performance and a fair bit of code duplication.
It seems a bit tricky to balance the two design goals I set:
The problem is that, for the two formats that I've parsed so far, the quantities in-file are either subtly different (projectors * r vs true projectors in UPF vs PSP8) or completely different (numeric vs. analytical forms in UPF vs HGH).
Subtle differences like in PSP8 vs UPF mean that a majority of the code must be duplicated and slightly tweaked for the two formats if we keep the struct-file correspondence from point 1.
Big differences like in HGH vs UPF/PSP8, combined with restrictions from point 1, mean that all pseudos should implement functions like local_potential_real(psp, r)
which are cheap to evaluate for HGH but expensive for UPF/PSP8 because an interpolator is reconstructed each time the function is called and shouldn't be stashed with the data.
One possible solution I've been thinking about is to split the data representations in two levels: FormatFile
(e.g. UpfFile
) structs which are one-to-one representations of the data in-file and FormalismPsP
(e.g. NNCPsP
-- numeric norm-conserving, ANCPsP
-- analytical norm-conserving). This has a few benefits:
*PsP
structsAnd a few drawbacks:
*File
-> *PsP
constructors)AbstractNCPsP
: abstract norm-conservingNNCPsP
: numeric norm-conservingANCPsP
: analytical norm-conservingUSPsP
: ultrasoft (only numeric)PAWPsP
: PAW (only numeric)SLPsP
: semilocal (only numeric)CoulombPsP
: coulomb (only numeric)Subtle differences like in PSP8 vs UPF mean that a majority of the code must be duplicated and slightly tweaked for the two formats if we keep the struct-file correspondence from point 1.
On that point: I think there is little you can do about code duplication in this case without making it super difficult to understand what is happening. Best do document which parts of the duplicated codes have similar versions in other files. That ensures that if you later refactor you don't forget to do it elsewhere
For the moment, I've basically translated Simon Pintarelli's
upf_to_json
Python package with testing to ensure numerical agreement.Some rough to-do's are:
FileIO
integration (we have magic bytes, why not)save
?