Closed alexsquires closed 1 year ago
Very speedy. I will look at the code in detail tomorrow/Monday. I am not sure familiar with ase (or pymatgen really) but is there a difference in the file types that can be read by them?
Not really. I suppose philosophically, Pymatgen was built from VASP outwards, whereas the mission of ASE has been to plug the gaps between all the different materials modelling software. I would expect they have a lot of overlap, but there may be some file types that ASE is better placed to handle.
What I'm referring to here is that the code I have implemented here is the analog of DiffusionAnalyzer.from_pymatgen()
rather than from_Xdatcar
or from_file
, which is perhaps what I initially implied I am after. Actually it was directly parsing the Atoms
objects that was the main point.
So additional from_file
/from_ase_trajectory
functionality would want to sit in a non-confusing way alongside the already implemented from_file
functionality
I think the sensible way is from_file
becomes a convenience method, which either tries to detect the file type or allows the user to specify it. And this then hands off to dedicated from_xdatcar
etc. methods. Which allows for clean extensibility in the future.
So the question I was getting to was, if Pymatgen can handle all the same file types as ASE is there a need for a from_file
that uses the ASE parser or would the Pymatgen one be suitable?
This discussion is making me realise that there is perhaps too much complexity in having from_file
as a class method. Maybe we should force users to convert their files to Pymatgen/ASE/MDAnalysis objects before reading them in.
I think you're right. I think you only need to pymatgen based one because as noted dealing with the pymatgen Structures is quicker, because everything you need (fractional coordinates) is precomputed.
So you only really need to deal with ASE when you have a list of ase Atoms, and don't want to convert to pmg Structures.
On Sun, 4 Jun 2023, 08:45 Andrew McCluskey, @.***> wrote:
So the question I was getting to was, if Pymatgen can handle all the same file types as ASE is there a need for a from_file that uses the ASE parser or would the Pymatgen one be suitable?
This discussion is making me realise that there is perhaps too much complexity in having from_file as a class method. Maybe we should force users to convert their files to Pymatgen/ASE/MDAnalysis objects before reading them in.
— Reply to this email directly, view it on GitHub https://github.com/bjmorgan/kinisi/pull/32#issuecomment-1575448856, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFMK74SG4TYT3STMJSK3UBDXJQ4I7ANCNFSM6AAAAAAYZH2BRA . You are receiving this because you authored the thread.Message ID: @.***>
Thoughts (as a user) on “sunsetting” the from_file
method entirely?
Cause my view is that of a developer looking to simplify their code base wherever possible.
Might be worth surveying @bjmorgan's PhD students to see what they use. My experience of using kinisi is mostly trying to fix things for others when they can't get it to work (which is exclusively down to non-familiarity with python, not kinisi)
Why not from_file()
as a convenience method that then just calls the respective from_fmt()
methods?
I thought about this some more and think we should just have explicit from_filetype
methods. It simplifies the code, which reduces possible bugs, and it makes user code explicit instead of implicit, which also reduces possible bugs.
Filetypes? That would mean writing a different method for every file type, massively increasing code complexity.
Filetypes? That would mean writing a different method for every file type, massively increasing code complexity.
How so? It’s the same code but you avoid wrapping everything in a set of if
statements. And shared code is pulled out into shared private methods. And it makes testing easier.
I must be misunderstanding your suggestion. Cause, to me, you are saying we should have from_vasp
, from_gromacs
, from_lammps
, from_quantumespresso
which means a lot more methods. The if
statement complexity currently present in kinisi
wouldn't go away with this solution (cause it is to account for the type
option).
@alexsquires the code looks good to me btw, if you get some unit tests behind it, we can merge.
I must be misunderstanding your suggestion. Cause, to me, you are saying we should have
from_vasp
,from_gromacs
,from_lammps
,from_quantumespresso
which means a lot more methods. Theif
statement complexity currently present inkinisi
wouldn't go away with this solution (cause it is to account for thetype
option).
My suggestion is methods called from_xdatcar
etc. and any logic specific to those file types is kept in those methods. Rather than an omnivorous from
method that has internal logic for all valid filetypes.
Right, that's similar to what I am suggesting (and how it is currently implemented), but I am saying we stop accepting files and instead only objects. So there would be from_ase
(which takes ase.atoms.Atoms
), from_pymatgen
(which takes pymatgen.core.structure.Structure
) and from_universe
(which takes MDAnalysis.core.universe.Universe
). Then it would be up to the user to get their files into this type of object.
I misunderstood then, because e.g. Xdatcar is both an object and a file format. And I think “should kinisi read files or not” is a separate design decision to the one I thought we were discussing.
@alexsquires the code looks good to me btw, if you get some unit tests behind it, we can merge.
Great. I have a couple of boring train journeys coming up. I can probably knock these up then. Thanks for reviewing.
A quick scan all looks good. Let's see what the tests say...
Oh, gotta add 'ase'
here (https://github.com/alexsquires/kinisi/blob/84d4d64f2b895a39aa7f77b6024e7554c0a9d115/pyproject.toml#L42)
Hmm this might be some dodgy CI that I should fix. I think the problem now is about privileges but I I’ll investigate properly soon.
Hey @alexsquires, can you try pulling from master. I think I have sorted the CI thing.
Just realising that my "fix" might not work cause it assumes you haven't named your branch master 😢. Sorry that was stupid by me.
Yeah, sorry about that. Can you try merging again?
Awesome, looks like everything is passing nicely. Final thing, can you add yourself to the CITATION.cff
as an author of the software and we will get it merged (commit with the [skip ci]
tag cause you shouldn't be touch any real code).
Woohoo! Thanks for the cracking contribution @alexsquires 👏 🎉 🚀
Here's my draft.
Just as an initial jumping off point. I ran some user-testing (unit testing to follow if you're generally happy with my contributions). While the ASE trajectory reading is slightly slower than pymatgen (the simple explanation for this is a need to calculate the fractional coordinates from the trajectory, similar to what you appear to have to do for MDAnalysis), the fact you don't have to convert the structures is a real game changer. For a 10000 frame trajectory, it took
The big "design" question is how you want to handle the reading the trajectory from a file? Generalise the
.from_file()
methods, or split them into discrete pymatgen vs. ase?(Also like to say this was so straightfoward to do, what a dreamy code base to work with)