ASE parsing - Githubissues

alexsquires commented 1 year ago

Here's my draft.

Just as an initial jumping off point. I ran some user-testing (unit testing to follow if you're generally happy with my contributions). While the ASE trajectory reading is slightly slower than pymatgen (the simple explanation for this is a need to calculate the fractional coordinates from the trajectory, similar to what you appear to have to do for MDAnalysis), the fact you don't have to convert the structures is a real game changer. For a 10000 frame trajectory, it took

2s to parse the ASE trajectory
< 1s to parse the pymatgen trajectory
but it took 22s to convert the pymatgen Structures to ase Atoms

The big "design" question is how you want to handle the reading the trajectory from a file? Generalise the .from_file() methods, or split them into discrete pymatgen vs. ase?

(Also like to say this was so straightfoward to do, what a dreamy code base to work with)

arm61 commented 1 year ago

Very speedy. I will look at the code in detail tomorrow/Monday. I am not sure familiar with ase (or pymatgen really) but is there a difference in the file types that can be read by them?

alexsquires commented 1 year ago

Not really. I suppose philosophically, Pymatgen was built from VASP outwards, whereas the mission of ASE has been to plug the gaps between all the different materials modelling software. I would expect they have a lot of overlap, but there may be some file types that ASE is better placed to handle.

What I'm referring to here is that the code I have implemented here is the analog of DiffusionAnalyzer.from_pymatgen() rather than from_Xdatcar or from_file, which is perhaps what I initially implied I am after. Actually it was directly parsing the Atoms objects that was the main point.

So additional from_file/from_ase_trajectory functionality would want to sit in a non-confusing way alongside the already implemented from_file functionality

bjmorgan commented 1 year ago

I think the sensible way is from_file becomes a convenience method, which either tries to detect the file type or allows the user to specify it. And this then hands off to dedicated from_xdatcar etc. methods. Which allows for clean extensibility in the future.

arm61 commented 1 year ago

So the question I was getting to was, if Pymatgen can handle all the same file types as ASE is there a need for a from_file that uses the ASE parser or would the Pymatgen one be suitable?

This discussion is making me realise that there is perhaps too much complexity in having from_file as a class method. Maybe we should force users to convert their files to Pymatgen/ASE/MDAnalysis objects before reading them in.

alexsquires commented 1 year ago

I think you're right. I think you only need to pymatgen based one because as noted dealing with the pymatgen Structures is quicker, because everything you need (fractional coordinates) is precomputed.

So you only really need to deal with ASE when you have a list of ase Atoms, and don't want to convert to pmg Structures.

On Sun, 4 Jun 2023, 08:45 Andrew McCluskey, @.***> wrote:

So the question I was getting to was, if Pymatgen can handle all the same file types as ASE is there a need for a from_file that uses the ASE parser or would the Pymatgen one be suitable?

This discussion is making me realise that there is perhaps too much complexity in having from_file as a class method. Maybe we should force users to convert their files to Pymatgen/ASE/MDAnalysis objects before reading them in.

— Reply to this email directly, view it on GitHub https://github.com/bjmorgan/kinisi/pull/32#issuecomment-1575448856, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFMK74SG4TYT3STMJSK3UBDXJQ4I7ANCNFSM6AAAAAAYZH2BRA . You are receiving this because you authored the thread.Message ID: @.***>

arm61 commented 1 year ago

Thoughts (as a user) on “sunsetting” the from_file method entirely?

arm61 commented 1 year ago

Cause my view is that of a developer looking to simplify their code base wherever possible.

alexsquires commented 1 year ago

Might be worth surveying @bjmorgan's PhD students to see what they use. My experience of using kinisi is mostly trying to fix things for others when they can't get it to work (which is exclusively down to non-familiarity with python, not kinisi)

bjmorgan commented 1 year ago

Why not from_file() as a convenience method that then just calls the respective from_fmt() methods?

bjmorgan commented 1 year ago

I thought about this some more and think we should just have explicit from_filetype methods. It simplifies the code, which reduces possible bugs, and it makes user code explicit instead of implicit, which also reduces possible bugs.

arm61 commented 1 year ago

Filetypes? That would mean writing a different method for every file type, massively increasing code complexity.

bjmorgan commented 1 year ago

Filetypes? That would mean writing a different method for every file type, massively increasing code complexity.

How so? It’s the same code but you avoid wrapping everything in a set of if statements. And shared code is pulled out into shared private methods. And it makes testing easier.

arm61 commented 1 year ago

I must be misunderstanding your suggestion. Cause, to me, you are saying we should have from_vasp, from_gromacs, from_lammps, from_quantumespresso which means a lot more methods. The if statement complexity currently present in kinisi wouldn't go away with this solution (cause it is to account for the type option).

arm61 commented 1 year ago

@alexsquires the code looks good to me btw, if you get some unit tests behind it, we can merge.

bjmorgan commented 1 year ago

I must be misunderstanding your suggestion. Cause, to me, you are saying we should have from_vasp, from_gromacs, from_lammps, from_quantumespresso which means a lot more methods. The if statement complexity currently present in kinisi wouldn't go away with this solution (cause it is to account for the type option).

My suggestion is methods called from_xdatcar etc. and any logic specific to those file types is kept in those methods. Rather than an omnivorous from method that has internal logic for all valid filetypes.

arm61 commented 1 year ago

Right, that's similar to what I am suggesting (and how it is currently implemented), but I am saying we stop accepting files and instead only objects. So there would be from_ase (which takes ase.atoms.Atoms), from_pymatgen (which takes pymatgen.core.structure.Structure) and from_universe (which takes MDAnalysis.core.universe.Universe). Then it would be up to the user to get their files into this type of object.

bjmorgan commented 1 year ago

I misunderstood then, because e.g. Xdatcar is both an object and a file format. And I think “should kinisi read files or not” is a separate design decision to the one I thought we were discussing.

alexsquires commented 1 year ago

@alexsquires the code looks good to me btw, if you get some unit tests behind it, we can merge.

Great. I have a couple of boring train journeys coming up. I can probably knock these up then. Thanks for reviewing.

arm61 commented 1 year ago

A quick scan all looks good. Let's see what the tests say...

arm61 commented 1 year ago

Oh, gotta add 'ase' here (https://github.com/alexsquires/kinisi/blob/84d4d64f2b895a39aa7f77b6024e7554c0a9d115/pyproject.toml#L42)

arm61 commented 1 year ago

Hmm this might be some dodgy CI that I should fix. I think the problem now is about privileges but I I’ll investigate properly soon.

arm61 commented 1 year ago

Hey @alexsquires, can you try pulling from master. I think I have sorted the CI thing.

arm61 commented 1 year ago

Just realising that my "fix" might not work cause it assumes you haven't named your branch master 😢. Sorry that was stupid by me.

arm61 commented 1 year ago

Yeah, sorry about that. Can you try merging again?

arm61 commented 1 year ago

Awesome, looks like everything is passing nicely. Final thing, can you add yourself to the CITATION.cff as an author of the software and we will get it merged (commit with the [skip ci] tag cause you shouldn't be touch any real code).

arm61 commented 1 year ago

Woohoo! Thanks for the cracking contribution @alexsquires 👏 🎉 🚀

bjmorgan / kinisi

ASE parsing #32