SENPAI-Molecular-Dynamics / SENPAI

Molecular dynamics simulation software
https://senpaimd.org
GNU General Public License v3.0

Formalize the input format for substrates/solvents #25

Closed: Chelsea486MHz closed 2 years ago

Chelsea486MHz commented 3 years ago

Specs for the MDS format are available in Documentation/mds-format.txt.

The input file format should be formalized.

guygastineau commented 3 years ago

Your documentation files about this already practically describe a grammar. From that work it won't be hard to "formalize" the grammar. I used scare quotes because the syntax for writing BNF grammars is itself not standardized in the wild lol.

One question I have: the last two lists have their lengths (in lines) determined by two values from a previous line. Should we formalize this relationship in the grammar, or should we leave that assurance as a post-parsing verification step?

My suggestion is to formalize the grammar of your files as a context-free grammar, which means such verification will happen after parsing. At least, I don't think there is a way to build that verification into the grammar in a context-free way.
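A minimal sketch of what "verification after parsing" could look like. The line layout used here is hypothetical and simplified (a header line holding the two counts, four-field atom lines, three-field bond lines); it is not the layout from Documentation/mds-format.txt, and the names `Parsed`, `parse`, and `verify_counts` are made up for illustration. The parse step only looks at the shape of each line, and the count comparison runs as a separate pass afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Parsed:
    declared_atoms: int
    declared_bonds: int
    atoms: list = field(default_factory=list)  # (symbol, x, y, z) tuples
    bonds: list = field(default_factory=list)  # (i, j, k) tuples


def parse(text):
    """Shape-only parse of the hypothetical layout; declared counts are not consulted."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or len(rows[0]) != 2:
        raise ValueError("expected a header line with two counts")
    parsed = Parsed(int(rows[0][0]), int(rows[0][1]))
    for fields in rows[1:]:
        if len(fields) == 4:
            symbol, x, y, z = fields
            parsed.atoms.append((symbol, float(x), float(y), float(z)))
        elif len(fields) == 3:
            i, j, k = fields
            parsed.bonds.append((int(i), int(j), float(k)))
        else:
            raise ValueError("unrecognized line: " + " ".join(fields))
    return parsed


def verify_counts(parsed):
    """Post-parsing check: compare the declared counts with the actual list lengths."""
    errors = []
    if parsed.declared_atoms != len(parsed.atoms):
        errors.append(
            f"declared {parsed.declared_atoms} atoms, found {len(parsed.atoms)}"
        )
    if parsed.declared_bonds != len(parsed.bonds):
        errors.append(
            f"declared {parsed.declared_bonds} bonds, found {len(parsed.bonds)}"
        )
    return errors
```

Keeping `verify_counts` separate means the check can be skipped entirely if the project decides that count mismatches are the user's problem.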

Anyway, I can write such a grammar pretty easily (I'll try to get myself moving tonight). I will also research whether anyone has made extensions to BNF that allow context-dependent grammars to be expressed easily, as I suppose formalizing the relationships for (atom count, atom list length) and (bond count, bond list length) in the grammar could be nice.

Chelsea486MHz commented 3 years ago

I think the post-parsing verification step is useless. In my opinion it's up to the user to correctly verify the input data, following the "garbage in, garbage out" mindset in computer simulations.

I'll be waiting for your suggestions! Feel free to edit the documentation files should you come up with modifications.

guygastineau commented 3 years ago

Rereading those docs, I think the atom list and bond list make a graph, where the atom list is the set of nodes/objects and the bond list is the set of edges/relationships between nodes, using 1-based indexing. Typically I would view verification of the node indices in the relationships as a separate concern from a formalization of the grammar, but I want to make sure that you aren't expecting the grammar to formalize such a verification. For the record, I don't mind writing a set-theoretic proof to use as an invariant to include with the grammar specification.
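The index check described here can be sketched independently of any grammar. This assumes the bonds are already available as (i, j) pairs of 1-based atom indices; the function name `dangling_bonds` is hypothetical and not part of SENPAI.

```python
def dangling_bonds(atom_count, bonds):
    """Return the bonds that reference an atom index outside 1..atom_count."""

    def in_range(index):
        # Atom indices are 1-based, so 0 and atom_count + 1 are both invalid.
        return 1 <= index <= atom_count

    return [(i, j) for (i, j) in bonds if not (in_range(i) and in_range(j))]
```

An empty result means every edge points at an existing node, which is the invariant a set-theoretic statement of the format would express.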

BTW, your reddit post is what got me here 🙂

guygastineau commented 3 years ago

Okay, well, if senpai cares neither that atom count == length of the atom list nor that bond count == length of the bond list, then they should be dropped from the format specification. I will post a BNF grammar formalization after dinner. Cheers.

guygastineau commented 3 years ago

Also, I will drop that constant from the bonds list, because, you know, it being constant and all... 🙃

Chelsea486MHz commented 3 years ago

Well, if you forget to set up bonds between atoms, or to define atoms altogether, it's not the program's fault aha. It simulates what you throw at it: if what you're giving SENPAI is garbage, SENPAI will give you garbage.

The constant in the bonds list must be kept there. It's a constant in the sense that it is the Hooke's law spring constant for the virtual spring modeling the covalent bond between the atoms, but it must be provided by the MDS file. Typically, the constant is determined from IR spectroscopy and adjusted after some simulations to better fit reality.
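For readers outside the domain, a minimal sketch of the spring model: a harmonic bond with energy 0.5 * k * (r - r0)^2 and restoring force -k * (r - r0). The equilibrium length `r0`, the function name `harmonic_bond`, and the unitless numbers are assumptions for illustration only; the thread only states that the constant k is read from the MDS file, and this is not taken from SENPAI's source.

```python
def harmonic_bond(k, r, r0):
    """Energy and signed force of a Hookean spring with constant k.

    r is the current bond length and r0 the (assumed) equilibrium length.
    A negative force pulls the atoms together, a positive one pushes them apart.
    """
    displacement = r - r0
    energy = 0.5 * k * displacement ** 2
    force = -k * displacement
    return energy, force


if __name__ == "__main__":
    # A bond stretched 0.5 past equilibrium with k = 2.0 (arbitrary units).
    print(harmonic_bond(2.0, 1.5, 1.0))
```

Because k differs from one kind of bond to another, each bond entry has to carry its own value.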

guygastineau commented 3 years ago

So it isn't a constant? Is it a constant wrt some given atom? I am just trying to understand. I didn't even know that was a thing.

EDIT: I am not a scientist, so I don't know the domain specific stuff here. I just like logic, programming, and PLT.

guygastineau commented 3 years ago

Oh, I see. The constant is the K in F = -Kx from Hooke's law.

Of course, I see it is necessary then. Sorry for the confusion.

Chelsea486MHz commented 3 years ago

Don't worry! The terminology is a bit confusing at times.