bjmorgan / kinisi

A Python package for estimating diffusion properties from molecular dynamics simulations.
https://kinisi.readthedocs.io
MIT License
45 stars 10 forks source link

Confusing inputs for the time between steps in the file #66

Open arm61 opened 4 days ago

arm61 commented 4 days ago

kinisi uses the same terminology as pymatgen to define the time between the steps in an input. That is a time_step (the simulation time step for the integrator) and the step_skip (the output frequency to the file).

This is confusing and a stumbling block when new to kinisi.

Therefore, I propose that the meaning of time_step is changed to the time step between trajectory steps in the file and that step_skip is removed entirely. I welcome any thoughts on this.

user200000 commented 4 days ago

Yeah, this can be a source of confusion, and the change is a good move. Still, I would introduce a new terminology perhapstime_gap or similar to prevent confusion with people working from notebooks given to them by others as examples without anyone being aware of the change!

jd15489 commented 4 days ago

Yeah, this can be a source of confusion, and the change is a good move. Still, I would introduce a new terminology perhapstime_gap or similar to prevent confusion with people working from notebooks given to them by others as examples without anyone being aware of the change!

I agree. I think it is best to only take one value from the user, that being the time between frames in the trajectory. This may be future proofing for introducing something like pint.

How will this play with subsampling frames?

arm61 commented 4 days ago

Still, I would introduce a new terminology perhaps time_gap or similar to prevent confusion

My preference would be to keep the name time_step but if the user provides a step_skip then an error will be raised. Hopefully, removing the confusion.

How will this play with subsampling frames?

I think this would not effect the operation of the subsampling frames.

user200000 commented 4 days ago

I think the error plan would help a lot, but I worry that time_step will have a particular MD-based meaning for the user base, and may cause confusion. I could see less experienced scientists getting tangled up with this and not realising it (some trajectory formats include a timestep number and they could assume this is read somehow). There are edge cases with aimd where people could end up publishing incorrect MSDs and diffusion coefficients because they're not so far off as to be preposterous and so the error doesn't get caught.

It wouldn't be Kinisi's fault in this circumstance but I think we owe it to the broader community to guard against such errors.

bjmorgan commented 4 days ago

time_step and step_skip are good names IMO. They both mean what they say. And they have the advantage of being the same as in pymatgen.

arm61 commented 4 days ago

And they have the advantage of being the same as in pymatgen.

Not everyone is using pymatgen, and hopefully eventually everyone is using kinisi before using pymatgen.

jd15489 commented 4 days ago

I'm not sure having the same names as another package is advantageous. It only helps a subset of the user base (those familiar with pymatgen), and if pymatgen changes the names in the future this will be moot. Further, I agree that time_step could be confused for the MD integration timestep. timestep is also used by some packages (pymatgen included) to refer to the current timestep, this could cause further confusion.

arm61 commented 4 days ago

I am coming round to time_step not being a good choice. I don’t really like time_gap, any other ideas?

user200000 commented 4 days ago

time_spacing temporal_spacing stride_length (inspired by i-pi)

osoulas commented 4 days ago

Potentially step_write_frequency? It's a little long winded, but i think it gets the point across.

jd15489 commented 4 days ago

dt frame_step frame_time frame_time_step frame_gap

arm61 commented 4 days ago

frame_step is my favourite so far. Followed by stride_length and stride_step.

osoulas commented 4 days ago

frame_interval

arm61 commented 3 days ago

I am not a fan of the *_interval options. This is because I often talk about $\Delta t$ as the "time interval", and in the past, some people have been confused and think that the ballistic regime is times before the simulation has equilibrated (this is not the case). So, I want to keep simulation time and MSD time descriptors separate.

osoulas commented 3 days ago

My issue with frame_step is that, to me, it reads as a number of steps (basically step_skip) rather than a timeframe. Maybe @jd15489's suggestion of frame_time would make more sense?

bjmorgan commented 3 days ago
  1. timestep meaning time per integration step is standard MD terminology.
  2. step skip (number of steps skipped per output frame) seems clear.

Maybe we are rediscovering why pymatgen splits these into two arguments?

bjmorgan commented 3 days ago

I would argue for keeping these separate, but potentially renaming step_skip. If you run a MD simulation, these are separate inputs with different units. As a user, I always think about my output frequency in units of timesteps, not in units of simulation time ( = timestep × output frequency)

arm61 commented 3 days ago

Based on the number of confused user questions about this problem (95 % of user problems have been around this), I think you might be alone in that @bjmorgan.

arm61 commented 3 days ago

From @bjmorgan:

I agree with the principle that we just multiply these numbers together, but, this means the user needs to remember to multiply together two numbers from their MD input (which they have already shown they don’t understand) and put that number into kinisi. So the current approach you just use the numbers straight from the MD setup instead of doing the multiplication yourself as an extra step.

I think this is a good argument.

jd15489 commented 3 days ago

I agree that this is a good argument.

What do MD packages call these things? LAMMPS calls these: timestep and N. DLPOLY calls them: timestep and timestep interval. VASP calls them: POTIM (or time step) and NBLOCK.