hl419 opened this issue 9 months ago
There is no way of reproducing long MD trajectories exactly, due to the chaotic nature of many-body dynamical systems.
Thanks for the reply, but the deviation only starts after about 10 ps, so I am not sure this should simply be expected. The truncation error discussed in #1656 makes more sense to me. Is there a way to improve this?
I would not expect consistency beyond about 1000 time steps.
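To see why a tiny numerical difference is enough to ruin long-trajectory reproducibility, here is a toy illustration (a logistic map, not an MD integrator): two trajectories that start 1e-15 apart lose all correlation within a few dozen steps.

```python
# Toy illustration of chaotic divergence (an analogy for MD, not MD code):
# iterate the logistic map from two starting points that differ by 1e-15.
def logistic_trajectory(x0, steps, r=4.0):
    traj = [x0]
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        traj.append(x)
    return traj

a = logistic_trajectory(0.3, 60)
b = logistic_trajectory(0.3 + 1e-15, 60)
for n in range(0, 61, 10):
    print(f"step {n:2d}   |a - b| = {abs(a[n] - b[n]):.3e}")
```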
Hi Wanghan,
Could you please provide further details on this? Feel free to correct me if I'm mistaken, but I would expect a potential model, once trained, to be deterministic at inference time, just like any other trained neural-network model. Do you consider this potential (specifically, a DeePMD-trained potential) to be stochastic? If so, could you explain how that arises?
Thanks,
Ariana
It would actually be interesting to see how non-deterministic DeePMD's use of TensorFlow is, and what that means for an MD trajectory.
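One way to probe the inference side in isolation is to evaluate the exact same frame twice with a frozen model and diff the outputs. This is only a sketch: it assumes the deepmd.infer.DeepPot Python interface, and the file name, atom types, and shapes below are placeholders.

```python
import numpy as np
from deepmd.infer import DeepPot  # Python inference API for a frozen model

# Placeholder inputs: a frozen model and a single 3-atom frame.
dp = DeepPot("graph.pb")
rng = np.random.default_rng(0)
coords = rng.random((1, 3 * 3))            # shape (nframes, natoms * 3)
cells = (10.0 * np.eye(3)).reshape(1, 9)   # shape (nframes, 9)
atom_types = [0, 1, 1]

e1, f1, v1 = dp.eval(coords, cells, atom_types)
e2, f2, v2 = dp.eval(coords, cells, atom_types)
print("max |dF| between two evaluations of the same frame:", np.abs(f1 - f2).max())
```

If this already shows a nonzero difference on the GPU, the non-determinism is in inference itself rather than in the MD integration.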
Our customized CUDA OP also uses the non-deterministic atomicAdd. A deterministic implementation may need extra effort, which might not be worth doing.
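For context, the non-determinism from atomicAdd comes from floating-point addition not being associative: when concurrent threads accumulate per-atom contributions in whatever order they happen to land, the rounded sum can change from run to run. A minimal numpy illustration of the same effect (not DeePMD code):

```python
import numpy as np

# Summing the same float32 contributions in two different orders, as
# concurrent atomicAdds effectively do, gives slightly different results
# because floating-point addition is not associative.
rng = np.random.default_rng(0)
contrib = rng.standard_normal(100_000).astype(np.float32)

s_fwd = np.float32(0.0)
for x in contrib:
    s_fwd += x
s_rev = np.float32(0.0)
for x in contrib[::-1]:
    s_rev += x

print(s_fwd, s_rev, abs(float(s_fwd) - float(s_rev)))
```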
Yep -- already found this, and we have been working on re-coding it.
What I am trying to figure out is: 1. whether we can add the tf.config.experimental.enable_op_determinism flag (https://www.tensorflow.org/api_docs/python/tf/config/experimental/enable_op_determinism) anywhere, or whether we need the TF1 alternative, which is a patch plus an exported environment variable (I am currently confused about what to expect with tensorflow.compat.v1), and where it would go in the code; 2. whether there are any ops still in the code that have no deterministic version.
Lots of good info here: https://github.com/NVIDIA/framework-reproducibility/blob/master/doc/d9m/tensorflow.md
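For concreteness, here is roughly what I have in mind for the top of the training entry point. This is only a sketch: where it should actually live inside DeePMD-kit is exactly my question, and the TF_DETERMINISTIC_OPS variable is the older route for TF builds that pre-date the TF2 API call.

```python
import os
import random

# Request deterministic kernels before TensorFlow builds any graph; the
# environment variable covers TF versions without the TF2 API call below.
os.environ["TF_DETERMINISTIC_OPS"] = "1"

import numpy as np
import tensorflow as tf

if hasattr(tf.config.experimental, "enable_op_determinism"):
    tf.config.experimental.enable_op_determinism()  # recent TF 2.x only

# Pin every RNG the training might touch.
SEED = 1234
random.seed(SEED)
np.random.seed(SEED)
tf.compat.v1.set_random_seed(SEED)
```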
I would like to measure how much the custom CUDA kernel contributes and how much any TF ops contribute. Is there a way to use the GPU for TF but disable the CUDA prod_force.cu kernel? From my tests, DP_VARIANT seems to be all or nothing.
The other thing I am hung up on is trying to print the model weights from the frozen model. I can't seem to get any of the tf1-compat methods to do it, I guess because the graph is built with the TF1 API on top of the TF2 package. I would love to compare the model weights over multiple "identical" runs of training.
By the way, there seems to be another source of "non-determinism" in the deepmd code that may actually be a bug. I ran the se_e2_a water example a bunch of times and there is non-determinism in the learning rate decay schedule. Using the default input file, the learning rate sometimes goes from 1.0e-03 to 3.6e-04 at step 5000, and sometimes from 1.0e-03 to 9.0e-04! This doesn't seem to affect the loss values on the water test, but on a large system with lots of training data it makes a huge difference in the loss values and in the model that is trained.
Digging in, I see the learning rate is scheduled with a tf1 module. I wouldn't think this runs in parallel and picks up the non-determinism that the other ops get from CUDA atomics. Maybe it's an uninitialized variable, or some sort of rounding instability? But it causes dramatic differences in the reproducibility of training on identical data with identical settings/hyperparameters and software stack.
> Is there a way to use the GPU for TF but disable the CUDA prod_force.cu kernel?
DP_VARIANT=cpu will disable all customized CUDA ops. If you want to disable a single OP, you can comment out the following lines:
https://github.com/deepmodeling/deepmd-kit/blob/91049df4d4cfbdf5074a4915e4409c01cae2333c/source/op/prod_force_grad_multi_device.cc#L275-L276
> The other thing I am hung up on is trying to print the model weights from the frozen model.
This is a bit complex, but the devel branch has implemented this feature (for se_e2_a only) as part of the multiple-backend support. See #3323.
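In the meantime, a generic TF1-compat way to compare weights from two frozen .pb files is to walk the GraphDef and convert every Const node back to a numpy array. A sketch (paths are placeholders; whichever Const node names the frozen DeePMD graph contains will be printed):

```python
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import tensor_util

def load_const_tensors(pb_path):
    """Return {node_name: ndarray} for every Const node in a frozen graph."""
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return {
        node.name: tensor_util.MakeNdarray(node.attr["value"].tensor)
        for node in graph_def.node
        if node.op == "Const"
    }

# Compare two "identical" training runs weight by weight.
w1 = load_const_tensors("run1/frozen_model.pb")
w2 = load_const_tensors("run2/frozen_model.pb")
for name in sorted(set(w1) & set(w2)):
    a, b = w1[name], w2[name]
    if a.dtype.kind == "f" and a.shape == b.shape and a.size:
        diff = float(np.abs(a - b).max())
        if diff > 0.0:
            print(f"{name}: max |dw| = {diff:.3e}")
```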
> I ran the se_e2_a water example a bunch of times and there is non-determinism in the learning rate decay schedule.
Do you change the number of training steps? The learning rate schedule depends on it: https://github.com/deepmodeling/deepmd-kit/blob/91049df4d4cfbdf5074a4915e4409c01cae2333c/deepmd/tf/utils/learning_rate.py#L89-L91
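For reference, here is a sketch of an exponential decay schedule of the kind used in learning_rate.py (the exact formula, and the parameter values loosely taken from the water example, may differ): the decay rate is derived from start_lr, stop_lr, and the total number of training steps, so the learning rate reached at a fixed step such as 5000 changes whenever the total step count changes.

```python
import numpy as np

def lr_at(step, start_lr, stop_lr, decay_steps, stop_steps):
    # Decay rate chosen so the schedule reaches stop_lr at stop_steps,
    # applied staircase-style every decay_steps steps.
    decay_rate = np.exp(np.log(stop_lr / start_lr) / (stop_steps / decay_steps))
    return start_lr * decay_rate ** (step // decay_steps)

for stop_steps in (50_000, 200_000, 1_000_000):
    print(f"total steps {stop_steps:>9}: lr at step 5000 = "
          f"{lr_at(5000, 1.0e-3, 3.51e-8, 5000, stop_steps):.2e}")
```

So two runs that differ only in the total number of training steps will show quite different learning rates at step 5000 even with a fully deterministic scheduler.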
Regarding the learning rate: the number of steps was the same; nothing was different except that I ran the training again, as far as I can tell. I will run some more tests to verify.
What I am trying to do is let TF run on the GPU but disable all of the local deepmd CUDA kernels (the non-TF ones). I guess I can comment them all out and then build with GPU support so that TF still runs on the device.
Will check out the model-inspection options, thanks.
So I've done some reproducibility testing just on model training and inference. For each of two different DFT datasets, I ran the exact same training twice (same data, same hyperparameters) to get two "identical" models, and repeated this for several different numbers of training steps. Then I ran dp test on 1000 frames for each model.
I have some baffling results. Looking at the maximum absolute difference in predicted force components (x, y, z) for one system (120 atoms, 110,000 training frames), the variation between "identical" training runs is pretty large: for some atoms the difference in a predicted force component can be as high as 1 eV/Å. It also grows with the number of training steps: around 0.2 eV/Å for 100K training steps, 0.4 eV/Å for 200K, and over 1 eV/Å for 1M. These numbers were confirmed on a different machine running the deepmd-kit container.
For the other system, 623 atoms and ~60K training frames, the maximum absolute difference is much lower: about 1.3e-11 eV/Å for 20K steps and about 1e-10 eV/Å for 100K steps (this system takes longer to train, so I am still collecting data for longer runs). But that is a huge difference in non-deterministic variation between the two systems.
The other troubling thing is that for both systems, changing only the random seed leads to a maximum absolute difference in predicted force components of around 0.4 eV/Å.
I am sort of wondering if there is some bug in the code or the test module, because none of this makes any sense, especially the massive max differences for the one smaller system.
It would be good to run more tests on other datasets. I found a few things online.
Tests were all run with a pip install of DeePMD-kit on an x86 AMD EPYC CPU + NVIDIA A100 GPU with Ubuntu OS.
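For reference, this is roughly how the comparison above can be scripted from the dp test detail files. It is a sketch: it assumes dp test was run with -d <prefix> and that <prefix>.f.out lists the reference force components followed by the predicted ones, which is worth checking against the header of the file your version writes.

```python
import numpy as np

def predicted_forces(detail_prefix):
    # dp test -d <prefix> writes <prefix>.f.out; assumed column layout:
    # 3 reference force components followed by 3 predicted force components.
    data = np.loadtxt(f"{detail_prefix}.f.out")
    return data[:, 3:6]

f_a = predicted_forces("model_a/detail")
f_b = predicted_forces("model_b/detail")

diff = np.abs(f_a - f_b)
print("max  |dF| component (eV/Å):", diff.max())
print("mean |dF| component (eV/Å):", diff.mean())
```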
Do you get the same behavior on the CPU, or does this only happen on the GPU?
Printing to lcurve.out step by step may help find where the difference appears.
Please note that, according to the TF documentation, tf.Session will introduce non-determinism. I am not sure exactly where the non-determinism comes from, but it seems deterministic results are not to be expected.
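If disp_freq in the training section is set to 1 so that lcurve.out is written every step, the first diverging step between two runs can be located with a few lines (a sketch; check the column layout against the lcurve.out header of your runs):

```python
import numpy as np

# Locate the first logged step at which two lcurve.out files differ.
a = np.loadtxt("run1/lcurve.out")
b = np.loadtxt("run2/lcurve.out")
n = min(len(a), len(b))
mismatch = np.where(np.any(a[:n] != b[:n], axis=1))[0]
print("first differing step:",
      int(a[mismatch[0], 0]) if mismatch.size else "none within logged range")
```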
Yes, there should be some nondeterminism with TF. But I didn't expect it to affect the forces THAT much. That's a lot. And it seems strange that it would affect one system so much and not the other.
I will run some tests with CPU-only training and also with the CUDA kernels turned off.
Good idea about printing lcurve in smaller increments, will also try this.
I'm also wondering what it would take to turn on TF determinism in DeePMD. Some detailed notes on doing this can be found here: https://github.com/NVIDIA/framework-reproducibility/blob/master/doc/d9m/tensorflow.md
We are working with Duncan at NVIDIA, so we can ask questions. I am just not sure what to do with the tf1-compat API on top of the TF2 package; it seems to fall through the cracks. If I were to add tf_determinism.enable_determinism() to the DeePMD code, where should it go? The same goes for tf.keras.utils.set_random_seed(SEED). I can try these if you tell me where to put the calls.
For the random seed: we don't use any global random seed. Instead, the seed is passed in from the input file.
For determinism with tf.compat.v1: I don't know, as I have never used it. The most helpful reference is probably https://github.com/tensorflow/community/pull/346
Yes, that is what I am talking about. Where in the code would be the top-most entrypoint to add this command so it propagates down to all the TF calls? Or maybe it needs to go in multiple places?
It is possible to obtain the same model parameters with deepmd, provided the conditions below are met. The interested reader can try out the open PR combined with these three environment variables added to their scripts:
export TF_DETERMINISTIC_OPS=1
export TF_INTER_OP_PARALLELISM_THREADS=0
export TF_INTRA_OP_PARALLELISM_THREADS=0
We successfully ran the training and inference tests several times and got identical results every time.
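A sketch of one way to verify this at the parameter level, assuming standard TF checkpoints (the run directories and the model.ckpt prefix are placeholders):

```python
import numpy as np
import tensorflow as tf

# Compare every floating-point variable between the checkpoints written by
# two "identical" training runs.
ckpt1, ckpt2 = "run1/model.ckpt", "run2/model.ckpt"
r1 = tf.train.load_checkpoint(ckpt1)
r2 = tf.train.load_checkpoint(ckpt2)

worst = 0.0
for name, _shape in tf.train.list_variables(ckpt1):
    t1, t2 = np.asarray(r1.get_tensor(name)), np.asarray(r2.get_tensor(name))
    if np.issubdtype(t1.dtype, np.floating) and t1.shape == t2.shape and t1.size:
        worst = max(worst, float(np.abs(t1 - t2).max()))
print("max |dw| over all variables:", worst)
```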
Summary
Hello,
I'm currently attempting to replicate an NVT simulation. I've set the seed for the initial velocities and confirmed that they are identical between runs, and I am using the same machine and the same version to run the simulation (single processor). However, after a certain time-step the positions and velocities start to deviate. I checked the previous issues #1656 and #2270, and it seems this comes from truncation error. Is there a way to improve the precision to avoid this? I would like a deterministic simulation that reproduces exactly the same result from the same inputs. Thanks!
DeePMD-kit Version: v2.2.1
TensorFlow Version: -
Python Version, CUDA Version, GCC Version, LAMMPS Version, etc.: No response