deepmodeling / deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics
https://docs.deepmodeling.com/projects/deepmd/
GNU Lesser General Public License v3.0

Failure to compress a model #790

Closed: njzjz closed this issue 3 years ago

njzjz commented 3 years ago

Discussed in https://github.com/deepmodeling/deepmd-kit/discussions/787

Originally posted by **XTFan07** June 22, 2021

I use

module load deepmd/compress
dp compress input.json -i 1.3-model.pb -o compressed-model.pb

to compress a model, and it goes wrong. How can I address this problem?

Traceback (most recent call last):
  File "/data/share/apps/deepmd/compress/bin/dp", line 8, in <module>
    sys.exit(main())
  File "/data/share/apps/deepmd/compress/lib/python3.8/site-packages/deepmd/main.py", line 327, in main
    compress(**dict_args)
  File "/data/share/apps/deepmd/compress/lib/python3.8/site-packages/deepmd/entrypoints/compress.py", line 108, in compress
    freeze(checkpoint_folder=checkpoint_folder, output=output, node_names=None)
  File "/data/share/apps/deepmd/compress/lib/python3.8/site-packages/deepmd/entrypoints/freeze.py", line 147, in freeze
    input_graph_def = graph.as_graph_def()
  File "/data/share/apps/deepmd/compress/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3294, in as_graph_def
    result, _ = self._as_graph_def(from_version, add_shapes)
  File "/data/share/apps/deepmd/compress/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3211, in _as_graph_def
    graph.ParseFromString(compat.as_bytes(data))
google.protobuf.message.DecodeError: Error parsing message
njzjz commented 3 years ago

Please report the version you are using

Originally posted by @njzjz in https://github.com/deepmodeling/deepmd-kit/discussions/787#discussioncomment-907193

njzjz commented 3 years ago

Thanks for your reply. The version I use is a test branch with the compression function; the same error also occurs with v2.0.0.beta1.

Originally posted by @XTFan07 in https://github.com/deepmodeling/deepmd-kit/discussions/787#discussioncomment-907799

felix5572 commented 3 years ago

I have come across the same problem. In my case the compressed model is too large. Maybe try a larger step.

njzjz commented 3 years ago

> I have come across the same problem. In my case the compressed model is too large. Maybe try a larger step.

This hint should be shown to users. Can you give me an example that triggers this error?

felix5572 commented 3 years ago

@njzjz OK, I will.

felix5572 commented 3 years ago

@njzjz Relating to #767: https://deepmd-kit.oss-cn-beijing.aliyuncs.com/water_wanghan_models.tar.gz. When the step is 0.001, the model is about 500 MB; when the step is 0.0005 or smaller, it raises the error.

njzjz commented 3 years ago

@felix5572 I failed to download the data... Could you send it to me in private?

njzjz commented 3 years ago

Also, may I ask what's your TF version?

felix5572 commented 3 years ago

The TF version when compressing the model? I used the TF bundled with the 2.0.0.beta2 release.

njzjz commented 3 years ago

> The TF version when compressing the model? I used the TF bundled with the 2.0.0.beta2 release.

@felix5572 I mean the TensorFlow version; 2.0.0.beta2 should be the DeePMD-kit version.

I cannot reproduce the error using

dp compress input.water.wanghan20210609.json -s 0.0005 -i graph_water_convert-2.0.pb -o frozen_model_compressed2.pb

The output file is

-rw-rw-r--. 1 jz748 jz748 966M Jul  9 23:26 frozen_model_compressed2.pb
felix5572 commented 3 years ago

Maybe a much smaller step, 0.0002?

njzjz commented 3 years ago

Reproduced using -s 0.0002.

njzjz commented 3 years ago

Per https://stackoverflow.com/a/34186672/9567349, protobuf has a size limit of 2GB... Maybe that's the reason?
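
That would fit the numbers: the tabulated data grows roughly in proportion to 1/stride, so the 966 MB graph at -s 0.0005 extrapolates to about 2.4 GB at -s 0.0002, past the limit. A minimal sketch (hypothetical file name, not part of deepmd-kit) to check a frozen graph on disk against that hard cap:

```python
import os

# Serialized protobuf messages are capped at INT32_MAX bytes (about 2 GiB).
PROTOBUF_LIMIT = 2**31 - 1

# "frozen_model_compressed.pb" is a hypothetical file name; substitute your own graph.
size = os.path.getsize("frozen_model_compressed.pb")
print(f"serialized graph: {size} bytes ({size / 2**30:.2f} GiB)")
if size >= PROTOBUF_LIMIT:
    print("over the protobuf message limit; parsing will raise DecodeError")
```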

felix5572 commented 3 years ago

Maybe it reaches the 2 GB limit of Google protobuf during serialization.

njzjz commented 3 years ago

I modified TF to print the size of the data:

# printed from Graph._as_graph_def in tensorflow/python/framework/ops.py,
# just before graph.ParseFromString (see the traceback above)
_data = compat.as_bytes(data)
print("the size of data is:", len(_data))

The output is

the size of data is: 2515641031

That is 2.34 GiB, which exceeds protobuf's hard limit of 2,147,483,647 bytes (2 GiB).
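
A minimal sketch, following felix5572's suggestion above, of how the opaque DecodeError could be translated into a user-facing hint (a hypothetical helper, not the actual deepmd-kit code):

```python
from google.protobuf.message import DecodeError


def graph_def_or_hint(graph):
    """Return graph.as_graph_def(), replacing the opaque DecodeError with a hint."""
    try:
        return graph.as_graph_def()
    except DecodeError as err:
        # The usual cause in dp compress: the tabulated model outgrew
        # protobuf's 2 GiB message limit.
        raise RuntimeError(
            "the graph exceeds protobuf's 2 GB message limit; "
            "try a larger compression step (-s)"
        ) from err
```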

SchrodingersCattt commented 1 year ago

May I ask if there is any solution to bypass the size limit of protobuf? I don't want to sacrifice my model's accuracy. Or, other than using a larger -s, what should I do to shrink my model: use a simpler network, or fewer training sets? I am trying -s 0.05 but still get this error. Thank you!

njzjz commented 1 year ago

> May I ask if there is any solution to bypass the size limit of protobuf? I don't want to sacrifice my model's accuracy.
>
> Or, other than using a larger -s, what should I do to shrink my model: use a simpler network, or fewer training sets? I am trying -s 0.05 but still get this error.
>
> Thank you!

Protobuf still has this limitation in 2023.

I guess your model has many elements. You can try the type embedding, whose compression has just been supported in the latest version, or use float32 instead of float64 for the neural networks.
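
For reference, a minimal sketch of the relevant input.json fragments (all unrelated keys omitted; the exact keys may differ between versions, so check the documentation for yours):

```json
{
  "model": {
    "type_embedding": {
      "neuron": [8],
      "resnet_dt": false,
      "seed": 1
    },
    "descriptor": {
      "type": "se_e2_a",
      "precision": "float32"
    },
    "fitting_net": {
      "neuron": [240, 240, 240],
      "precision": "float32"
    }
  }
}
```

Type embedding replaces the per-type-pair embedding networks with a single shared one, and float32 should roughly halve the size of every tabulated entry, so both changes shrink the serialized graph.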

SchrodingersCattt commented 1 year ago

> May I ask if there is any solution to bypass the size limit of protobuf? I don't want to sacrifice my model's accuracy. Or, other than using a larger -s, what should I do to shrink my model: use a simpler network, or fewer training sets? I am trying -s 0.05 but still get this error. Thank you!
>
> Protobuf still has this limitation in 2023.
>
> I guess your model has many elements. You can try the type embedding, whose compression has just been supported in the latest version, or use float32 instead of float64 for the neural networks.

Thank you very much! I will give it a try.