deepmodeling / AIS-Square

GNU Lesser General Public License v3.0

Dataset Mismatch From Actual Description For H2O-PBE0TS #119

Closed AnuragKr closed 1 year ago

AnuragKr commented 1 year ago

Hello,

Dataset Link -- https://www.aissquare.com/datasets/detail?pageType=datasets&name=H2O-PBE0TS

Problem Statement -- The dataset sizes stated on the website and in the paper differ from what the dataset actually contains.

| System | Original Dataset Size Mentioned | Actual Dataset Size Present |
| --- | --- | --- |
| lw_pimd | 105000 total snapshots | 100000 |
| ice (b) | 24000 total snapshots | 20000 |
| ice (c) | 12000 total snapshots | 10000 |
| ice (d) | 12000 total snapshots | 10000 |

Please check and let me know whether the missing data can be added.

AnuragKr commented 1 year ago

When can I expect this issue to be resolved?

Mile-Away commented 1 year ago

Hello,

I’m glad I could help answer your question.

This entry was uploaded by a user. After confirming with the uploader, the dataset description for this entry should read:

The training datasets include 100000 snapshots (from 105000 total snapshots) randomly selected along the liquid water trajectory, 20000 snapshots (from 24000 total snapshots) randomly selected along the ice (b) trajectory, 10000 snapshots (from 12000 total snapshots) randomly selected along the ice (c) trajectory, and 10000 snapshots (from 12000 total snapshots) randomly selected along the ice (d) trajectory.

This matches the dataset you actually downloaded, that is:

| System | Actual Dataset Size Present |
| --- | --- |
| lw_pimd | 100000 |
| ice (b) | 20000 |
| ice (c) | 10000 |
| ice (d) | 10000 |

For more information, you can read the introduction of this entry.

Note that the number of entries in the introduction is 5% different from here because set.000 was ignored and multiplied by 19 in error, so please refer to this explanation for the correct value.

AnuragKr commented 1 year ago

@Q-Query Thanks for the clarification.

You mentioned an erroneous multiplication by 19. Does that mean the predicted forces and energies in the data have been multiplied by 19? If so, the RMSE could differ from the result in the original paper, and the values would need to be divided by 19 to get the correct ones.

> Note that the number of entries in the introduction is 5% different from here because set.000 was ignored and multiplied by 19 in error, so please refer to this explanation for the correct value.

Sorry if I misunderstood your point. Could you please explain it again?

Mile-Away commented 1 year ago

I'm sorry for the confusion.

My point was simply that the introduction of this entry states:

The training datasets include 95000 snapshots (from 105000 total snapshots) randomly selected along the liquid water trajectory, 19500 snapshots (from 24000 total snapshots) randomly selected along the ice (b) trajectory, 9500 snapshots (from 12000 total snapshots) randomly selected along the ice (c) trajectory, and 9500 snapshots (from 12000 total snapshots) randomly selected along the ice (d) trajectory.

whereas it should state:

The training datasets include 100000 snapshots (from 105000 total snapshots) randomly selected along the liquid water trajectory, 20000 snapshots (from 24000 total snapshots) randomly selected along the ice (b) trajectory, 10000 snapshots (from 12000 total snapshots) randomly selected along the ice (c) trajectory, and 10000 snapshots (from 12000 total snapshots) randomly selected along the ice (d) trajectory.
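To make the counting error concrete, here is a minimal sketch of how dropping set.000 changes a snapshot total. It only affects the snapshot count in the description, not the stored energies or forces. The sketch assumes each system is split into 20 equally sized sets (set.000 to set.019); that layout and the per-set sizes are inferred from the totals quoted in this thread for liquid water and ice (c)/(d), not read from the files, and the function names are hypothetical.

```python
def total_snapshots(frames_per_set):
    """Sum the frame counts of every set, including set.000."""
    return sum(frames_per_set.values())


def total_ignoring_first_set(frames_per_set):
    """Reproduce the miscount: skip set.000 and sum the remaining sets."""
    return sum(n for name, n in frames_per_set.items() if name != "set.000")


# Assumed frames per set (hypothetical: actual total divided by 20 sets).
assumed_frames_per_set = {
    "lw_pimd": 5000,
    "ice (c)": 500,
    "ice (d)": 500,
}

for system, per_set in assumed_frames_per_set.items():
    sets = {f"set.{i:03d}": per_set for i in range(20)}
    correct = total_snapshots(sets)
    miscounted = total_ignoring_first_set(sets)
    shortfall = 1 - miscounted / correct
    print(f"{system}: correct={correct}, miscounted={miscounted}, "
          f"shortfall={shortfall:.0%}")
```

Under these assumptions the miscounted totals are 19/20 of the correct ones, which is where the 5% difference comes from.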