lee212 opened this issue 3 years ago
Do the parameters `num_extrinsic_outliers`, `n_most_recent_h5_files`, and `k_random_old_h5_files` need to match `num_tasks` from the simulations? I ask because I see `num_tasks: 16` whereas the other three are set to 12.
Yes, `num_extrinsic_outliers` should be greater than or equal to `num_tasks`. First, `num_intrinsic_outliers` outliers are selected using the AI-based approach; from those, `num_extrinsic_outliers` are taken as the "best" outliers, which are used to restart new simulations.
For this particular bug, try setting these parameters:

    num_extrinsic_outliers: 16
    n_most_recent_h5_files: 16
    k_random_old_h5_files: 16
Hopefully this explanation helps clarify the other parameters:
Each MD simulation runs for `simulation_length_ns` ns and reports a frame every `report_interval_ps` ps, so its output is `simulation_length_ns` × 1000 / `report_interval_ps` frames (1000 ps per ns). For example, with a 1 ns simulation and a 1 ps report interval, 1 × 1000 / 1 = 1000 output frames per MD sim. Those are the frames that get written to HDF5.

`n_traj_frames` in the Agent config should be set to 1000 in this case, since it is the number of frames in each h5 file.
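As a sanity check, the frame count per file can be computed directly (a minimal sketch; `frames_per_sim` is an illustrative helper, not part of DeepDriveMD):

```python
def frames_per_sim(simulation_length_ns: float, report_interval_ps: float) -> int:
    """Frames written to each HDF5 file: total ps divided by the report interval."""
    return int(simulation_length_ns * 1000 / report_interval_ps)

assert frames_per_sim(1.0, 1.0) == 1000  # the value to use for n_traj_frames here
```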
`n_most_recent_h5_files` should most likely be set to the number of MD sims, `num_tasks`, since it is responsible for gathering the most recent simulation data for inference and outlier detection. If `n_most_recent_h5_files` is set less than `num_tasks`, the agent will be suboptimal, since it may miss outlying states. If it is set greater than the number of MD sims, it may raise an index error.
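To make the failure mode concrete, here is a minimal sketch of the assumed indexing logic (not the actual DeepDriveMD code), using the mismatched values from this issue:

```python
num_tasks = 16               # HDF5 files produced per iteration
n_most_recent_h5_files = 12  # files the agent actually gathers

gathered = [f"sim_{i}.h5" for i in range(n_most_recent_h5_files)]  # 12 items
for i in range(num_tasks):        # the agent seeks indices 0..15
    restart_point = gathered[i]   # IndexError once i reaches 12
```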
`num_intrinsic_outliers` should be set greater than or equal to `num_tasks`. It sets the number of outliers selected in a purely unsupervised way. This should be at least the number of MD sims, since these outliers are used to create the restart points for the next round of simulations.
`num_extrinsic_outliers` comes into play when `extrinsic_score` is active. It further prunes the intrinsic outliers using a biophysical scoring function. If `extrinsic_score` is `nan`, it will simply take the first `num_extrinsic_outliers` elements of the intrinsic outlier array. It should also be greater than or equal to `num_tasks`.
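Putting the two outlier parameters together, the selection flow looks roughly like this (a hedged sketch with illustrative names, not the actual DeepDriveMD API):

```python
def select_restart_points(intrinsic_outliers, score_fn, num_extrinsic_outliers):
    """Prune the AI-selected intrinsic outliers down to the restart set."""
    if score_fn is None:
        # extrinsic_score inactive (nan in the config): take the first
        # num_extrinsic_outliers elements of the intrinsic outlier array.
        return intrinsic_outliers[:num_extrinsic_outliers]
    # Otherwise re-rank by the biophysical score and keep the best.
    return sorted(intrinsic_outliers, key=score_fn)[:num_extrinsic_outliers]
```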
`k_random_old_h5_files` sets the number of old data files to look at, which helps outlier detection and maintains the previously sampled data distribution. I don't think this parameter can cause an index error, though.
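For intuition, the old-file sampling presumably looks something like this (assumed logic with hypothetical names), which also shows why it should not over-index:

```python
import random

def gather_training_files(recent_files, old_files, k_random_old_h5_files):
    # Clamp k so sampling never requests more old files than actually exist.
    k = min(k_random_old_h5_files, len(old_files))
    return recent_files + random.sample(old_files, k)
```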
We can add this explanation to the documentation, since this is a really common configuration error.
**Describe the bug**
I am reporting this first: there are 12 items in the list, but the index seeks beyond the actual size (e.g., 12, 13, 14, ...), which generates the error:
**To Reproduce**
Steps to reproduce the behavior:
**HPC Platform**
PSC Bridges2

**Link to YAML configuration file**
https://github.com/DeepDriveMD/DeepDriveMD-pipeline/blob/feature/psc_bridges2/experiment/deepdrivemd_bridges.yaml
**Commands run**
...

**Failed command**
**Expected behavior**
Check the data size (e.g., `if index < len(data)`) before locating an index in the list, or emit a warning message telling the user to adjust the incorrect parameters in the YAML configuration.
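A minimal sketch of the requested check (hypothetical function and names, not existing DeepDriveMD code):

```python
def validate_agent_config(num_tasks, num_intrinsic_outliers,
                          num_extrinsic_outliers, n_most_recent_h5_files):
    """Fail fast on YAML parameter combinations known to break the agent."""
    if n_most_recent_h5_files > num_tasks:
        raise ValueError(
            f"n_most_recent_h5_files ({n_most_recent_h5_files}) exceeds the "
            f"{num_tasks} HDF5 files produced per iteration (num_tasks)")
    if min(num_intrinsic_outliers, num_extrinsic_outliers) < num_tasks:
        raise ValueError(
            "num_intrinsic_outliers and num_extrinsic_outliers must be >= "
            f"num_tasks ({num_tasks}) to supply enough restart points")
```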
**Additional context**
The 1st iteration finished successfully (`num_tasks: 16`), and this error happened in the middle of the 2nd iteration.