choderalab / yank

An open, extensible Python framework for GPU-accelerated alchemical free energy calculations.
http://getyank.org
MIT License

Slow I/O #1157

Open jchodera opened 5 years ago

jchodera commented 5 years ago

We should look into why calculations spend so much time writing sampler states, and why this time increases with iteration number.

Some thoughts:

Longer term, we should consider a new storage strategy that can send different data types to different storage locations in an asynchronous manner, mindful of the data we need to (1) safely restart, and (2) analyze to estimate binding free energies. This can happen alongside a refactoring of the storage API to make it much easier to use from a programmatic perspective.
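As a rough illustration of that idea (hypothetical names and structure, not YANK's actual storage API), an asynchronous router could accept writes from the sampling loop and dispatch each data kind to its own backend without blocking:

```python
import queue
import threading

class AsyncStorageRouter:
    """Toy sketch: dispatch different data kinds to different writers asynchronously."""

    def __init__(self, writers):
        # writers: dict mapping a data kind (e.g. 'checkpoint', 'analysis') to a callable.
        self._writers = writers
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def write(self, kind, payload):
        # Enqueue and return immediately; the sampling loop never blocks on disk.
        self._queue.put((kind, payload))

    def close(self):
        # Flush everything enqueued so far, then stop the worker thread.
        self._queue.put(None)
        self._thread.join()

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:
                break
            kind, payload = item
            self._writers[kind](payload)

# Usage sketch: positions go to a (slow) checkpoint store, energies to a small analysis store.
router = AsyncStorageRouter({
    'checkpoint': lambda data: print('checkpoint write:', len(data), 'values'),
    'analysis': lambda data: print('analysis write:', data),
})
router.write('analysis', {'iteration': 1, 'energies': [0.1, 0.2]})
router.write('checkpoint', [0.0] * 9)
router.close()
```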

andrrizzi commented 5 years ago

I did some timings and it looks like this is due to the large chunksize we use for storing positions, which is currently a function of the checkpoint_interval.
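For reference, a minimal netCDF4 sketch of how the chunking interacts with compression (dimension and variable names here are illustrative, not YANK's actual schema): when the unlimited iteration dimension is chunked by checkpoint_interval, each per-iteration write to a zlib-compressed variable has to read, recompress, and rewrite a chunk spanning checkpoint_interval iterations, whereas a chunk size of 1 keeps every write small and independent.

```python
import numpy as np
import netCDF4

n_replicas, n_atoms = 24, 3000
checkpoint_interval = 100  # the chunk size along the iteration dimension in the slow case

nc = netCDF4.Dataset('chunking_sketch.nc', 'w')
nc.createDimension('iteration', None)  # unlimited dimension
nc.createDimension('replica', n_replicas)
nc.createDimension('atom', n_atoms)
nc.createDimension('spatial', 3)

# Large chunks along the unlimited dimension: every iteration rewrites a chunk
# covering checkpoint_interval iterations (read + decompress + recompress + write).
positions_large = nc.createVariable(
    'positions_large_chunks', 'f4', ('iteration', 'replica', 'atom', 'spatial'),
    zlib=True, chunksizes=(checkpoint_interval, n_replicas, n_atoms, 3))

# One-iteration chunks: each write only touches the data written in that iteration.
positions_small = nc.createVariable(
    'positions_small_chunks', 'f4', ('iteration', 'replica', 'atom', 'spatial'),
    zlib=True, chunksizes=(1, n_replicas, n_atoms, 3))

frame = np.zeros((n_replicas, n_atoms, 3), dtype=np.float32)
positions_large[0] = frame
positions_small[0] = frame
nc.close()
```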

These are timings for a single iteration with checkpoint_interval = 100

2019-05-14 07:26:45,131: Storing sampler states took    6.120s
2019-05-14 07:26:45,132: Writing replica thermodynamic states took    0.001s
2019-05-14 07:26:45,237: Writing MCMC moves statistics took    0.105s
2019-05-14 07:26:45,239: Writing energies took    0.002s
2019-05-14 07:26:45,240: Writing mixing statistics took    0.001s
2019-05-14 07:26:45,241: Write timestamp took    0.000s
2019-05-14 07:26:45,248: First sync took    0.007s
2019-05-14 07:26:45,248: Write last iteration took    0.000s
2019-05-14 07:26:45,249: Second sync took    0.000s
2019-05-14 07:26:45,249: Writing iteration information to storage took    6.239s

and these for checkpoint_interval = 10

2019-05-14 10:21:20,646: Writing checkpoint sampler states took    0.000s
2019-05-14 10:21:20,672: Building sliced sampler states took    0.026s
2019-05-14 10:21:20,685: Writing solute-atom sampler states took    0.012s
2019-05-14 10:21:20,685: Storing sampler states took    0.038s
2019-05-14 10:21:20,685: Writing replica thermodynamic states took    0.000s
2019-05-14 10:21:20,768: Writing MCMC moves statistics took    0.083s
2019-05-14 10:21:20,770: Writing energies took    0.001s
2019-05-14 10:21:20,771: Writing mixing statistics took    0.001s
2019-05-14 10:21:20,771: Write timestamp took    0.000s
2019-05-14 10:21:21,497: First sync took    0.726s
2019-05-14 10:21:21,498: Write last iteration took    0.000s
2019-05-14 10:21:21,498: Second sync took    0.000s
2019-05-14 10:21:21,498: Writing iteration information to storage took    0.852s

and these for checkpoint_interval = 10 and without compressing positions

2019-05-14 11:24:44,415: Writing checkpoint sampler states took    0.000s
2019-05-14 11:24:44,447: Building sliced sampler states took    0.031s
2019-05-14 11:24:44,463: Writing solute-atom sampler states took    0.016s
2019-05-14 11:24:44,463: Storing sampler states took    0.048s
2019-05-14 11:24:44,464: Writing replica thermodynamic states took    0.000s
2019-05-14 11:24:44,568: Writing MCMC moves statistics took    0.105s
2019-05-14 11:24:44,570: Writing energies took    0.002s
2019-05-14 11:24:44,571: Writing mixing statistics took    0.001s
2019-05-14 11:24:44,572: Write timestamp took    0.000s
2019-05-14 11:24:44,583: First sync took    0.011s
2019-05-14 11:24:44,584: Write last iteration took    0.000s
2019-05-14 11:24:44,584: Second sync took    0.000s
2019-05-14 11:24:44,585: Writing iteration information to storage took    0.170s

A few considerations on possible solutions after seeing this:

We very likely should not be calling sync() every iteration.

It looks like NetCDF may decide to synchronize in write_sampler_state() even if we don't explicitly call sync(). In the simulation with the large checkpoint_interval this was always the case, while in the simulation with the small checkpoint_interval it looks like this didn't happen and the bulk of the time was spent in the explicit sync() call. I'm not sure if there's a flag we can set to control this, but if there is one, it could help a lot.
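On the first point, a minimal sketch of what deferring explicit syncs to checkpoint iterations could look like (hypothetical names; nc_analysis stands for the open analysis netCDF4.Dataset, not the actual reporter code):

```python
def maybe_sync(nc_analysis, iteration, checkpoint_interval):
    """Flush buffered data to disk only at checkpoint iterations.

    netCDF4.Dataset.sync() forces buffered data to disk; data written in the
    other iterations stays buffered until the next checkpoint or close().
    """
    if iteration % checkpoint_interval == 0:
        nc_analysis.sync()

# In the iteration loop, call this once after all per-iteration writes, e.g.:
#     maybe_sync(nc_analysis, iteration, checkpoint_interval=10)
```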

Writing solute coordinates to the analysis file may be responsible for much of the bandwidth

Yes. It looks like this is definitely the step that takes the longest. We should keep in mind that having this probably saves a lot of time when we want to write out the trajectory. Either way, we want to implement flags to avoid saving all snapshots and/or states and/or replicas, or to turn it off altogether.
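As a sketch of what such flags could look like (write_solute_trajectory and solute_atom_indices are hypothetical option names, not existing YANK parameters; a plain array stands in for the netCDF variable):

```python
import numpy as np

def store_solute_positions(storage_positions, iteration, positions,
                           solute_atom_indices, write_solute_trajectory=True):
    """Write only the solute atoms for this iteration, or skip the write entirely."""
    if not write_solute_trajectory:
        return  # flag turned off: no per-iteration solute write at all
    # Slice out the solute atoms for every replica before hitting storage.
    storage_positions[iteration] = positions[:, solute_atom_indices]

# Usage sketch with toy shapes: 4 replicas, 3000 atoms, 50 solute atoms.
n_iterations, n_replicas, n_atoms = 10, 4, 3000
solute_atom_indices = np.arange(50)
storage_positions = np.zeros(
    (n_iterations, n_replicas, len(solute_atom_indices), 3), dtype=np.float32)
positions = np.random.rand(n_replicas, n_atoms, 3).astype(np.float32)
store_solute_positions(storage_positions, 0, positions, solute_atom_indices)
```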

It is possible that compression is responsible for the time increasing each iteration

Based on the last experiment, it looks like this could save a good chunk of time too (about 0.7 s/iteration). I don't know how much larger the NetCDF file could become without zlib compression, however, and the checkpoint interval is definitely the dominant factor.
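A quick way to gauge the file-size cost of dropping zlib would be a comparison like the one below (synthetic random data, which is close to a worst case for compression, so the real positions file should be measured for a meaningful ratio):

```python
import os
import numpy as np
import netCDF4

n_iterations, n_replicas, n_atoms = 20, 12, 1000
frame = np.random.rand(n_replicas, n_atoms, 3).astype(np.float32)

for label, zlib in (('compressed', True), ('uncompressed', False)):
    filename = f'positions_{label}.nc'
    with netCDF4.Dataset(filename, 'w') as nc:
        nc.createDimension('iteration', None)
        nc.createDimension('replica', n_replicas)
        nc.createDimension('atom', n_atoms)
        nc.createDimension('spatial', 3)
        var = nc.createVariable('positions', 'f4',
                                ('iteration', 'replica', 'atom', 'spatial'),
                                zlib=zlib, chunksizes=(1, n_replicas, n_atoms, 3))
        # Write the same frame every iteration just to grow the file.
        for i in range(n_iterations):
            var[i] = frame
    print(f'{label}: {os.path.getsize(filename) / 1e6:.1f} MB')
```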