Open cmclausen opened 2 weeks ago
hi @cmclausen! On this now! It's possible something is wired up slightly wrong in our code for the default behavior to work :'(
I am working on resolving this and adding a test case.
In the meantime, if you are blocked, I think you can turn this error into just a warning by adding the following under the `optim` section:

```yaml
optim:
  load_balancing_on_error: warn_and_no_balance
```
Our code and tests look correct; I think what you are trying should be working. LMDB datasets are not deprecated, they are still used. There were changes made to batch balancing, plus some code cleanup/consolidation, which I think is causing your issues. I'm sorry :'(
In order to use batch balancing (balancing systems across multiple simultaneous GPUs), your dataset needs a valid `dataset._metadata` value that contains the `natoms` field. Alternatively, you can fully disable it with:

```yaml
optim:
  load_balancing: False
```
The implementation you have above looks like it should not trigger the error you are getting, because you clearly define `._metadata`: https://github.com/FAIR-Chem/fairchem/blob/83fd9d21c4c0430746e7d3b49d99b33f02956660/src/fairchem/core/common/data_parallel.py#L112
Can you try adding some debug statements inside `BalancedBatchSampler._ensure_support` to get the value of `dataset._metadata`, and most importantly to find out why it does not seem to have a `natoms` field?
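A minimal sketch of the kind of check meant here (the helper and dummy classes below are illustrative, not fairchem API; the real sampler inspects `dataset._metadata` similarly):

```python
# Hypothetical debug helper: inspect a dataset's _metadata the way the
# sampler's support check would, and report what is missing.
def debug_metadata(dataset):
    meta = getattr(dataset, "_metadata", None)
    print("type(_metadata):", type(meta))
    if meta is None:
        return "no _metadata attribute"
    if getattr(meta, "natoms", None) is None:
        return "_metadata present but has no 'natoms' field"
    return "ok"

# Dummy datasets standing in for real LMDB-backed datasets:
class NoMeta:
    pass

class WithMeta:
    class _Meta:
        natoms = [3, 5, 7]  # number of atoms per sample
    _metadata = _Meta()

print(debug_metadata(NoMeta()))    # -> no _metadata attribute
print(debug_metadata(WithMeta()))  # -> ok
```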
Hope this helps, please let us know how it goes!
Hi @misko, Thanks for your attention on this.
I used `load_balancing` and `load_balancing_on_error`, but in some scenarios `BalancedBatchSampler._ensure_support` throws an `AttributeError` when `_metadata` is completely missing, and that cannot currently be bypassed.
Is it correct that `_metadata` is supposed to originate from an .npz file, as per `core.datasets.base_dataset`? Previously I have just converted my structures using `AtomsToGraphs` from `fairchem.core.preprocessing`, added the relevant attributes, and saved them to an LMDB.
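For reference, metadata of this kind is essentially just a per-sample `natoms` array in an .npz file. A sketch of producing one (the filename and exact keys fairchem expects are assumptions here; `natoms` is the field named earlier in the thread, and the expected layout should be checked against `core.datasets.base_dataset`):

```python
import numpy as np

# Sketch: write a metadata .npz containing a per-sample 'natoms' array,
# i.e. the number of atoms in each structure stored in the LMDB.
def write_metadata_npz(natoms_per_sample, path="metadata.npz"):
    natoms = np.asarray(natoms_per_sample, dtype=np.int64)
    np.savez(path, natoms=natoms)
    return path

# Usage: one entry per structure, in dataset order.
path = write_metadata_npz([12, 12, 40, 8], "/tmp/metadata.npz")
loaded = np.load(path)
print(loaded["natoms"])  # [12 12 40  8]
```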
hi @cmclausen, I have the same trouble when converting to LMDB using `AtomsToGraphs`; running `main.py --mode predict` raises the attribute error. Have you solved the problem? @misko, I tested `load_balancing: false` and it does not work. This is my config.yml:

```yaml
dataset:
  test:
    a2g_args:
      r_energy: false
      r_forces: false
    format: lmdb
    src: /home/train_test_data/graph_input/data.0000.lmdb
evaluation_metrics:
  metrics:
    energy:
```
The version you get from `pip install fair-core` is not the latest, so `load_balancing` and `load_balancing_on_error` do not work there.
Hi there, I'm having trouble with my own LMDB datasets and `BalancedBatchSampler` after a recent update. It now requires the dataset to pass `.metadata_hasattr("natoms")`, otherwise an `UnsupportedDatasetError` is thrown. I have previously made graph datasets for on-the-fly inference and have amended those to accommodate the change:
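A hypothetical sketch of an amendment of this shape (all names below are illustrative stand-ins, not the poster's code or fairchem's implementation): attach a metadata object carrying `natoms` so a `metadata_hasattr("natoms")` check can pass.

```python
from types import SimpleNamespace
import numpy as np

# Hypothetical shim: give an existing dataset a _metadata object with a
# 'natoms' field (number of atoms per sample).
def attach_natoms_metadata(dataset, natoms_per_sample):
    dataset._metadata = SimpleNamespace(
        natoms=np.asarray(natoms_per_sample, dtype=np.int64)
    )
    return dataset

class DummyLmdbDataset:  # stand-in for a real LMDB-backed dataset
    def metadata_hasattr(self, name):
        meta = getattr(self, "_metadata", None)
        return meta is not None and getattr(meta, name, None) is not None

ds = attach_natoms_metadata(DummyLmdbDataset(), [12, 40, 8])
print(ds.metadata_hasattr("natoms"))  # True
```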
However, the error also occurs when initializing training with LMDB datasets from `main.py` and a config file. Are LMDB datasets fully deprecated now? If not, what is the new protocol for making these and passing them to the trainer?

Best regards,
Christian