ACEsuit / mace

MACE - Fast and accurate machine learning interatomic potentials with higher order equivariant message passing.

Prediction time #377

Closed: Matt1h closed this issue 2 months ago

Matt1h commented 2 months ago

Hi,

I am trying to use MACE in a setting where I have to make predictions for a large number of configurations. When running predictions on a test set, I noticed that the computation time scales linearly with the number of test configurations and that changing the batch size has little effect on it. I expected larger batches to process more samples in parallel and therefore reduce the overall prediction time. Looking into it, I found that most of the time is spent in

`dataset_ = [data.AtomicData.from_config(config, z_table=z_table, cutoff=float(model.r_max)) for config in configs]`

in the main function of the `eval_configs.py` script; the actual prediction of the batches is minor in comparison. Is this expected, or is there anything I can do about it? Could the generation of the dataset be parallelized, or implemented more efficiently?
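Something like the following is what I had in mind, just a rough sketch using `multiprocessing` (the names `data`, `z_table`, `configs`, and `model` are the ones from `eval_configs.py`; I have not checked whether `AtomicData` objects pickle cleanly across processes):

```python
# Rough sketch: build the AtomicData graphs in a process pool instead of a
# plain list comprehension. Assumes data, z_table, configs, and model are the
# same objects as in main() of eval_configs.py.
from functools import partial
from multiprocessing import Pool

def to_atomic_data(config, z_table, cutoff):
    # Graph construction (including the neighbor list) for one configuration.
    return data.AtomicData.from_config(config, z_table=z_table, cutoff=cutoff)

worker = partial(to_atomic_data, z_table=z_table, cutoff=float(model.r_max))
with Pool(processes=8) as pool:
    dataset_ = pool.map(worker, configs)
```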

Thanks in advance & best regards, Matthias

ilyes319 commented 2 months ago

Hi,

Yes, this is expected if you have a lot of small configurations. The time is mostly spent computing the neighbor list and on CPU-GPU data transfer. Make sure you are using the latest main branch so that the matscipy neighbor list is used. Currently there is no alternative on the main branch. However, the develop branch supports evaluating from HDF5 files, along with parallel generation of the HDF5 file; this precomputes the neighbor lists and parallelizes that step.
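For a rough sense of where that time goes, here is an illustrative snippet (not part of MACE) that times only the matscipy neighbor-list construction on an arbitrary example structure:

```python
# Time the neighbor-list construction alone with matscipy, the same backend
# used on the main branch. Structure and cutoff are arbitrary examples.
import time

from ase.build import bulk
from matscipy.neighbours import neighbour_list

atoms = bulk("Cu", cubic=True) * (4, 4, 4)  # 256-atom FCC copper supercell
cutoff = 5.0  # Angstrom

start = time.perf_counter()
senders, receivers, distances = neighbour_list("ijd", atoms, cutoff)
print(f"{len(senders)} edges computed in {time.perf_counter() - start:.4f} s")
```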

Matt1h commented 2 months ago

Okay I see, thank you!