Anandu07 closed this issue 11 months ago
Hey -
Assuming you're using `LmdbDataset` to read the LMDB, you're getting an error because our trainers expect the data in a specific format rather than as ASE objects.
Are you trying to train an S2EF or an IS2RE model? The expected data format differs between the two. I can provide a sample script once I have a better idea of the problem.
Thanks for the response. I'm trying to retrain/finetune an S2EF model (SCN/Equiformer) on specific adsorbate data. Later I'd also like to experiment with IS2RE models :)
Yup. Take a look at the final sections of the tutorial at https://github.com/Open-Catalyst-Project/ocp/blob/main/tutorials/OCP_Tutorial.ipynb, under "(Optional) Creating your own LMDBs for use in the OCP repository". This should help you get set up for creating S2EF and IS2RE datasets. Replace `system_paths` with the paths to your `extxyz` files; the code works for any ASE-parseable data format.
Let me know if you have any further questions.
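For anyone landing here later, the approach described above can be sketched roughly as follows. This is a minimal sketch, not the tutorial's exact code: the helper names (`to_lmdb_records`, `write_lmdb`, `extxyz_to_graphs`) are hypothetical, the `AtomsToGraphs` import path and arguments are assumptions based on the tutorial and may differ between repository versions, and the 1 GiB `map_size` default is an arbitrary choice.

```python
import pickle


def to_lmdb_records(data_objects):
    """Build (key, value) byte pairs: ASCII integer keys plus a final "length" entry."""
    records = [
        (str(idx).encode("ascii"), pickle.dumps(obj, protocol=-1))
        for idx, obj in enumerate(data_objects)
    ]
    records.append((b"length", pickle.dumps(len(data_objects), protocol=-1)))
    return records


def write_lmdb(lmdb_path, records, map_size=1 << 30):
    """Write the records into a single-file LMDB (subdir=False)."""
    import lmdb  # third-party, imported lazily

    env = lmdb.open(
        lmdb_path, map_size=map_size, subdir=False, meminit=False, map_async=True
    )
    # The transaction commits when the with-block exits without an exception.
    with env.begin(write=True) as txn:
        for key, value in records:
            txn.put(key, value)
    env.sync()
    env.close()


def extxyz_to_graphs(system_paths):
    """Read ASE-parseable files and convert every frame to a graph object."""
    import ase.io
    import torch
    # Assumed import path and arguments, following the tutorial.
    from ocpmodels.preprocessing import AtomsToGraphs

    a2g = AtomsToGraphs(
        max_neigh=50,
        radius=6,
        r_energy=True,
        r_forces=True,
        r_distances=False,
        r_fixed=True,
    )
    graphs = []
    for sid, path in enumerate(system_paths):
        frames = ase.io.read(path, index=":")  # all frames in the file
        for fid, data in enumerate(a2g.convert_all(frames, disable_tqdm=True)):
            data.sid = torch.LongTensor([sid])  # system id
            data.fid = torch.LongTensor([fid])  # frame id within the system
            graphs.append(data)
    return graphs
```

With your own list of files in `system_paths`, usage would look like `write_lmdb("train.lmdb", to_lmdb_records(extxyz_to_graphs(system_paths)))`, where `train.lmdb` is a placeholder name. The point is that the values stored in the LMDB are pickled graph objects rather than ASE `Atoms`. If your files carry tag information, assign `data.tags` as the tutorial does.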
@emsunshine, if the ASE lmdb is easier to use here, maybe you can provide guidance on that.
I would definitely recommend using one of the ASE datasets in this scenario. If you have ASE-readable files or an ASE DB you can avoid dealing with LMDBs. Here is some more information.
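If your data is spread over many ASE-readable files, one option is to gather them into a single ASE DB first. The following is only a sketch under assumptions: `find_structure_files` and `build_ase_db` are hypothetical helper names, the `*.extxyz` pattern is a guess at your file layout, and how the trainer config should point at the resulting file is left to the repository's ASE dataset documentation.

```python
from pathlib import Path


def find_structure_files(root, pattern="*.extxyz"):
    """Recursively collect structure files under root, sorted for reproducibility."""
    return sorted(str(p) for p in Path(root).rglob(pattern))


def build_ase_db(paths, db_path):
    """Append every frame from the given files to an ASE database; return the row count."""
    import ase.io  # third-party, imported lazily
    from ase.db import connect

    n_rows = 0
    with connect(db_path) as db:
        for path in paths:
            # index=":" reads all frames, so multi-frame trajectories are kept whole.
            for atoms in ase.io.read(path, index=":"):
                db.write(atoms)  # stores energy/forces if a calculator is attached
                n_rows += 1
    return n_rows
```

A call such as `build_ase_db(find_structure_files("my_adsorbate_data"), "adsorbates.db")` would then produce one database file, where both names are placeholders.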
I've downloaded a dataset for specific adsorbates from DATASET_PER_ADSORBATE.md because I need to train a model specifically for these adsorbates. To proceed with training, I'm trying to convert this dataset into LMDB format using a script. However, the code I used only produces .mdb files, which throw an error when I try to train the model. I would greatly appreciate any suggestions or a code snippet to help with this.
Below is the code I used to convert to 'lmdb format'