AIRI-Institute / nablaDFT

nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset
https://doi.org/10.1039/D2CP03966D
MIT License

improve download experience #28

Open Fadelis98 opened 1 month ago

Fadelis98 commented 1 month ago

The full .db files are too large to download in one piece. Would it be possible to provide them as split, compressed archives?
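To illustrate what a split release could look like, here is a minimal sketch that cuts a large file into fixed-size parts, joins them again, and checks the result against a SHA-256 digest. Nothing here is part of nablaDFT: the function names, the `.partNNNN` naming scheme, and the 64 MiB default part size are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path

# Hypothetical default part size (64 MiB); not a nablaDFT setting.
CHUNK_SIZE = 64 * 1024 * 1024


def sha256_of(path, buf_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in small blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(buf_size):
            digest.update(block)
    return digest.hexdigest()


def split_file(src, out_dir, chunk_size=CHUNK_SIZE):
    """Write src as numbered parts in out_dir and return their paths in order."""
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with open(src, "rb") as f:
        index = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            # Illustrative naming scheme: <file name>.part0000, .part0001, ...
            part = out_dir / f"{src.name}.part{index:04d}"
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts


def join_files(parts, dst):
    """Concatenate the parts, in the given order, into dst."""
    with open(dst, "wb") as out:
        for part in parts:
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)
    return Path(dst)
```

With parts like these, a dropped connection only costs the part that was in flight, and comparing `sha256_of` on the joined file with a published digest would confirm that the reassembled file is intact.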

KuzmaKhrabrov commented 1 month ago

Hello! For these purposes there are train2k/test2k versions. Are they still too large?

Fadelis98 commented 1 month ago

> Hello! For these purposes there are train2k/test2k versions. Are they still too large?

Sorry for not expressing my point clearly. There are indeed small subsets that are more accessible, but I did not mean that the dataset contains too much data. I do want to use the full dataset. The problem is that downloading it (e.g. 7 TB for the Hamiltonian data) takes several weeks because of limited international bandwidth, and the connection is sometimes lost partway through. If the files were split into chunks, they would be much easier to download.
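Until split files exist, a client-side workaround for dropped connections is to resume a partial download with an HTTP `Range` request. The sketch below uses only the Python standard library and assumes the hosting server honours `Range` headers, which has not been verified for the nablaDFT storage. The function names, retry count, and timeout are illustrative choices.

```python
import http.client
import os
import urllib.error
import urllib.request


def range_header(offset):
    """Build the Range header that asks for everything from byte `offset` on."""
    return {"Range": f"bytes={offset}-"} if offset > 0 else {}


def resume_download(url, dst, block_size=1 << 20, max_retries=5, timeout=60):
    """Download url to dst, continuing from a partial file when possible."""
    for _ in range(max_retries):
        # Resume from however many bytes are already on disk.
        offset = os.path.getsize(dst) if os.path.exists(dst) else 0
        request = urllib.request.Request(url, headers=range_header(offset))
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                # 206 means the server sent only the requested tail, so append.
                # Any other success status means a full body, so start over.
                mode = "ab" if response.status == 206 else "wb"
                with open(dst, mode) as out:
                    while block := response.read(block_size):
                        out.write(block)
            return dst
        except urllib.error.HTTPError as err:
            # Assumption: 416 (range not satisfiable) means dst is already complete.
            if err.code == 416:
                return dst
            raise
        except (urllib.error.URLError, ConnectionError, TimeoutError,
                http.client.HTTPException):
            # Connection dropped: retry and pick up from the new file size.
            continue
    raise RuntimeError(f"download did not finish after {max_retries} attempts")
```

Calling `resume_download(url, "dataset.db")` again after an interruption continues from the bytes already saved instead of starting from zero. Here `url` stands for whichever download link is being used, and `"dataset.db"` is a placeholder file name.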

KuzmaKhrabrov commented 1 month ago

I see! Thank you for pointing this out, we will try to find a solution. For now, you can work with the wavefunction archives and reconstruct the corresponding datasets from them.