Andrew-S-Rosen / QMOF

The QMOF Database: A database of quantum-mechanical properties for metal-organic frameworks.
MIT License

Make GitHub repo more amenable to user contributions #2

Closed Andrew-S-Rosen closed 3 years ago

Andrew-S-Rosen commented 3 years ago

Self-explanatory. As brought up by @kjappelbaum, the GitHub page should make it more seamless to fork and make PRs. The repo will need to work within file size limits and should be kept small, but it should nonetheless include some of the curated data from Figshare. It should also provide instructions on how to contribute.

kjappelbaum commented 3 years ago

some ideas:

I think it would be really great if one could keep evolving the database with the community. Someone (you? 😆) would still need to update the DFT-derived data, but that could be done for each release based on the changelog, and you would just point to the new versions of the files on Figshare in the README.

Andrew-S-Rosen commented 3 years ago

Good ideas! Thanks for sharing!

One thing for me to think about is that I don't want this to necessarily be a structure repository because the parent CSD MOF subset (and CoRE MOF database) are both excellent at their jobs there. Ideally, any structural errors could be noted and actually updated here for the CSD MOF subset or here for the CoRE MOF database since then it's addressing the upstream repositories.

Nonetheless, we all like clean data, so it's worth at the very least flagging potential issues here to help out future studies building off the QM/ML work in our initial paper. I can, in principle, rerun VASP calculations for a set of manually updated CIFs, although in reality that will probably be too unsustainable a solution. Of course, others can do so if they wish. The ASE parameters are described in Table S2 of the SI.

All good things to think about.

Andrew-S-Rosen commented 3 years ago

Thanks again for your helpful comments, @kjappelbaum. I just laid the groundwork for a possible contributing scheme, as described here. Very much a work in progress. Will keep this Issue open until I settle on a solid scheme.

Andrew-S-Rosen commented 3 years ago

[Mostly a repeat from before, but I'm marking the issue as closed since this is a minimal working solution and mostly all I have time for right now]

A barebones solution has been set up, as described here. The tracker has a "development" version of the DFT-optimized CIFs, which can be organized into "clean" or "issues" subfolders. Note that the un-optimized CIFs cannot be shared on GitHub since they are property of the CCDC. In general, this contribution approach is currently very simple, so it might be worth revisiting @kjappelbaum's excellent suggestions in the future. I particularly like the idea of including some actions to automatically validate CIFs.
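As a rough illustration of the "automatically validate CIFs" idea, here is a minimal sketch of a sanity check that an action could run on each CIF. It is not the repo's actual validation: the function name `check_cif_text` and the choice of checks are assumptions made for illustration, and it only looks for a `data_` block header and plausible cell parameters using the standard library. A real check would probably parse the file with a proper CIF reader.

```python
import re

# Cell tags a crystal-structure CIF is expected to define (CIF core dictionary names).
REQUIRED_CELL_TAGS = (
    "_cell_length_a", "_cell_length_b", "_cell_length_c",
    "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
)

# A CIF number, optionally followed by a standard uncertainty such as "12.5(3)".
_NUMBER = re.compile(r"^[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?(\(\d+\))?$")


def check_cif_text(text):
    """Return a list of problems found in the CIF text (empty list if none).

    Hypothetical helper: it checks only for a data_ block header and for
    positive, numeric cell lengths and angles.
    """
    problems = []
    lines = [ln.strip() for ln in text.splitlines()]

    if not any(ln.lower().startswith("data_") for ln in lines):
        problems.append("no data_ block header")

    # Collect the first value token that follows each required cell tag.
    values = {}
    for ln in lines:
        parts = ln.split()
        if len(parts) >= 2 and parts[0].lower() in REQUIRED_CELL_TAGS:
            values[parts[0].lower()] = parts[1]

    for tag in REQUIRED_CELL_TAGS:
        if tag not in values:
            problems.append(f"missing {tag}")
        elif not _NUMBER.match(values[tag]):
            problems.append(f"non-numeric value for {tag}: {values[tag]!r}")
        elif float(re.sub(r"\(\d+\)$", "", values[tag])) <= 0:
            problems.append(f"non-positive value for {tag}")
    return problems
```

A CIF that returns a non-empty list could then be routed to the "issues" subfolder for manual review rather than rejected outright.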

One thing that bothers me with the current solution is that every time a change is made, the .zip file will be updated. This is a 25 MB file, and repeated changes are going to be annoying for the Git history. I can clean it up if it ever becomes unwieldy. I still like this better than having every CIF unpacked on the GitHub repo because that makes it really cumbersome to clone.
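For anyone who wants to inspect the zipped CIFs without unpacking them, a minimal sketch along these lines may help. The function name `group_cifs_by_folder` is hypothetical, and the assumption that "clean" and "issues" appear as folder names inside the archive comes from the description above rather than from the actual file layout.

```python
import zipfile
from pathlib import PurePosixPath


def group_cifs_by_folder(archive):
    """Group the .cif members of a zip archive by their subfolder.

    `archive` is a path or a binary file-like object. The "clean" and "issues"
    folder names are assumed; anything else is listed under "other".
    """
    groups = {"clean": [], "issues": [], "other": []}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            path = PurePosixPath(name)
            if path.suffix.lower() != ".cif":
                continue  # skip directories and non-CIF files
            # Look only at the parent folders, not at the file name itself.
            key = next(
                (p for p in path.parts[:-1] if p in ("clean", "issues")),
                "other",
            )
            groups[key].append(name)
    return groups
```

Individual files can be read the same way with `ZipFile.open`, so the archive never has to be extracted into the working tree.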