As we work toward getting the usage documentation set up, we should also develop a contributors' guide.
Some topics I think would be good to include/discuss:
Fork and Pull: I think we should adopt the Fork and Pull model for code contribution (this is how I've been working with the repository thus far). Two advantages of this approach: (1) contributors do not need any specific permissions to open a PR (only admins on the project can merge), and (2) it keeps the branches in the main code base less cluttered (since development branches live in contributors' forks). We should include links to the relevant GitHub documentation (e.g., this page) as well as some basic step-by-step instructions. Relatedly, it would probably be good to include the steps needed for someone to pull from a fork without creating a separate repo (i.e., setting the remotes properly), as this is useful for testing the changes in someone's PR.
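The remote setup could be sketched roughly like this (a hedged illustration: local throwaway paths stand in for the GitHub HTTPS URLs of the main repo and of the forks, so the commands run anywhere; the repo and branch names are placeholders, not the project's actual ones):

```shell
set -eu
tmp=$(mktemp -d) && cd "$tmp"

# Stand-in for the main GitHub repository; in practice the paths below
# would be HTTPS URLs like https://github.com/ORG/PROJECT.git.
git init -q upstream
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# 1. Clone your fork (the local 'upstream' path stands in for your fork's URL).
git clone -q upstream fork
cd fork

# 2. Point an 'upstream' remote at the main repository so you can pull its updates.
git remote add upstream "$tmp/upstream"
git fetch -q upstream

# 3. To test a contributor's PR, add their fork as another remote and fetch it --
#    no separate clone needed.
git remote add contributor "$tmp/upstream"   # placeholder for their fork's URL
git fetch -q contributor

git remote   # lists: contributor, origin, upstream
```

With the extra remote fetched, a PR branch can be checked out directly, e.g. `git checkout -b pr-review contributor/feature-branch`.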
Code Formatting/Style: Important things to communicate: we use the Black formatter, we use type hints for our functions, openff-units is used for units, and we use numpydoc-style docstrings.
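A small example in the guide could show all of these conventions at once. A minimal sketch (the function name and behavior are invented for illustration; in the real code base the distances would carry openff-units quantities rather than bare floats):

```python
def cutoff_mask(distances: list[float], cutoff: float) -> list[bool]:
    """Return which pairwise distances fall inside the cutoff.

    Parameters
    ----------
    distances : list of float
        Pairwise distances in nanometers (in practice these would be
        ``openff.units`` quantities rather than bare floats).
    cutoff : float
        Cutoff radius in nanometers.

    Returns
    -------
    list of bool
        ``True`` where the distance is less than or equal to the cutoff.
    """
    return [d <= cutoff for d in distances]
```

The body is Black-formatted, the signature is fully type-hinted, and the docstring follows the numpydoc sections (Parameters, Returns).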
Package/Module overview: There are a few key modules: curation, dataset, potential, train, utils. We should describe the key purpose of each one, the key classes in each, examples, etc. Especially for curation and dataset, there are clear abstract base classes that need to be extended when implementing new datasets, so some detailed discussion of how to do that would be good (as I imagine new datasets would be a key way someone could contribute). I think copying over some of our discussion on modular philosophy would be good, and specifically making note of the functions we really intend to be general and interchangeable (e.g., RBFs).
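The guide could illustrate the extension pattern with a toy example. A hypothetical sketch only: the real abstract base class and its required methods live in the `dataset` module, and these names and methods are invented to show the shape of a contribution, not the actual API:

```python
from abc import ABC, abstractmethod


class BaseDataset(ABC):
    """Minimal stand-in for the dataset abstract base class (illustrative)."""

    @abstractmethod
    def download(self) -> None:
        """Fetch the raw data files."""

    @abstractmethod
    def process(self) -> None:
        """Convert the raw files into the internal record format."""


class MyNewDataset(BaseDataset):
    """A new dataset is contributed by filling in the abstract methods."""

    def download(self) -> None:
        # Placeholder for a real download step.
        self.raw = ["record-1", "record-2"]

    def process(self) -> None:
        # Placeholder for real processing into the internal format.
        self.records = [r.upper() for r in self.raw]
```

Because the base class marks `download` and `process` as abstract, a contributor's subclass cannot be instantiated until both are implemented, which makes the required interface explicit.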
Tests: This might require a bit more discussion, but I think we need to articulate our approach to tests. We have recently been discussing moving the tests that validate directly against the original source package implementations into a separate directory that would not be run automatically as part of the CI. Part of the reason is that, for schnet and physnet, we must pip install schnetpack, which pins some old software releases (so setting up the environment for CI is problematic in a general way, but can be accomplished fairly easily in a local environment). At this point, we have validated our implementations against the original packages and, with these validated versions, implemented many tests (not relying on those packages) that check the output. The validation tests are still important to run, but likely only before a release or selectively when a given NNP is being modified. We also need to decide on the regression tests we want to perform, and on our overall expectations as to what tests contributors should be writing, where they should put them, and what they should run before a merge.
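One possible gating pattern for those validation tests (purely illustrative, not the project's actual setup): keep them in the test tree but skip them unless explicitly requested, e.g. via an environment variable, so CI passes without the pinned schnetpack environment while a local run can still opt in:

```python
import os
import unittest

# Illustrative gate: only run validation tests against the original
# packages (e.g., schnetpack) when explicitly requested.
RUN_VALIDATION = os.environ.get("RUN_VALIDATION_TESTS") == "1"


class TestAgainstOriginalImplementations(unittest.TestCase):
    @unittest.skipUnless(RUN_VALIDATION, "set RUN_VALIDATION_TESTS=1 to run")
    def test_energies_match_reference(self) -> None:
        # A real test would import the original package here and compare
        # its outputs against ours.
        pass
```

The same idea could be expressed with a pytest marker plus a default `-m "not reference_validation"` selection; either way, the point is that the expensive environment is only needed when the gate is opened.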
I'm certain there are more things we need to include; these are just the first few that came to mind.