cancerDHC / example-data

This repository is intended to act as a store of example data files from across the NCI Cancer Research Data Commons (CRDC) nodes in a number of formats.
MIT License

Document a flow for new use cases to be proposed and developed #19

Open gaurav opened 2 years ago

gaurav commented 2 years ago

The Example Data Workflow is intended to be use-case-based: we will start with very basic use cases that demonstrate the LinkML-generated artifacts, and then extend to more sophisticated use cases based on the needs of CCDH team members and the Nodes.

We should document how new use cases will be proposed and turned into code. I propose the following:

  1. New use cases should be proposed by creating an issue and marking it with the use case label.
    • New use cases are not currently being proposed from outside the CCDH Tools Team. When we reach that point, it will be a good idea to create an issue template so that people can more easily submit good use cases for us to use.
  2. If the use case is determined to be out of scope, it should be labeled as wontfix and closed.
  3. In some cases, an existing example can be modified to demonstrate this use case rather than creating a new example. If this can be done without overly complicating the existing example, then this should be done.
  4. The use case should be turned into a Jupyter Notebook that explains the use case and then provides example code for solving it.
    • Ideally, the original submitter would kick off this process by creating a pull request with a Jupyter Notebook that describes the problem (and even includes some code that they expect to work but that currently doesn't). But we understand that this is technically complicated and might not be easy to do, so an issue on its own is fine.
    • If the use case is not clear, it might be a good idea to create a pull request with the Jupyter Notebook that describes the problem and then share that with the proposer to ensure that we fully understand what they're trying to do.
  5. The Jupyter Notebook should be modified until it runs correctly from start to finish.
    • This might require changes to the CRDCH model, either at the model generation level or by making changes to the model itself.
  6. If the example data is large, it might be better to create a Jupyter Notebook that works on a subset of the data, and write a Python script that works on all of the available data, which can be executed during Continuous Integration testing. This allows larger examples to be built without making an overly complicated Jupyter Notebook.
  7. Once the Jupyter Notebook is ready, it should be sent to the proposer for their review. The proposer can close the issue once they believe their use case has been met.
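
To make step 6 more concrete, here is a rough sketch of how a notebook and a CI script could share the same code, with the notebook passing a small `limit` and the CI script processing everything. This is only an illustration: the function names (`iter_records`, `summarize`, `main`), the one-JSON-file-per-record layout, and the check for an `id` field are all assumptions made up for this example. A real check would presumably validate against the CRDCH model instead.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Iterator, List, Optional


def iter_records(data_dir: Path) -> Iterator[dict]:
    """Yield one record per *.json file in data_dir, in sorted filename order.

    Assumption: each example record lives in its own JSON file.
    """
    for path in sorted(data_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            yield json.load(f)


def summarize(data_dir: Path, limit: Optional[int] = None) -> dict:
    """Count records and how many lack an 'id' field.

    limit=N looks at only the first N records (notebook use);
    limit=None looks at every record (CI use).
    The 'id' check is a placeholder for real model validation.
    """
    total = 0
    missing_id = 0
    for record in islice(iter_records(data_dir), limit):
        total += 1
        if "id" not in record:
            missing_id += 1
    return {"total": total, "missing_id": missing_id}


def main(argv: List[str]) -> int:
    """Hypothetical CI entry point: returns a non-zero exit code on any problem."""
    result = summarize(Path(argv[0]))
    print(result)
    return 1 if result["missing_id"] else 0
```

In the notebook, a call such as `summarize(Path("data"), limit=10)` keeps the example quick to run and easy to read, while a CI script could end with `sys.exit(main(sys.argv[1:]))` so that the full dataset is checked on every build.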