cancerDHC / ccdhmodel

CRDC-H model in LinkML, developed by the Center for Cancer Data Harmonization (CCDH)
https://cancerdhc.github.io/ccdhmodel/
BSD 3-Clause "New" or "Revised" License

Organization of `generators/google-sheets` in the repo #95

Closed: sujaypatil96 closed this issue 2 years ago

sujaypatil96 commented 3 years ago

This issue is a design consideration and has no implications for how the system functions.

I was thinking about the organization of the generators/google-sheets folder in the repo. We had a chat about this during one of our calls as well, and I had a couple of proposals in mind:

So something like this:

├───.github
│   └───workflows
├───.idea
│   └───inspectionProfiles
├───csv
├───examples     # include an example .py that loads the packages from `vendor` and calls the Google Sheets generation modules and methods
├───examples-for-discussion
├───vendor
│   └───sheet2linkml
│       ├───source
│       │   └───gsheetmodel
│       └───terminologies
│           └───tccm
├───graphql
├───jsonschema
├───owl
├───python
├───shex
└───src
    ├───docs
    └───schema

As noted in the comment in the tree, we can have an examples folder whose modules load packages from the vendor folder (in this case, the sheet2linkml package).
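To make the `examples` idea concrete, here is a minimal sketch of how an example module could load a package from `vendor`. It is only an illustration: the helper names (`add_vendor_to_path`, `load_vendored`) and the stand-in package `demo_vendored_pkg` are hypothetical, and the demo builds a throwaway directory instead of touching the real sheet2linkml code.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def add_vendor_to_path(repo_root: Path) -> Path:
    """Put <repo_root>/vendor at the front of sys.path (once) and return it."""
    vendor_dir = (repo_root / "vendor").resolve()
    if str(vendor_dir) not in sys.path:
        sys.path.insert(0, str(vendor_dir))
    return vendor_dir


def load_vendored(repo_root: Path, package: str):
    """Import a package that lives under <repo_root>/vendor."""
    add_vendor_to_path(repo_root)
    importlib.invalidate_caches()  # make sure newly created packages are seen
    return importlib.import_module(package)


def demo() -> str:
    """Create a temporary repo layout with a stand-in vendored package and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for vendor/sheet2linkml.
        pkg_dir = Path(tmp) / "vendor" / "demo_vendored_pkg"
        pkg_dir.mkdir(parents=True)
        (pkg_dir / "__init__.py").write_text("NAME = 'demo_vendored_pkg'\n")
        module = load_vendored(Path(tmp), "demo_vendored_pkg")
        return module.NAME


if __name__ == "__main__":
    print(demo())
```

In the repo itself, an example script would presumably call something like `load_vendored(repo_root, "sheet2linkml")`, with `repo_root` derived from the script's own location rather than the current working directory.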

CC: @gaurav @turbomam


Action items:

turbomam commented 3 years ago

I love the functionality that @gaurav developed, but it does make me uneasy that you have to cd into its directory to run the code. So I would like to see it at least changed to something that you can call from an arbitrary path and, if necessary, configure with command-line arguments (or a LinkML config file).

I have also thought about making it an independent PyPI package, but have wondered whether it is too hardcoded towards our task to be of general use. Would other people want to check it out? Could it be generalized to other GSheets-to-LinkML tasks? I assume we shouldn't work on anything like that generalization for the foreseeable future.

gaurav commented 3 years ago

> Reduce the amount of clutter in the ccdhmodel repo by publishing the sheet2linkml package to PyPI and including it as a dependency in the current repo. @gaurav mentioned it isn't generic enough to become its own PyPI package, but I don't think it needs to be?

> I have also thought about making it an independent PyPI package, but have wondered whether it is too hardcoded towards our task to be of general use. Would other people want to check it out? Could it be generalized to other GSheets-to-LinkML tasks? I assume we shouldn't work on anything like that generalization for the foreseeable future.

Let's bring this up at DMH/xs-WS meetings next week and see what other people think. At one point, the plan was to phase out the Google Sheet and this script entirely, and to have model builders modify LinkML files directly. If that's the case, it's probably fine to leave this here and deprecate it once we reach that point. Otherwise, it might be a good idea to separate it into its own repository -- if we make it very clear that the sole aim of this script is currently to construct the CRDCH model in LinkML, that probably gets us out of the "is too hardcoded towards our task to be of general use?" question for now, and non-CCDH LinkML developers can then generalize it for other models as needed.

This might be worth it just so we can separate issues relating to "sheet2linkml" from model-related issues in this repository! So that'd be a benefit for sure.

> The second proposal is to create a generic lib or vendor folder rather than generators/google-sheets. I mention this because it would let us create or install other local third-party packages in that location, rather than restricting it to generators.

That sounds fine to me! I think I expected there to be additional generators, or maybe I liked the idea of encouraging each LinkML model to have a generators/ directory that contained scripts for building that model from wherever the model was being designed (such as a UML file, say). I'd prefer /vendor to /lib, since I'm used to library files being used by the repository, rather than the other way around.

> I love the functionality that @gaurav developed, but it does make me uneasy that you have to cd into its directory to run the code. So I would like to see it at least changed to something that you can call from an arbitrary path and, if necessary, configure with command-line arguments (or a LinkML config file).

Yes, this is definitely a good idea!
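As a rough sketch of what "callable from an arbitrary path" could look like, the entry point below takes everything it needs as command-line arguments and turns path arguments into absolute paths up front. The option names (`--sheet-id`, `--output`, `--config`) are hypothetical and are not the script's actual interface; a real entry point would invoke the generator where this one only prints.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def absolute_path(value: str) -> Path:
    """Interpret a path argument relative to the caller's working directory."""
    # Joining with an already-absolute path simply yields that path.
    return Path.cwd() / Path(value).expanduser()


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="sheet2linkml",
        description="Generate a LinkML model from a Google Sheet (illustrative sketch).",
    )
    # All option names below are hypothetical, chosen only for illustration.
    parser.add_argument("--sheet-id", required=True,
                        help="ID of the source Google Sheet")
    parser.add_argument("--output", type=absolute_path, required=True,
                        help="where to write the generated LinkML YAML")
    parser.add_argument("--config", type=absolute_path, default=None,
                        help="optional config file")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # A real entry point would call the generator here; this sketch
    # only reports the resolved settings.
    print(f"sheet={args.sheet_id} output={args.output}")
    return args


if __name__ == "__main__":
    main()
```

Because the paths are made absolute once at startup, the rest of the code never depends on which directory the command was launched from.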

gaurav commented 3 years ago

PR #96 carries out the two action items I've listed. The only issue remaining is to split off sheet2linkml into its own package -- I propose we open a separate issue for that, and do that later this year after the Sep 30 deadline.