Sage-Bionetworks / schematic

Package for biomedical data model and metadata ingress management
https://schematicpy.readthedocs.io/en/stable/cli_reference.html
MIT License

Documentation of schematic and data curator app installation and usage #362

Closed milen-sage closed 3 years ago

milen-sage commented 3 years ago

Let's keep the user-facing instructions for data ingress (which will be de-HTAN-ed) separate from installation instructions for the data curator app. They have different use cases.

I.e. the documentation that has been built up here on installing and using the data curator app and schematic can have its own doc space.

There are two large content modules in the above doc

Those can be split up, i.e. the schematic-specific docs are being developed here.

To avoid redundancy, the data curator app setup docs can link to the schematic installation instructions, while specifying any needed adjustments to schematic installation/configuration (e.g. the data curator app is typically deployed in a shiny server environment with a specific schema, so the data curator docs can outline what changes would need to be made to the default schematic installation and setup outlined in the schematic docs module).

I'll take a stab at splitting up the large google doc above in two complementary and coherent modules: each in its own google doc. We can then collectively

xdoan commented 3 years ago

So I see that this gdoc works as within-DCC general setup guidelines and installation instructions. This is different from the required steps for repo installation in the READMEs of the schematic and data_curator repos. Already in the doc there isn't a differentiation of use cases for setup (server, personal, shared, etc.), so I agree they should be added.

This needs to be integrated with the README instructions. I personally look to the READMEs for any repo installation before any external docs, and also the doc does not fit my use case, so I don't look at it as a "source of truth", though it is useful for installation.

I would like to clarify that we have user-facing docs that are quite comprehensive, but we also need installation docs for the frontend and backend repos that match AND setup guidelines for DCC people that want to implement the ingress system. To truly avoid redundancy, I would suggest the READMEs be the repo installation 'single source of truth', which the doc/Jekyll for how this fits into the DCC Ingress System links out to, so it does not need to be maintained in multiple places. See DCC validator README and how it links to docs: https://github.com/Sage-Bionetworks/dccvalidator

Also interested in what the vision is for Jekyll docs.

milen-sage commented 3 years ago

This needs to be integrated with the README instructions. I personally look to the READMEs for any repo installation before any external docs, and also the doc does not fit my use case, so I don't look at it as a "source of truth", though it is useful for installation.

Yes, agreed, most people look at the README of a repo for installation instructions. The integration, though, can potentially happen in two directions:
a) the repo README is the source of truth; Jekyll documentation modules link to the README for installation instructions (as you suggest)
b) the Jekyll documentation modules are the source of truth; the repo README links to the Jekyll module on installation (and other) instructions

Also interested in what the vision is for Jekyll docs.

The repo README is a bit limiting in terms of modularizing and formatting docs. We don't have to use Jekyll, in general, but it is a tool we have been working with already, and it provides various templates and ways to organize content that can then be incorporated into another content management system, developed by Stacey.

So I see that this gdoc works as within-DCC general setup guidelines and installation instructions. This is different from the required steps for repo installation in the READMEs of the schematic and data_curator repos. Already in the doc there isn't a differentiation of use cases for setup (server, personal, shared, etc.), so I agree they should be added.

Yes, agreed, the gdocs content needs more work, e.g. adding sections corresponding to use cases for schematic and data curator app deployment. That would inform how we structure the documentation modules when translating the gdocs to actual online/live documentation. That's another reason option b) above would be more flexible: if all the different deployment use cases and related instructions are laid out in a repo README file, it might get unwieldy and hard to refer to from other documentation sources (e.g. SOPs). Also, it would be hard to customize for different projects (short of gh branches).

To truly avoid redundancy, I would suggest the READMEs be the repo installation 'single source of truth', which the doc/Jekyll for how this fits into the DCC Ingress System links out to, so it does not need to be maintained in multiple places. See DCC validator README and how it links to docs: https://github.com/Sage-Bionetworks/dccvalidator

Yes, certainly that's one direction: (a) above.

@xdoan do you know if the section on installation in the pkgdown pages (https://sage-bionetworks.github.io/dccvalidator/index.html) automatically reflects updates to the README (https://github.com/Sage-Bionetworks/dccvalidator)?

For an example of direction b) above, https://github.com/Sage-Bionetworks/dccvalidator points to a pkgdown doc article on customizing dccvalidator: https://sage-bionetworks.github.io/dccvalidator/articles/customizing-dccvalidator.html

I would like to clarify that we have user-facing docs that are quite comprehensive, but we also need installation docs for the frontend and backend repos that match AND setup guidelines for DCC people that want to implement the ingress system.

Yes, the docs for frontend and backend should be complementary and coherent; and separate from data contributor user-facing docs; and separate from the guidelines for DCC administrators that want to implement the ingress system (i.e. other projects w/in Sage/CompOnc for now).

For me, both a) and b) work to implement that (in pkgdown, Jekyll, README markdown and/or another doc tool) as long as we get to single-source-of-truth modules that can be referred to from different SOPs, e.g. for different use cases. The main work is getting the content ready, and that can happen in gdocs. Leaning towards Jekyll since we've used it, but we can coordinate on what to use when the content is ready.

sujaypatil96 commented 3 years ago

Updated a few READMEs in the schematic project repo, which you can look at here: https://github.com/Sage-Bionetworks/schematic/tree/develop-main-README.

ychae commented 3 years ago

@sujaypatil96 can this issue be closed with the additions that you've made?

milen-sage commented 3 years ago

This needs more work (next workstream milestone).

The two modules referred to above now exist on Confluence, with some of the documentation covered by @sujaypatil96's GitHub package work.

We need to clean up and reorganize the Confluence docs and make them more maintainable and complete.

sujaypatil96 commented 3 years ago

I think this issue can be closed, given our documentation in the repo README and the RTD docs.