Gilead-BioStats / gsm.app

https://gilead-biostats.github.io/gsm.app/
Apache License 2.0

QC: create `yaml` workflow for generating study-level app #211

Open lauramaxwell opened 3 weeks ago

lauramaxwell commented 3 weeks ago

QC Details

Write a YAML config file that covers every requirement for study-level metadata and specifications.

Additional Comments

lauramaxwell commented 3 weeks ago

@jwildfire feel free to comment on these requirements

jonthegeek commented 3 weeks ago

Also related:

jonthegeek commented 3 weeks ago

I worry that we're falling into the dark side of Miles McBain's "Patterns and anti-patterns of data analysis reuse". We shouldn't invent a YAML language that allows for every possibility for data ingestion. At that point we're inventing a programming language, and people using the app already know (some amount of) R.

That said, we have strict expectations for some of the data. We need a way to enforce those requirements, without reinventing too many wheels.

My main plan has been to accept functions for most of the inputs, with stipulations on what arguments those functions must accept and what they must return. That way, if the user wants to load the data from an S3 bucket, they can do so; if they want to load the data from a database, they can also do that. I don't think we need to separately code expectations for each possibility.
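To make the idea concrete, here is a rough, language-agnostic sketch of that contract, written in Python only for illustration (the app itself is R, and the same shape would carry over). Everything named here is hypothetical: `load_validated`, `REQUIRED_COLUMNS`, the column names, and `load_from_memory` are placeholders, not part of gsm.app or gsm. The caller supplies any zero-argument function, and the app checks only what comes back.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Loader = Callable[[], List[Row]]

# Hypothetical required columns, for illustration only.
REQUIRED_COLUMNS = {"StudyID", "SiteID", "Metric"}


def load_validated(loader: Loader, required=REQUIRED_COLUMNS) -> List[Row]:
    """Call a user-supplied loader and enforce the output contract."""
    if not callable(loader):
        raise TypeError("loader must be a function that takes no arguments")

    rows = loader()  # The app does not care where the data comes from.

    if not isinstance(rows, list):
        raise TypeError("loader must return a list of row dictionaries")

    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            raise TypeError(f"row {i} is not a dictionary")
        missing = set(required) - set(row)
        if missing:
            raise ValueError(
                f"row {i} is missing required columns: {sorted(missing)}"
            )
    return rows


def load_from_memory() -> List[Row]:
    """Stand-in for a loader that might read from S3 or a database."""
    return [{"StudyID": "S1", "SiteID": "001", "Metric": 0.12}]


rows = load_validated(load_from_memory)
```

With this shape, swapping the data source means swapping the loader function; the validation step stays the same and no per-source configuration is needed.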

Even for the "main" study info on the Study Overview tab, users might want to be able to feed that in with a function (so it's fetched on-demand whenever a new Shiny server instance starts up).

I look forward to hearing your thoughts on how this should work!

jonthegeek commented 1 week ago

@lauramaxwell I paused on this while things were in flux around workflows in gsm and gsm.template. Are things stable enough for me to proceed? What's the best thing to look at to attempt to grok the present best practices around what I need to do here?