Open avanlinden opened 2 years ago
@avanlinden We have a GH action set up in synapseAnnotations to register schemas (all synapseAnnotations keys + values) to Synapse.
For:
Automatically registering schemas in Synapse:
use: can we repurpose the register-schemas.R script from synapseAnnotations?
when: changes to schemas
credentials: service account
Does this step encompass registering the assay-specific schemas to Synapse? I'm still not totally clear on how these registered schemas will be used in practice :)
@danlu1 is going to lead this effort.
There are two cases where the dictionaries or templates will need to be updated:
- If anything changes in synapseAnnotations or sysbioDCCjsonschemas, the dictionaries will always need to be updated.
- If anything changes in synapseAnnotations or sysbioDCCjsonschemas, the templates may need to be updated.
In order to not over-engineer this script we could:
- Run create_template_from_Syn_schema.py, let's say, every day and leverage forceVersion=FALSE in the store functionality to let Synapse determine if the template file has changed. Need to test this. @kelshmo
> Does this step encompass registering the assay-specific schemas to Synapse? I'm still not totally clear on how these registered schemas will be used in practice :)
My understanding is that the create_template_from_Syn_schema.py script requires that the template schemas be registered in Synapse in order to run. I believe they also need to be registered in order to be used by dccvalidator, if dccvalidator is configured to use JSON schemas rather than excel templates for validation.
It seems to me that in order to use those two functions, we would need to register schemas in this repo 1) if a new schema is created (i.e. new assay template) and 2) if a schema is changed (i.e. new key added to a template), but NOT in the case where new values are added to existing keys (because those are registered as part of the referenced mini-schemas via synAnn).
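To make the two registration cases concrete, here is a minimal Python sketch. It assumes each template schema is a JSON Schema document that lists its keys under `properties`; the function name `needs_registration` and the `example.*` references are hypothetical and do not come from schemann or this repo.

```python
def needs_registration(old_schema, new_schema):
    """Return True if a template schema should be (re)registered.

    Assumption: template keys are the entries under "properties".
    Values reached through "$ref" are deliberately not inspected,
    since those are registered with the referenced mini-schemas.
    """
    if old_schema is None:
        # Case 1: a brand-new template schema.
        return True
    old_keys = set(old_schema.get("properties", {}))
    new_keys = set(new_schema.get("properties", {}))
    # Case 2: the set of keys in the template changed.
    return old_keys != new_keys


old = {"properties": {"assay": {"$ref": "example.assay"}}}
new = {"properties": {"assay": {"$ref": "example.assay"},
                      "platform": {"$ref": "example.platform"}}}

print(needs_registration(None, old))  # new template -> True
print(needs_registration(old, new))   # key added -> True
print(needs_registration(old, old))   # nothing changed -> False
```

A change to the allowed values of an existing key never reaches this check, because the template only holds a reference to the mini-schema.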
> In order to not over-engineer this script we could:
> - Run create_template_from_Syn_schema.py, let's say, every day and leverage forceVersion=FALSE in the store functionality to let Synapse determine if the template file has changed. Need to test this.
This seems like a reasonable approach. We don't change templates that often but when we do is unpredictable and we'd want the changes to be available quickly, so this might be the best bet.
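For reference, a rough sketch of what the daily store step could look like with the Python synapseclient, where `forceVersion` is a keyword argument of `Synapse.store`. The helper name `store_templates`, the `make_file` hook, and the idea of passing a path-to-parent mapping are assumptions for illustration. Whether Synapse actually skips the version bump for an unchanged file is exactly what still needs to be tested.

```python
def store_templates(syn, templates, make_file=None):
    """Store each template file under its parent folder without forcing a new version.

    syn        -- a logged-in synapseclient.Synapse object
    templates  -- dict mapping local file path -> Synapse parent folder id
    make_file  -- hypothetical hook for building the entity; defaults to
                  synapseclient.File (imported lazily so the sketch can be
                  exercised without the package installed)
    """
    if make_file is None:
        from synapseclient import File as make_file
    stored = []
    for path, parent_id in templates.items():
        entity = make_file(path, parent=parent_id)
        # forceVersion=False leaves it to Synapse to decide whether the
        # upload warrants a new version.
        stored.append(syn.store(entity, forceVersion=False))
    return stored
```

A cron job could then call `store_templates(syn, {"template.xlsx": "syn000"})` after generating the files, where the path and the `syn000` id are placeholders for the values read from the config file.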
You are correct - the template schemas are pulled down from Synapse!
So, Nicole had been the one to register new templates to Synapse. I believe her package schemann does this. Have you run any of that code @avanlinden?
Also, Tom has a fully functional system for setting up cronjobs! 🥇
As long as the service account doesn't access any PHI, you can utilize the kubernetes system.
@kelshmo I totally forgot about schemann; I haven't looked into it at all. I think her register-schemas.R script is the same as the one that runs in synapseAnnotations, so we should definitely use that.
Also this is totally a job for Tom's kubernetes cluster, no PHI in sight!
Things we need to figure out:
- [x] Can a GH action be triggered by something that happens in another repository? If we can't do this, then we might need to rely on a daily cronjob as @avanlinden has suggested. (@danlu1 is working on answering this Q.)
- [x] What are the Sage standards for setting up cronjobs (e.g. can we use Tom's kubernetes clusters)? (@kelshmo will work on answering this Q.)
As described in this post, a repository_dispatch event can be used to trigger workflow executions in one repository from another.
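A minimal sketch of the sending side, using only the Python standard library. It builds the request for the GitHub REST endpoint `POST /repos/{owner}/{repo}/dispatches`. The owner, repo, event type and payload used in the example call are placeholders, not values agreed on in this thread.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, event_type, token, client_payload=None):
    """Build (but do not send) a repository_dispatch request."""
    body = {"event_type": event_type}
    if client_payload:
        body["client_payload"] = client_payload
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/dispatches",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder values; the token must be allowed to write to the target repo.
req = build_dispatch_request(
    "example-org", "example-repo", "schemas-updated", "<token>",
    client_payload={"source": "synapseAnnotations"},
)
print(req.get_method(), req.full_url)
```

Sending it is a matter of `urllib.request.urlopen(req)`. The receiving repo's workflow would need an `on: repository_dispatch` trigger, optionally restricted by `types` to the chosen event type.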
when: on merge to this repo? can you specify a GH action to run on changes that affect only certain folders, e.g. the JSON schema folders and not the other code?
credentials: synapse-service-sysbio-dcc-tasks-01 service account?
implementation: set-up depends on updates to schema_metadata_templates, schema_annotations or config folders. Need to determine if we should invest the time to write more sophisticated logic to check for templates that have changed, or just update them all!
[x] Need to update create_template_from_Syn_schema.py to store output templates. @danlu1 syn$store(). parentId will be determined from the list of ids in the config file.
[x] Can GH actions run based on updates to repo sub-folders? @danlu1 We might be able to use the "paths" keyword: https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#onpushpull_requestpaths
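The `paths` filter handles this at the trigger level. If the update script itself also needs to know which folders changed (for example, to decide between updating dictionaries only or templates too), a script-level check could look like the sketch below. It assumes the three folders sit at the top level of the repo and does a simple prefix match, so it does not reproduce GitHub's glob matching. The function name `touched_folders` is hypothetical.

```python
from pathlib import PurePosixPath

# Folder names taken from the discussion above; their location at the
# repo root is an assumption.
WATCHED_FOLDERS = ("schema_metadata_templates", "schema_annotations", "config")


def touched_folders(changed_files, watched=WATCHED_FOLDERS):
    """Return the watched top-level folders that contain a changed file."""
    touched = set()
    for name in changed_files:
        parts = PurePosixPath(name).parts
        # Require at least folder/file so a top-level file that merely
        # shares a folder's name is not counted.
        if len(parts) > 1 and parts[0] in watched:
            touched.add(parts[0])
    return sorted(touched)


changed = ["schema_metadata_templates/example.json", "R/helper.R", "config/ids.yml"]
print(touched_folders(changed))  # ['config', 'schema_metadata_templates']
```

The list of changed files could come from something like `git diff --name-only` between the merge commit and its parent.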
use: can we repurpose the register-schemas.R script from synapseAnnotations?
when: changes to schemas
credentials: service account
implementation: Schema registration should be folded into the same GH actions that run when dictionaries AND/OR templates get updated.
[x] Need to find the right script. Update: this is the register-schemas.R script in the schemann repo.
[x] Need to add service account synapse-service-sysbio-dcc-tasks-01 to metadata template sub-folders. Note: this does not solve the problem of the dccvalidator + dccmonitor configs needing to be updated in the main repo and the apps then restarted on the Shiny server.
Various helpful things:
Putting this issue on hold since the IBC team is trying to solve the same problem. We are trying to coordinate with them and will reopen the issue once a workaround is available.
Since we do have a GH action to register schemas, I will test it and see if it works. Then I will add another GH action in the sysbioDCCjsonschemas repo to update the dictionaries and templates once changes have been merged to the master branch.
We need a GitHub Action to periodically update the AD and PEC metadata dictionaries used by dccvalidator, and update the metadata template files available for data contributors. This will probably necessitate also automating schema registration when new template schemas are created, as I don't think that has happened yet.
Updating dictionaries:
Updating templates:
Automatically registering schemas in Synapse: