Sage-Bionetworks / dccvalidator

Metadata Validation for Data Coordinating Centers
https://sage-bionetworks.github.io/dccvalidator/

Sync up of dccmonitor and dccvalidator app #524

Closed milen-sage closed 2 years ago

milen-sage commented 2 years ago

@danlu1 as we discussed, short-term, we can start working on resolving the sync-up issue between dccvalidator and dccmonitor. It would be great if you could set up some time with @afwillia (as per our email thread) and guide him through

This will resolve the apps' operation in the short term.

@ychae this aligns with the onboarding work we already committed to. Over the next couple of months, we'd like to do two things. First, streamline the backend of dccvalidator and dccmonitor to avoid the cause of these syncing issues altogether. Second, consolidate our codebase so that maintenance of our apps across data ingress is streamlined, while users already operating the workflows supported by dccvalidator and dccmonitor do not experience substantial changes. We can discuss and estimate that in more detail in other GitHub issues and sprint planning.

@ychae as an aside for some reason, I can't add afwillia to the list of assignees; looks like his username is not w/in this repo scope?

ychae commented 2 years ago

@milen-sage it's because he's not part of the Sage Org on GitHub yet. ~I'll file a ticket with IT to get him added.~

okay I thought this was a bit deja vu, and it's because I've already filed the ticket. I'll follow up.

milen-sage commented 2 years ago

Thanks @ychae! Figured that might have something to do with it :)

avanlinden commented 2 years ago

Hey @milen-sage and @ychae, is this in response to the dccmonitor/dccvalidator syncing issue I opened a couple of weeks ago? https://github.com/Sage-Bionetworks/dccmonitor/issues/123

(even keeping bug reporting in sync between the apps is kind of a pain)

ychae commented 2 years ago

@avanlinden I'm not sure but I think @danlu1 or @milen-sage would know.

ychae commented 2 years ago

@milen-sage Anthony's now part of the Sage GitHub org so you should be able to assign him to tickets.

milen-sage commented 2 years ago

@avanlinden yes, this is the issue. Thank you for referencing it; we can keep updating there for dccmonitor-specific fixes.

However, we'd also like to tackle this sync problem a bit more generally: 1) understand how the two apps are currently being deployed and used in concert, and 2) compare that to the Data Curator app. That way @afwillia can 3) in the future work on streamlining the configuration of these two apps to minimize steps where they can get out of sync, and eventually on consolidating some of their configuration relative to a common way of managing and administering data models and downstream schemas (e.g. an updated version of the process that the Data Curator app currently uses).

We can use this issue (#524) to keep track of 1) - and that will likely take about a month or two. We will open more issues for 2) and 3) - w/ 3) likely requiring 3-4 months of work. Hopefully, we will be able to make progress on #123 over the next 2-3 weeks, depending on holidays.

avanlinden commented 2 years ago

@milen-sage @afwillia Awesome, I am open to anything that reduces the mental load of keeping these two apps functioning and up to date. Tag me if there's anything I can help clarify in the dccvalidator extended universe.

milen-sage commented 2 years ago

@avanlinden yes, will do! There are a few items around 3) where your input will definitely be needed. 3) interacts with how data model schemas are developed and maintained, and with the assumptions that dccvalidator and dccmonitor make about schema formats and the data curation workflow that data contributors go through in different consortia. Definitely something we will be working on closely with you.

danlu1 commented 2 years ago

@afwillia Xa has been asked to add you to the Jumpcloud and VPN scientists groups, so you will be able to connect to the Shiny server. Here is a list of references you need to set up and configure the Shiny app.

  1. Introduction to dccvalidator and dccmonitor.
  2. Config
  3. Connect to Shiny App server via VS code
  4. Manage the apps

Here is the data flow chart. We can set up another meeting to discuss after you finish the configuration.

afwillia commented 2 years ago

@danlu1 I don't see the sysbio folder on LastPass. Can you or IT give me access so I can use the service account?

danlu1 commented 2 years ago

@afwillia I'm still waiting on IT's reply on whether sharing the LastPass folder is allowed. You mentioned you went through a similar workflow to get the client_secret for the Data Curator app. Have you tried it for dccvalidator and dccmonitor?

afwillia commented 2 years ago

@danlu1 got it. I will try using my own credentials. Thanks for confirming that's an option.

ychae commented 2 years ago

@afwillia and/or @danlu1 Would you be able to give me an estimate for how much work is left to resolve this ticket? tyia!

afwillia commented 2 years ago

@ychae I just finally got both apps working and can actually start working on the issue. Unfortunately, I don't have a timeline yet, but I'll keep you posted shortly.

ychae commented 2 years ago

Great, thanks for the update @afwillia! Please let me know if there's anything you need from me.

milen-sage commented 2 years ago

hi @afwillia I updated the issue description, adding some checkboxes - could you confirm if that reflects accurately where we are?

If there are other issues tracking bugs related to the syncing issue that @avanlinden and @danlu1 are working on, could we link them here?

afwillia commented 2 years ago

@milen-sage that looks right. I'm still a little fuzzy on how dccmonitor is getting the templates for validation. The dccmonitor sync bug is still ongoing https://github.com/Sage-Bionetworks/dccmonitor/issues/123

milen-sage commented 2 years ago

@afwillia would be great to figure out the validation template for dccmonitor. Is it based on a JSONSchema?

avanlinden commented 2 years ago

@afwillia @milen-sage Right now, dccvalidator and dccmonitor use a set of Excel templates stored on Synapse for validation. The AD portal metadata templates are here. The config file specifies the synIDs of the templates that should be used for validation for each configuration, e.g.: https://github.com/Sage-Bionetworks/dccvalidator/blob/1bd6d558eec544ae9a3dcf8e73f600c01e6e3947/inst/config.yml#L99
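
To make the lookup described above concrete, here is a minimal sketch of resolving a template synID from a per-configuration mapping. Everything in it is an assumption for illustration: the key names (`templates`, `individual`, `biospecimen`), the configuration name `consortium_a`, and the placeholder synIDs are hypothetical and are not taken from the real `config.yml`. The fallback to a `default` block is also an assumption about how the configurations are layered.

```python
# Hypothetical stand-in for a parsed config file.
# Key names and synIDs are placeholders, NOT the real dccvalidator config.
CONFIG = {
    "default": {
        "templates": {
            "individual": "syn00000001",
            "biospecimen": "syn00000002",
        }
    },
    "consortium_a": {
        "templates": {
            "individual": "syn00000003",
        }
    },
}


def template_synid(config, configuration, metadata_type):
    """Return the template synID for a configuration.

    Looks in the named configuration first, then falls back to "default"
    (an assumed layering rule, not confirmed by the thread).
    """
    for name in (configuration, "default"):
        templates = config.get(name, {}).get("templates", {})
        if metadata_type in templates:
            return templates[metadata_type]
    raise KeyError(
        f"no {metadata_type!r} template configured for {configuration!r}"
    )


if __name__ == "__main__":
    print(template_synid(CONFIG, "consortium_a", "individual"))
    print(template_synid(CONFIG, "consortium_a", "biospecimen"))
```

If both apps resolve templates through one shared function and one shared config like this, there is a single place where the synIDs can change, which is the kind of step reduction discussed earlier in this thread.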

Before Nicole left, we developed JSON schemas representing the information in these Excel templates; they are in this repo. She also added the functionality for dccvalidator to validate against those JSON schemas (documented in this PR), but we have not transitioned to using JSON schemas for AD or PEC yet and still use the Excel templates located by synID.
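
For readers unfamiliar with schema-based validation, the sketch below shows the general idea of checking one metadata record against a JSON Schema. It is not the dccvalidator implementation. The schema, the field names, and the allowed values are hypothetical, and the hand-written checker only understands the `required` and `enum` keywords; a real validator library covers the full specification.

```python
# Hypothetical schema; field names and allowed values are illustrative only.
SCHEMA = {
    "type": "object",
    "required": ["individualID", "species"],
    "properties": {
        "individualID": {"type": "string"},
        "species": {"enum": ["Human", "Mouse"]},
    },
}


def check_record(record, schema):
    """Return a list of problems; an empty list means the record passed.

    Only the "required" and "enum" keywords are checked here.
    """
    problems = []
    # Every field listed under "required" must be present.
    for key in schema.get("required", []):
        if key not in record:
            problems.append(f"missing required field: {key}")
    # Fields with an "enum" rule must hold one of the allowed values.
    for key, rules in schema.get("properties", {}).items():
        allowed = rules.get("enum")
        if allowed is not None and key in record and record[key] not in allowed:
            problems.append(f"{key}: {record[key]!r} is not one of {allowed}")
    return problems


if __name__ == "__main__":
    print(check_record({"individualID": "A1", "species": "Human"}, SCHEMA))
    print(check_record({"species": "Rat"}, SCHEMA))
```

The difference from the template approach is that the rules travel with the schema file itself, so both apps could read the same schema instead of each resolving an Excel template by synID.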

Anthony, you should be able to access that template folder on Synapse but if you can't for some reason let me know.

(accidentally closed this issue when I meant to comment, sorry -- reopened!)

milen-sage commented 2 years ago

Based on a discussion with @afwillia sounds like the sync problem has been resolved and we can close this issue (and Sage-Bionetworks/dccmonitor#123).

Thanks @avanlinden, @danlu1 and @afwillia!

@afwillia will open another issue to keep tabs on how we use the JSON Schema functionality across the dccvalidator and dccmonitor apps, so that it aligns with our broader data model management practices. We can chat about that during the stakeholder meeting @ychae organized.