Design repository - Githubissues

pavgra commented 6 years ago

Background

Today the OHDSI community has produced a significant number of concept sets, cohort definitions and designs for different analysis types, but the means of sharing those entities are very limited and dispersed: some studies are shared via https://github.com/OHDSI/StudyProtocols, some via https://github.com/OHDSI/StudyProtocolSandbox, and there is no way to publicly exchange concept sets, cohorts, trained prediction models, etc. Not only does this force people to re-create the entities in each environment themselves (duplicating work), but the designs also cannot be widely tested and validated by the community to develop best approaches for typical cases. Therefore, there is a proposal to build a common assets repository based on Athena.

Musen M.A., Middleton B., Greenes R.A. (2014) Clinical Decision-Support Systems. In: Shortliffe E., Cimino J. (eds) Biomedical Informatics. Springer, London:

There is no established mechanism for accessing reliable, vetted libraries of best-practice knowledge in computational form that are relevant to particular clinical problem areas—for example, management of diabetes. It generally falls on each health care organization, user group, or other entity to undertake its own process of identifying and managing the best-practice knowledge it wants to deploy in its CDS systems. Even having a national or international repository of such knowledge would not preclude the need for customization, but it would certainly make it easier for each health care entity to start with a trusted source. Where such a repository should be hosted, how it might integrate public and private knowledge sources, who would have oversight over it, how knowledge would be peer reviewed and quality-rated, and how it would be sustained are among the many questions that have not yet been answered. As a consequence, health care organizations continue to perform this kind of knowledge-curation work for their own constituencies, and pilot projects often have no clear pathway to becoming operational, sustainable activities.

Use cases

Submit Design for publishing

A user should be able to log in, go to the asset submission page, select whether they are submitting a new design or an updated version of an existing one, paste the design and related info, and submit the asset for publishing. The system should send a request to the repository moderators to review the design.

Review and publish design

The repository moderators should be able to review the design and either request changes or approve the asset for publishing.
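
The submit / review / publish flow described above can be sketched as a small state machine. This is only an illustration of the proposal, not an existing Athena API: the status names, the action names and the `apply_action` helper are all assumptions made for the example.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical status names; the proposal does not define them.
    SUBMITTED = "submitted"
    CHANGES_REQUESTED = "changes_requested"
    PUBLISHED = "published"


# Allowed transitions: (current status, action) -> next status.
TRANSITIONS = {
    (Status.SUBMITTED, "request_changes"): Status.CHANGES_REQUESTED,
    (Status.SUBMITTED, "approve"): Status.PUBLISHED,
    (Status.CHANGES_REQUESTED, "resubmit"): Status.SUBMITTED,
}


def apply_action(status: Status, action: str) -> Status:
    """Return the next status, or raise ValueError if the action is not allowed."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed from {status.value}") from None
```

Keeping the transitions in one table makes it easy to see that a design can only reach the published state through a moderator approval.
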
Search designs

A user should be able to search designs using full-text search, asset type and other common characteristics.

Suggested filters:
- Analysis type: Cohort, Feature Analysis, PLE, PLP, etc
- Domain: as in vocabs section
- Adaptive filters: e.g. if the PLP model type is chosen, allow filtering by AUC
Suggested table structure:
- Name, Analysis Type, Domain, Short description, Main characteristics, Author, Created / Updated dates
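
A minimal sketch of how the suggested filters might combine, assuming a hypothetical in-memory `Asset` record. The field names and the `search` function are assumptions for illustration, and a plain substring match stands in for real full-text search.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    # Hypothetical record; field names are not taken from any existing schema.
    name: str
    analysis_type: str  # e.g. "Cohort", "PLP"
    domain: str
    description: str = ""
    metrics: dict = field(default_factory=dict)  # type-specific, e.g. {"auc": 0.8}


def search(assets, text=None, analysis_type=None, domain=None, min_auc=None):
    """Return the assets that match every filter that was actually supplied."""
    results = []
    for a in assets:
        # Substring match on name + description (stand-in for full-text search).
        if text and text.lower() not in f"{a.name} {a.description}".lower():
            continue
        if analysis_type and a.analysis_type != analysis_type:
            continue
        if domain and a.domain != domain:
            continue
        # Adaptive filter: assets without a reported AUC never pass it.
        if min_auc is not None and a.metrics.get("auc", 0.0) < min_auc:
            continue
        results.append(a)
    return results
```

The adaptive part is that `min_auc` would only be offered in the UI once a PLP model type is selected; here it is simply an optional argument.
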
View asset

A user should be able to view:
- common attributes:
  - name
  - analysis type
  - domain
  - description
  - required / supported vocabs & CDM version
  - required / supported dependencies versions (e.g. CohortMethod package)
  - created / updated dates
- asset design in human-friendly form (e.g. cohort design block from Atlas, PLP parameters block) and raw JSON
- type-related attributes:
  - known results:
    - for cohorts we may show counts in typical datasets
    - for PLP models we can show their characteristics and most significant covariates in typical datasets
- change history

A user should be able to vote for a design (like / star).

Browse design versions

A user should be able to browse history of asset changes.

Tech details

The assets repository should store assets in a standardized way (descriptive fields should go into standardized headers of the exchange message, whereas the analysis design should go into the body of the message) and do versioning via git.
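
One possible shape for such an exchange message, sketched with JSON. The envelope keys (`headers`, `body`), the required header fields and both function names are assumptions for illustration; the proposal does not fix a format.

```python
import json

# Hypothetical minimum set of descriptive fields.
REQUIRED_HEADERS = {"name", "analysis_type", "domain"}


def build_message(headers: dict, design: dict) -> str:
    """Wrap a design in an envelope: descriptive fields in 'headers', design in 'body'."""
    missing = REQUIRED_HEADERS - headers.keys()
    if missing:
        raise ValueError(f"missing header fields: {sorted(missing)}")
    # Sorted keys and a fixed indent give a stable text form, so a git diff
    # between two versions shows only real changes to the asset.
    return json.dumps({"headers": headers, "body": design}, sort_keys=True, indent=2)


def parse_message(text: str):
    """Split a stored message back into (headers, body)."""
    doc = json.loads(text)
    return doc["headers"], doc["body"]
```

Storing one such file per asset in a git repository would give the version history for free, since each published revision becomes a commit touching that file.
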

Atlas & Arachne integration

Going further, we'll add an ability to import a design from Atlas & Arachne and publish it from Atlas & Arachne.

pavgra commented 6 years ago

@pbr6cornell, @hripcsa, @PRijnbeek, @gowthamrao, would be grateful for your review and ideas!

tagging @gklebanov

gowthamrao commented 6 years ago

Thank you @pavgra. Having a standardized way to represent study protocols that are digitally shareable is a great idea.

I wonder how much we can "generalize" it. Point being, what if we want to do effect estimation or predictions with care_site or providers as the units vs person_id?

There is also the feasibility question and the standards question. In many cases the first step may be to test whether the target data source has subjects with the attributes the study is looking for. E.g. is there enough people of the phenotype in question, does the OMOP conversion meet Themis standards, etc.

pavgra commented 6 years ago

E.g. is there enough people of the phenotype in question

Good point, @gowthamrao. This means that a user should be able to browse and extract sub-entities (nested entities) from complex analysis types, e.g. get the cohort definitions built into a PLE analysis and run them first.
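
As a rough sketch of that extraction, assuming nested cohort definitions sit under a known key somewhere in the design JSON: the key name `cohortDefinitions` and the function `extract_nested` are hypothetical and not taken from a documented PLE spec.

```python
def extract_nested(design, key="cohortDefinitions"):
    """Collect every value stored under `key` anywhere in a nested design.

    `key` is a hypothetical field name; adjust it to the real spec structure.
    """
    found = []
    if isinstance(design, dict):
        for k, v in design.items():
            if k == key:
                # Accept either a list of definitions or a single one.
                found.extend(v if isinstance(v, list) else [v])
            else:
                found.extend(extract_nested(v, key))
    elif isinstance(design, list):
        for item in design:
            found.extend(extract_nested(item, key))
    return found
```

The extracted definitions could then be run on their own for a feasibility check before the full analysis is executed.
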

Btw, @gowthamrao, thank you for pointing me to the Shiny App for the OMOP Query Library. To me, it seems like the core idea of both apps is the same, and it would be great to discuss with @PRijnbeek whether we can collaborate on it and maybe build the queries app into the Assets Repo.

PRijnbeek commented 6 years ago

Hi @pavgra,

I like the idea of a repository to share study elements and avoid people building the same things (and make the same errors). A very good example is the phenotype library we really need to move forward. We currently often use the central Atlas for sharing cohorts but we need more functionality such as performance of the phenotypes on different datasets etc. etc. I am a big fan of having version control on these entities as you propose, realising also this is a major change. The question is do we need to add functionality in Atlas or in another tool like we have for the vocabularies now.

The Study Protocol example you give is for me more about how we organise sharing of studies with the community. For this is something different and a level above this library probably. It is important to think about the boundaries of the tool you are proposing.

I am missing the point on "standardized way of study protocols that are digitally sharable" and prediction on care site versus person you mentioned @gowthamrao in this context. However, I suggest we do not overcomplicate the tool from the beginning too much.

@pavgra yes, we are working on the Query Library as a Shiny app. The purpose is to use it during training and to avoid (like for the entities you mentioned) many people having to reinvent the wheel by writing a query to extract all ingredients of a drug etc. The idea here is to provide examples of SQL snippets only, as an R package people could either install themselves, or we make a public version with "approved" SQL calls.

As with prediction, I like the prototyping we do in Shiny a lot because it is a quick way to start the thinking process and it does not require a big effort either (QueryLibrary was built in a couple of hours). That does not mean it is the ultimate solution of course.

Finally, there might be some good reasons to move "all" knowledge/entities into one tool, but maybe also not to mix up too many things in one. I am for example struggling with the idea of having full studies in the repository we are now discussing. My initial thought is that it can hold components to be used in studies (Concept Sets, Cohort Def, Phenotype performance as feasibility etc) and we have another solution for studies, but I am happy to be proven wrong here. The challenge is to find a good balance between what is in scope and what is not!

Peter

pavgra commented 6 years ago

@PRijnbeek, thank you for your feedback

The question is do we need to add functionality in Atlas or in another tool like we have for the vocabularies now.

Athena seems to be the right place for it: it is a tool for distribution, while Atlas is for design.

For this is something different and a level above this library probably. I am for example struggling with the idea of having full studies in the repository

I would not agree here, because today (since Atlas 2.6.0) we have PLE / PLP specs (JSONs) which fully describe a study and which are used by Hydra to generate study code. The only thing left to finish there is to standardize those definitions (document the JSON structure). But all in all, why not store the study definitions? (So I am not talking about storing folders with R files, resources, etc.; only self-contained definitions.)

That does not mean it is the ultimate solution of course.

So what do you think of storing these typical queries in the tool also?

gowthamrao commented 6 years ago

@pavgra

It would be great if we are able to support an easy way to push/pull designs from one instance to another in batch mode i.e. bulk load.

During that process, incoming definitions should be checked against existing ones, so that duplicate definitions are reduced.
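
One way to sketch such a check during a bulk load is to fingerprint each definition by hashing a canonical JSON form. The function names and the dict-backed `repository` are assumptions for illustration, and this only catches structurally identical definitions, not ones that are merely equivalent in meaning.

```python
import hashlib
import json


def fingerprint(design: dict) -> str:
    """Hash a canonical JSON form so that key order and whitespace do not matter."""
    canonical = json.dumps(design, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def bulk_load(repository: dict, incoming: list) -> list:
    """Add designs that are not already present; return those skipped as duplicates."""
    skipped = []
    for design in incoming:
        key = fingerprint(design)
        if key in repository:
            skipped.append(design)
        else:
            repository[key] = design
    return skipped
```

Returning the skipped items lets the importing instance report which definitions already existed instead of silently dropping them.
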

gklebanov commented 6 years ago

Dear Friends,

As a community, from an architecture perspective, we need to make sure we position the different tools, clearly outline their use cases and stick to that vision over time, instead of continuing to create new tools. With this in mind, I believe that the proposal to use ATHENA as a global shared repository for other non-vocab use cases is very valid:

- ATLAS - study design and single site analysis execution
- ARACHNE - local and network study execution
- ATHENA - shared repository of various artifacts, including concept sets, cohorts/phenotypes, features, OMOP Standardized Vocabs and others, including ways to govern these artifacts, have discussions, ratings, comments, track usage, versioning etc. Today, it is a repo for sharing OMOP standardized vocabs, so it is already well positioned to be extended for other use cases.

Yes, we can also build different demo, prototyping and quick design tools that can easily read from that shared repo and showcase the example results.

All OHDSI tools should be integrated:

- ARACHNE can read designs from ATLAS and execute them across the network
- All OHDSI tools, including ATLAS and ARACHNE, should be able to publish one or more artifacts into a shared repository with a button click, as well as import from the repository and use them.

I agree with @PRijnbeek that we need to draw a line around what is being published into ATHENA. In my mind, it is a repository for reusable building blocks.

schuemie commented 6 years ago

Adding my two cents:

I certainly like the idea of having a place to store reusable objects. I wouldn't do it in ATHENA though. ATHENA currently is the place to obtain and browse the Vocabulary, so its purpose is clearly defined. Instead, I'd create something new (that might integrate with ATHENA) that offers the proposed functionality.

Much of the functionality that is being discussed is readily available in GitHub. I imagine study packages especially will be stored, versioned, and distributed via GitHub in the foreseeable future (albeit using a better approach than the current StudyProtocol(Sandbox) repos). Concept sets and cohorts could be stored in GitHub too, but the per-artifact versioning would be problematic, so perhaps a new tool makes sense there.

OHDSI / Athena

Design repository #48
