OHDSI / ROhdsiWebApi

An R package for interfacing with a WebAPI instance
https://ohdsi.github.io/ROhdsiWebApi

Get and Set CohortDefinitions #60

Closed PRijnbeek closed 4 years ago

PRijnbeek commented 4 years ago

Not sure if this is already on a roadmap, but a handy feature would be the ability to export a list of cohort definitions from the public Atlas and import them into my own Atlas.

To do this now I am exporting them in Atlas or I can use the functionality that Martijn made earlier that we use to make study packages, but importing is a manual step.

If this functionality is already available in some way in the code base, please let me know.

Thanks

gowthamrao commented 4 years ago

Hi @PRijnbeek - Yes, it already exists, but not fully tested - so you could help us! We worked on it recently.

To achieve your goal, you can follow these steps:

  1. Get the list of Atlas definitions from the source Atlas WebAPI into a data frame using this function: getAtlasDefinitionsDetails https://github.com/OHDSI/ROhdsiWebApi/issues/40 Then filter them locally in R to select the cohort definitions whose expressions you would like to extract. This will give you a list of cohortIds; you can skip this step if you already have the list of cohortIds.
  2. For each of the cohortIds, get the cohort definition expression from the source Atlas WebAPI using the function getCohortDefinition. Note that these will be R objects (not JSON), as described here in our generic design pattern. You will have to convert them into JSON as in this example.
  3. You can then post them to your target Atlas WebAPI using postAtlasDefinition here. Note: postAtlasDefinition has not been merged yet.
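The first two steps above can be sketched roughly as follows. This is only an illustration in base R: the column names `id` and `name`, the helper names `selectCohortIds` and `fetchAll`, and the stub fetch function are assumptions made for the example, not part of ROhdsiWebApi.

```r
# Hypothetical stand-in for the data frame obtained in step 1.
# The real column names may differ.
definitions <- data.frame(
  id = c(101L, 102L, 103L),
  name = c("[COVID] Hospitalized", "Type 2 diabetes", "[COVID] Tested positive"),
  stringsAsFactors = FALSE
)

# Step 1: filter locally to the cohort definitions of interest.
selectCohortIds <- function(definitions, pattern) {
  keep <- grepl(pattern, definitions$name, fixed = TRUE)
  definitions$id[keep]
}

# Step 2: fetch each definition. The fetch function is passed in by the
# caller so the loop can be tried out without a live WebAPI.
fetchAll <- function(cohortIds, fetch) {
  results <- lapply(cohortIds, fetch)
  names(results) <- as.character(cohortIds)
  results
}

cohortIds <- selectCohortIds(definitions, "[COVID]")

# Stub used here instead of a real WebAPI call.
stubFetch <- function(cohortId) list(id = cohortId, expression = list())
fetched <- fetchAll(cohortIds, stubFetch)
```

Against a real source instance, the stub could be replaced by a small wrapper such as `function(id) getCohortDefinition(baseUrl = sourceBaseUrl, cohortId = id)`, where `sourceBaseUrl` is whatever base URL applies to your source Atlas.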

I made these contributions based on the experience in our Covid collaboration, where we were copying definitions from one Atlas instance to another. With this approach we can do the copying in a script and also introduce error management and quality controls. An enhancement would be to also compare JSON expressions to ensure we are not duplicating cohort definitions or accidentally making changes (to do).
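A minimal sketch of what scripted error management and a duplicate check could look like is given below. Everything here is hypothetical: `copyWithChecks`, the `post` argument, and the toy expressions are made up for illustration, and comparing parsed R objects with `identical()` is a deliberately strict stand-in for a proper JSON comparison.

```r
# definitions: named list of expressions (R objects) from the source instance
# existing:    list of expressions already present on the target instance
# post:        function that sends one expression to the target (caller-supplied)
copyWithChecks <- function(definitions, existing, post) {
  status <- character(length(definitions))
  names(status) <- names(definitions)
  for (name in names(definitions)) {
    expression <- definitions[[name]]
    # Skip anything that is already on the target, to avoid duplicates.
    isDuplicate <- any(vapply(existing, identical, logical(1), expression))
    if (isDuplicate) {
      status[name] <- "skipped: duplicate"
      next
    }
    # Record failures instead of aborting the whole run.
    status[name] <- tryCatch({
      post(expression)
      "posted"
    }, error = function(e) paste("failed:", conditionMessage(e)))
  }
  status
}

# Toy data and a stub post function that rejects one definition.
toyDefinitions <- list(a = list(x = 1), b = list(x = 2), c = list(x = 3))
toyExisting <- list(list(x = 2))
stubPost <- function(expression) {
  if (expression$x == 3) stop("server rejected")
  invisible(TRUE)
}
status <- copyWithChecks(toyDefinitions, toyExisting, stubPost)
```

The returned named vector gives one status per definition, which can be written to a log for later review.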

Also note: I wrote getAtlasDefinitionsDetails and postAtlasDefinition to be generic function names for all Atlas functions. I prefer this approach to having many functions such as 'getCohortDefinition', 'getConceptSetDefinition', and 'getIncidenceRateDefinition'. I wonder if we should deprecate getCohortDefinition in favor of getAtlasDefinition (@schuemie @alondhe) to reduce the number of functions in the package.

vojtechhuser commented 4 years ago

+1 on this problem

PRijnbeek commented 4 years ago

If I do this I get the json:

cohortId <- 1773955
baseUrl <- "http://api.ohdsi.org/WebAPI/"
url <- paste0(baseUrl, "cohortdefinition/", cohortId)
json <- httr::GET(url)
json <- httr::content(json)

If I do this I get a 404:

cohortId <- 1773955
baseUrl <- "http://api.ohdsi.org/WebAPI/"
validJsonExpression <- getCohortDefinition(baseUrl = baseUrl, cohortId = cohortId)

Why is that?

gowthamrao commented 4 years ago

Hi @PRijnbeek, I think that is a design decision. Based on discussion with @schuemie here https://github.com/OHDSI/ROhdsiWebApi/issues/37 and here, a decision was made that functions would return objects that may be inspected in R (such as a data frame) rather than JSON.

Based on this, getCohortDefinition converts the json to data here

You can convert it back to JSON as shown here
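As a rough illustration of converting an R object back to JSON, the sketch below uses the jsonlite package. This is an assumption: the example linked above may use a different JSON library, and the toy expression here is not a complete cohort definition.

```r
library(jsonlite)

# Toy nested list standing in for a cohort expression returned as an R object.
expression <- list(
  PrimaryCriteria = list(
    ObservationWindow = list(PriorDays = 0L, PostDays = 0L)
  ),
  ConceptSets = list()
)

# auto_unbox = TRUE writes length-one vectors as scalars rather than arrays.
json <- toJSON(expression, auto_unbox = TRUE, pretty = TRUE)

# Parse it back without simplification to check that the structure survives.
roundTrip <- fromJSON(json, simplifyVector = FALSE)
```

Without `auto_unbox = TRUE`, scalar values would be serialized as one-element arrays, which is usually not what a JSON consumer expects.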

schuemie commented 4 years ago

@gowthamrao : How does that explain that the function throws a 404 error when the cohort exists in the WebAPI?

gowthamrao commented 4 years ago

Aah - i totally misread @PRijnbeek question. Let me look into it

gowthamrao commented 4 years ago

The answer is simple: the baseUrl has to be in the form "http://server.org:80/WebAPI"

@PRijnbeek has "http://api.ohdsi.org/WebAPI/" i.e. it ends with a slash '/'.

This causes the URL to be "http://api.ohdsi.org/WebAPI//cohortdefinition/1773955", which results in the 404 error.

@schuemie do we need to introduce an error handler in checkBaseUrl?

PRijnbeek commented 4 years ago

Okay thanks, showing the full path would have been helpful in the error message. I think it makes more sense to not have the slash at the end indeed.
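One way such an error message could include the full path is sketched below. The helper name `checkStatus` is hypothetical and is not a function in ROhdsiWebApi; with httr, the status code could be taken from `httr::status_code(response)`.

```r
# Stop with a message that names the exact URL that was requested.
checkStatus <- function(statusCode, url) {
  if (statusCode >= 400) {
    stop(sprintf("Request to %s failed with HTTP status %d", url, statusCode),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Capture the message that a failed request would produce.
errorMessage <- tryCatch(
  checkStatus(404L, "http://api.ohdsi.org/WebAPI//cohortdefinition/1773955"),
  error = function(e) conditionMessage(e)
)
```

With the URL in the message, the doubled slash is visible immediately.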

schuemie commented 4 years ago

Whether or not the user provides a slash at the end should not matter. We need to add some code to check if it's there, and remove it if it is.

The code in the develop branch would have given a more informative error message (or maybe actually would have just worked). We had better get working on that release ;-)
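A minimal sketch of that check in base R follows. The helper names `normalizeBaseUrl` and `buildCohortUrl` are made up for this example and are not the package's actual implementation.

```r
# Remove any trailing slashes so both spellings of the base URL behave the same.
normalizeBaseUrl <- function(baseUrl) {
  sub("/+$", "", baseUrl)
}

# Build the cohort definition URL from a normalized base URL.
buildCohortUrl <- function(baseUrl, cohortId) {
  paste0(normalizeBaseUrl(baseUrl), "/cohortdefinition/", cohortId)
}

withSlash <- buildCohortUrl("http://api.ohdsi.org/WebAPI/", 1773955)
withoutSlash <- buildCohortUrl("http://api.ohdsi.org/WebAPI", 1773955)
```

Both calls yield the same URL, so the user no longer needs to know which form the package expects.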

gowthamrao commented 4 years ago

Sounds like we need to get this done https://github.com/OHDSI/ROhdsiWebApi/issues/52

PRijnbeek commented 4 years ago

I am actually using the develop branch (I was not aware I should not) and had this issue.

schuemie commented 4 years ago

Hmmm, that is weird. We'll look into it

gowthamrao commented 4 years ago

In master we have a regular expression in checkBaseUrl: https://github.com/OHDSI/ROhdsiWebApi/blob/8ed2756f727753d3d031405b633c993369c5eb18/R/WebApi.R#L20

In develop we don't have the regular expression: https://github.com/OHDSI/ROhdsiWebApi/blob/8cbc2b88a858fd705dcb415642f5f8a2b4e2ea34/R/WebApi.R#L20

Is this the reason?
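To illustrate how a regular expression check could catch the trailing slash before any request is sent, here is a rough sketch. The pattern below is written for this example only and is not the one used in the master branch.

```r
# Accept only URLs of the form http(s)://host[:port][/path]/WebAPI,
# with nothing after "WebAPI".
isValidBaseUrl <- function(baseUrl) {
  grepl("^https?://[^/[:space:]]+(/[^[:space:]]*)?/WebAPI$", baseUrl)
}

accepted <- isValidBaseUrl("http://server.org:80/WebAPI")
rejected <- isValidBaseUrl("http://api.ohdsi.org/WebAPI/")
```

A check like this fails early on a trailing slash; without it, the malformed URL only surfaces later as a 404 from the server.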

cgreich commented 4 years ago

Friends: I am not following. Atlantes are rarely public. In fact, the only ones public are the one on ohdsi.org, and the ones we pull up for the Tutorials. IQVIA also has one on the public internet. But other than that - didn't we want to have Arachne be the entity that would direct traffic in and out of a protected institution? Allowing for security and governance?

Obviously, cohort definitions are harmless, but everything else in Atlas is highly protected.

gowthamrao commented 4 years ago

"Atlantes are rarely public. In fact, the only ones public are the one on ohdsi.org, and the ones we pull up for the Tutorials."

@cgreich I don't know about Arachne, but from a design standpoint we need to ensure that the content being transferred is auditable. It should not be a black box, i.e. it should be in text format and be auditable using simple tools like Notepad. Transfers between firewalled institutions should be considered "gates", and such a transfer should go through alternate secure methods acceptable to those institutions.

ROhdsiWebApi achieves these design principles. We can do the following:

  1. [Institution 1 firewall] - using this package we pull data from Atlas into text. This is human readable (if you can read a JSON expression) and may be audited.
  2. [Securely transfer to institution 2] - this text is securely transferred. ROhdsiWebApi does not FTP a file, and only works within a firewall, so the text files have to be transferred between institutions using some other secure mechanism (gate).
  3. [Institution 2 firewall] - the receiving institution reads the text data into R and then uses ROhdsiWebApi to post the data into its Atlas.

Transfer between Atlantes on different networks is not handled by ROhdsiWebApi, i.e. the text files have to be securely transferred between firewalled networks using alternate methods.
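The file-based handoff in steps 1 and 3 could look roughly like the sketch below, using only base R and the bundled tools package. The helper names are hypothetical, and the MD5 checksum is an addition for this example (a simple way to confirm the file was not altered in transit), not something the package provides.

```r
# Institution 1: write the JSON text to a plain file and record its checksum.
exportDefinition <- function(json, path) {
  writeLines(json, path, useBytes = TRUE)
  unname(tools::md5sum(path))
}

# Institution 2: verify the checksum, then read the text back into R.
importDefinition <- function(path, expectedChecksum) {
  actual <- unname(tools::md5sum(path))
  if (!identical(actual, expectedChecksum)) {
    stop("Checksum mismatch for ", path, call. = FALSE)
  }
  paste(readLines(path, warn = FALSE), collapse = "\n")
}

# Toy round trip through a temporary file.
path <- tempfile(fileext = ".json")
json <- '{"name": "example cohort"}'
checksum <- exportDefinition(json, path)
restored <- importDefinition(path, checksum)
```

The file itself stays readable in any text editor, and moving it between institutions is left to whatever secure channel they have agreed on.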