OHDSI / ROhdsiWebApi

An R package for interfacing with a WebAPI instance
https://ohdsi.github.io/ROhdsiWebApi

Adding capabilities for working with tags #240

Open anthonysena opened 2 years ago

anthonysena commented 2 years ago

As of Atlas/WebAPI v2.10 (https://github.com/OHDSI/WebAPI/issues/1917), we support the ability to tag designs with 1 or more tags. I'm opening this issue to see if we can add some options for working with tags via ROhdsiWebApi and see how to best design this in terms of functions we expose in this package.

As discussed in the WebAPI issue above, there are some new endpoints that are available in WebAPI v2.10 for tagging:

I could then see the following functions in ROhdsiWebApi:

In addition, there is now a tags collection that is exposed when getting the metadata for an asset - for example, getCohortDefinitionsMetaData() will include all tags associated with a cohort. I'm thinking it might be useful to include helper functions that filter a data.frame by its tags collection, or wrapper methods that do the filtering directly when querying the metadata. Here is a draft function of what I was thinking in terms of a wrapper:

getCohortDefinitionsByTags <- function(baseUrl, tagsOfInterest) {
  cohortList <- ROhdsiWebApi::getCohortDefinitionsMetaData(baseUrl = baseUrl)

  # Keep only the cohorts that have at least one tag; lengths() counts the
  # entries in each row's tags element
  cohortListWithTags <- cohortList[lengths(cohortList$tags) > 0, ]

  # There is probably a cooler way to do this via dplyr/purrr but just to get
  # an idea of how this might work
  tagList <- data.frame()
  for (i in seq_len(nrow(cohortListWithTags))) {
    cohortId <- cohortListWithTags[i,]$id
    curCohortTagList <- purrr::flatten(cohortListWithTags[i,c("tags")][[1]])
    for (j in seq_along(curCohortTagList)) {
      tagName <- curCohortTagList[[j]]$name
      tagList <- rbind(tagList, data.frame(id = cohortId,
                                           tagName = tagName))
    }
  }

  # Filter the cohorts to the list of tags of interest
  cohortsWithTagsOfInterest <- cohortList[cohortList$id %in% (tagList[tagList$tagName %in% tagsOfInterest,]$id),]
  return(cohortsWithTagsOfInterest)
}

The function above could be altered to take an additional parameter for the "metadata" instead of hitting the endpoint. Then we would just check whether the data.frame contains a tags column: if it does, we use it to perform the filtering, and if it does not, we stop.
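As a rough sketch of that variant, the function below filters metadata that has already been fetched rather than calling WebAPI itself. The name filterMetaDataByTags and the toy data are hypothetical and not part of ROhdsiWebApi. It also assumes that each element of the tags column is a list of tag objects carrying a name field, which is the shape the draft above relies on.

```r
# Hypothetical helper (not part of ROhdsiWebApi): filter already-fetched
# metadata by tag name instead of calling the endpoint again.
filterMetaDataByTags <- function(metaData, tagsOfInterest) {
  # Stop early if the metadata has no tags column to filter on
  if (!"tags" %in% names(metaData)) {
    stop("metaData has no 'tags' column to filter on.", call. = FALSE)
  }

  # Assumption: each element of `tags` is a list of tag objects, and each
  # tag object has a `name` field.
  hasTagOfInterest <- vapply(metaData$tags, function(tagObjects) {
    tagNames <- vapply(tagObjects,
                       function(tag) as.character(tag$name),
                       character(1))
    any(tagNames %in% tagsOfInterest)
  }, logical(1))

  metaData[hasTagOfInterest, , drop = FALSE]
}

# Toy metadata standing in for the output of getCohortDefinitionsMetaData()
toyMetaData <- data.frame(id = c(1, 2, 3), name = c("A", "B", "C"))
toyMetaData$tags <- list(
  list(list(id = 10, name = "phenotype"), list(id = 11, name = "draft")),
  list(),
  list(list(id = 12, name = "draft"))
)

result <- filterMetaDataByTags(toyMetaData, tagsOfInterest = "phenotype")
```

Working on a data.frame that the caller passes in keeps the filtering logic testable without a live WebAPI instance, and the same helper could in principle be reused for other asset types if their metadata has the same tags shape.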

Please consider these as a draft of design ideas and please do add your thoughts/input on how this might work. Thanks!

azimov commented 2 years ago

Thanks @anthonysena this is incredibly useful functionality.

One consideration for the implementation is that we will need to support older versions of WebAPI. I'm not sure if there is any code that checks the API version to see which features are supported.

anthonysena commented 2 years ago

> One aspect with the implementation here is that we will need to support older versions of WebAPI, I'm not sure if there is any code that checks the API version for what features are supported.

Agreed - this feels relevant to #178 since the WebAPI/info endpoint has the version number in the response. We could check the version number via a call to the /info endpoint and then exit gracefully if it is below the minimum supported version. Perhaps this type of metadata would be better captured in a settings file or exposed through WebAPI.
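A minimal sketch of the version check itself, assuming the version string has already been read from the /info response. The names supportsTags and assertTagSupport are hypothetical, and the handling of a suffix such as "-SNAPSHOT" is an assumption about how some builds might report their version.

```r
# Hypothetical helper: TRUE if the given WebAPI version is at least the
# minimum version that supports tags (v2.10 per the WebAPI issue).
supportsTags <- function(webApiVersion, minimumVersion = "2.10.0") {
  # Assumption: drop any suffix such as "-SNAPSHOT" before comparing
  cleanVersion <- sub("-.*$", "", webApiVersion)
  # numeric_version() compares component by component, so 2.10.0 > 2.9.1,
  # which a plain string comparison would get wrong
  numeric_version(cleanVersion) >= numeric_version(minimumVersion)
}

# Hypothetical guard that tag-related functions could call first
assertTagSupport <- function(webApiVersion) {
  if (!supportsTags(webApiVersion)) {
    stop("Tags require WebAPI v2.10 or later; found version ",
         webApiVersion, call. = FALSE)
  }
  invisible(TRUE)
}

newEnough <- supportsTags("2.10.0")
tooOld <- supportsTags("2.9.1")
```

Keeping the comparison separate from the HTTP call means the minimum version per feature could later live in a settings file, as suggested above, without changing the check.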