HealthCatalyst / Fabric.Terminology

Service to provide shared healthcare terminology data
Apache License 2.0
3 stars 3 forks source link

ValueSetCodes property on ValueSetApiModel - Paged Collection? #71

Closed rustyswayne closed 6 years ago

rustyswayne commented 6 years ago

The current ValueSetApiModel has a property ValueSetCodes containing a collection of ALL codes in the value set. This is problematic with value sets containing a large number of codes (especially in responses which return collections of value sets).

This problem stems from our original assumption that the vast majority of value sets would be associated with a few hundred codes and we anticipated/accepted a "pain" threshold in the 2 to 3 thousand code arena.

The MBL team has identified a value set with over 20,000 codes and it turns out that value sets with code counts in the multiple thousands are far more prevalent than our original assumption.

From my perspective there are several options we can consider.

  1. Completely remove the ValueSetApiModel and rely on the ValueSetItemApiModel (the summary model with code counts only) in ALL collection queries and then introduce a new endpoint for querying the codes for individual value sets. @corytak - I think this is what you guys are effectively doing now - no?
  2. Introduce yet another ValueSetItemApiModel Proxy that includes a PagedCollection of value set codes instead of the full on IReadOnlyCollection of codes. Note - this approach will most likely introduce a bit more latency into the "uncached" query results (e.g. the first person in will see longer response times) and we also likely increase the complexity of querying for individual value sets given it may be weird to request a value set and have paging arguments for specifying which codes should be included in the PagedCollection property.
  3. Finally, we could opt to kick the can down the road and focus on continuing to flesh out other aspects of the API. This is the least attractive option, in my opinion, as the ValueSetApiModel is pretty core and breaking changes relating to the model/supporting services will likely have frustrating implications for API client/consumers and the adage "go ugly early" may be a smart thing to consider.

@mikebuck @clintling interested in your two bits as well.

corytak commented 6 years ago

@rustyswayne Yes number 1 what we do now and would have my vote as we move forward. We have a paged collection of value sets and then return the list of codes with the value set identifier. We do not page that list of codes in our MBL endpoint, but I agree that's something we'll want to do here.

Some examples of the large value sets I'm seeing are Trauma, Fracture - Lower Body, Cancer and Surgery value sets.

j2jensen commented 6 years ago

Yeah, number one sounds like a good option. If I'm not mistaken, @rustyswayne, the primary downside is that Population Builder wants to show a handful of sample codes for each of the value sets that it lists, right? So if we go with option 1 then PB has to have the additional complexity of performing a separate request to get code samples for each of the ValueSets it wants to display?

Given the added latency you mention with option 2, I think perhaps we can go with option 1, and have the UI populate the code samples asynchronously so that users can keep doing things with the UI while the sample code lists haven't populated yet.

rustyswayne commented 6 years ago

@j2jensen Actually the sample codes have already been removed from the Population Builder and we are displaying counts the same way MBL does. If you remember, that created the code "count" issue we experience due to the fact that PB is actually using partial value sets (ex ICD9, ICD10 not SNOMED for diagnosis). That too has been solved, the solution for which is exhibited in the API "summary" models already.

Selecting option one will however require PB to deal with paging value set codes (which are nested) when the folder is clicked to display the codes. At the moment ALL the codes are displayed - but it's debatable as to whether or not that is actually a usable solution, especially when you consider the very large code sets we are discussing here.

Even though we've already handled the sample codes issue, I think @j2jensen comment relating to latency could also solve certain other issues (like excluding previously selected value set codes and managing paging) be eager loading pages and having the UI sort out whatever it needs to per client application as it's becoming clear that the one size fits all (server side) will likely not be sufficient to accommodate the nuances between the client apps.

I'll scaffold out the ValueSetCode endpoint and get that working and in a second set, remove the ability to query FULL versions of ValueSets in the paged and multiple vs responses. I'm inclined to leave the ability to query the value sets including all codes in single and version queries.