Derived Variable Config Persistence

ryanrdoherty commented 1 year ago

Note: this is not to be confused with "Derived Variable Persistence", i.e., storage of the derived variable data and metadata. This issue covers only storing DV configs in the EDA user database.

Per a derived variable WG meeting a few weeks ago and further discussion with @dmfalke, here is the spec, as I understand it, for supporting user creation of derived variable configs on the client. These configs will be retained across sessions and associated with analyses, so users can select them as visualization data elements.

1. (DB DDL, add table) Add a new derived_variables table to the EDA user DB with the following columns:
   1. variable_id (PK): [string] UUID (generated, not hashed) identifying this variable
   2. user_id (PK): [integer] ID of the owning user
   3. dataset_id: [string] ID of the dataset with which this DV is associated
   4. entity_id: [string] ID of the entity to which this variable is assigned
   5. display_name: [string] user-provided display name (editable; will appear in the variable tree)
   6. description: [clob] user-provided description of this derived variable (editable)
   7. provenance: [clob] TBD; a JSON object, similar to analysis provenance, describing how this DV came to be
   8. function_name: [string] name of the derived variable plugin that will generate the variable's data
   9. config: [clob] configuration of the DV plugin that will generate the variable's data
2. (RAML) Edit the analysis table's descriptor JSON schema to clarify that the derivedVariables property is of type string[] (containing DV variable IDs). Since these IDs are unique in the system, an entity is not required; callers can look up the entity using the derived_variables table above.
3. Add service endpoints to:
   a. Create DV configs. Check here that the user has access to the dataset, and return the same Forbidden response as when a user tries to create an analysis for a study they don't have access to.
   b. Associate DV configs with analysis instances. This can be done with the existing single analysis PATCH endpoint; add a new optional property derivedVariables that takes string[]. Each derived variable ID it contains can only be associated with analyses whose study_id column (which holds dataset IDs in the current analysis table) matches the DV's dataset_id, and must be owned by the owner of the analysis being updated.
4. Add a PATCH endpoint that lets users change the display name and description of a derived variable. The new name will apply to all analyses in which that DV appears.
5. Edit the analysis duplicate and import endpoints to copy the derived variable references from the source analysis to the copy. In the case of import, the actual DV table row may also need to be duplicated if the importing user does not yet have this DV. The ID of the derived variable will not change, but the primary key will not be violated, since the PK is (variable_id, user_id).
6. Add an endpoint, GET /users/{user-id}/derived-variables, that returns an array of all derived variables owned by the user (the JSON representation of the derived_variables table in 1 above).
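A minimal sketch of the table and of the import, rename, and listing rules above, using Python's built-in sqlite3 module as a stand-in for the EDA user DB. This is an assumption for illustration only: the real database, its clob types, and the service code will differ, and the function names (create_dv, copy_dv_for_import, rename_dv, list_user_dvs) are hypothetical. It shows how the composite primary key (variable_id, user_id) lets an import copy a DV row to another user without changing its ID.

```python
import json
import sqlite3
import uuid

# Hypothetical stand-in for the derived_variables table; TEXT replaces clob.
DDL = """
CREATE TABLE derived_variables (
    variable_id   TEXT    NOT NULL,  -- generated UUID, not a hash
    user_id       INTEGER NOT NULL,  -- owning user
    dataset_id    TEXT    NOT NULL,
    entity_id     TEXT    NOT NULL,
    display_name  TEXT    NOT NULL,
    description   TEXT,
    provenance    TEXT,              -- JSON, format TBD
    function_name TEXT    NOT NULL,  -- DV plugin name
    config        TEXT    NOT NULL,  -- DV plugin config as JSON
    PRIMARY KEY (variable_id, user_id)
)
"""

COLUMNS = ("variable_id", "user_id", "dataset_id", "entity_id", "display_name",
           "description", "provenance", "function_name", "config")


def create_dv(conn, user_id, dataset_id, entity_id, display_name,
              function_name, config, description=None):
    """Insert a new DV config under a freshly generated UUID and return the ID."""
    variable_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO derived_variables VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (variable_id, user_id, dataset_id, entity_id, display_name,
         description, None, function_name, json.dumps(config)),
    )
    return variable_id


def copy_dv_for_import(conn, variable_id, from_user, to_user):
    """Copy a DV row to the importing user unless they already own it.

    variable_id is unchanged; the (variable_id, user_id) PK keeps rows distinct,
    and OR IGNORE (SQLite-specific) skips the copy if the row already exists.
    """
    conn.execute(
        "INSERT OR IGNORE INTO derived_variables "
        "SELECT variable_id, ?, dataset_id, entity_id, display_name, "
        "description, provenance, function_name, config "
        "FROM derived_variables WHERE variable_id = ? AND user_id = ?",
        (to_user, variable_id, from_user),
    )


def rename_dv(conn, variable_id, user_id, display_name=None, description=None):
    """Update display name and/or description; None leaves a field unchanged.

    Analyses store only DV IDs, so every analysis of this user sees the change.
    """
    conn.execute(
        "UPDATE derived_variables SET "
        "display_name = COALESCE(?, display_name), "
        "description = COALESCE(?, description) "
        "WHERE variable_id = ? AND user_id = ?",
        (display_name, description, variable_id, user_id),
    )


def list_user_dvs(conn, user_id):
    """Rows for GET /users/{user-id}/derived-variables, as JSON-ready dicts."""
    cur = conn.execute(
        "SELECT * FROM derived_variables WHERE user_id = ? ORDER BY display_name",
        (user_id,),
    )
    return [dict(zip(COLUMNS, row)) for row in cur.fetchall()]
```

For example, after `create_dv` for user 1 and `copy_dv_for_import(conn, dv_id, 1, 2)`, both users own a row with the same variable_id; calling the import again is a no-op, and a rename by user 2 does not affect user 1's row.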

That's it for now; some details may still need to be hashed out. This sets us up reasonably well for persisting the DV data in the appDB if and when we do, while serving the current purpose of supporting dynamic DV data (for use in visualizations and computes, but not subsetting).

Foxcapades commented 1 year ago

> This can be done with the existing single analysis PATCH endpoint; add a new optional property derivedVariables that takes string[].

Derived variable IDs may already be patched in using the existing patch body without modification. If we are adding a new, separate field for this, how should we handle the case when both fields are passed in?

Foxcapades commented 1 year ago

> Derived variable IDs may already be patched in using the existing patch body without modification. If we are adding a new, separate field for this, how should we handle the case when both fields are passed in?

Don't add a new field; use the existing field in the descriptor. Validate the derived variable IDs in the PATCH request, then union them with the derived variable IDs already attached to the target analysis.
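A sketch of that PATCH handling, combining this rule with the ownership and dataset constraints from the original spec: each requested ID must resolve to a DV owned by the analysis owner and tied to the analysis's dataset, and only then is it unioned with the IDs already on the analysis. The `Analysis` and `DerivedVariable` types, the `ValidationError` exception, and the `lookup` callable are hypothetical; the real service's types and error responses may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class DerivedVariable:
    variable_id: str
    user_id: int
    dataset_id: str


@dataclass
class Analysis:
    owner_id: int
    study_id: str  # holds a dataset ID in the current analysis table
    derived_variables: List[str] = field(default_factory=list)


class ValidationError(Exception):
    """Hypothetical error that the service would map to a 4xx response."""


def patch_derived_variables(
    analysis: Analysis,
    requested_ids: Iterable[str],
    lookup: Callable[[str, int], Optional[DerivedVariable]],
) -> List[str]:
    """Validate requested DV IDs, then union them with the analysis's existing IDs.

    lookup(variable_id, user_id) returns that user's DV row, or None if absent.
    Existing IDs keep their order; new IDs are appended in request order.
    """
    new_ids: List[str] = []
    for variable_id in requested_ids:
        dv = lookup(variable_id, analysis.owner_id)
        if dv is None:
            raise ValidationError(
                f"derived variable {variable_id} is not owned by user {analysis.owner_id}")
        if dv.dataset_id != analysis.study_id:
            raise ValidationError(
                f"derived variable {variable_id} belongs to dataset {dv.dataset_id}, "
                f"not {analysis.study_id}")
        new_ids.append(variable_id)

    # All IDs were validated above; only now update the analysis,
    # so a single bad ID leaves it unchanged.
    merged = list(analysis.derived_variables)
    for variable_id in new_ids:
        if variable_id not in merged:
            merged.append(variable_id)
    analysis.derived_variables = merged
    return merged
```

Validating the whole request before touching the analysis keeps the PATCH all-or-nothing, and the union means clients can send only the IDs they want to add rather than resending the full list.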

VEuPathDB / EdaUserService, issue #28: Derived Variable Config Persistence