datagrok-ai / public

Public package repository for the Datagrok.ai platform
MIT License
42 stars 26 forks source link

#2627: RFC: Sticky Meta #2627

Open skalkin opened 7 months ago

skalkin commented 7 months ago

This document outlines the rationale, requirements, and proposed technical implementation of the “DB Properties” feature.

Context and Goals

Datagrok has a concept of entities. Among other features, it enables each internal object (dataset, connection, query, function, user, group, etc) to be associated with the arbitrary metadata (at this moment the UI only supports strings). The goal of this initiative is to improve the existing functionality (more types, better UI, powerful search) and extend this functionality to objects that are stored externally. This would enable end users to quickly build lightweight solutions (with no coding required) to a certain kind of problems that are traditionally solved by the IT in a resource-intensive way (extending existing databases, providing the APIs, building the apps for the end users, maintaining it, etc).

Use case 1: Peter, the head of cheminformatics at BostonMol, wants to run a hit triage campaign. He wants BostonMol’s chemists to be able to provide “good/neutral/bad” ratings to any of the molecules in the proprietary database. Also, no matter where the dataset came from (db / file / web service), we want to be able to see a molecule's rating in the context panel, and retrieve the corresponding ratings for all molecules in the dataset. To do that, Peter, through the UI, creates a schema {name: “Hit Triage”, appliesTo: {semType: “BMol”}} with the following property: { name: “Rating”, choices: [“good”, “neutral”, “bad”] }. Since molecule ids are detected as the “BMol” semtype, this is enough for Datagrok to put everything together.

Use case 2: Peter runs a database query and wants to annotate rows in such a way that these annotations would be visible next time he or anyone else runs this query, or opens a dataset with the similar structure. From the results view, he clicks Data | Annotate…, and either selects an already existing schema, or creates a new one, creating the desired properties as well. Then, he chooses key columns along with the way they should be matched (multiple checkboxes: by name, by tag values). Once he clicks OK, corresponding columns are created, and the values are editable in the grid.

Implementation

Database representation

We would reuse and extend existing powerful implementations of entities, properties, dynamic inputs, and semantic types.

Property table defines metadata (not value!) associated with a property. It would be mapped to the existing class Property. Here are some of the fields:

properties table

id: uid
schema_id: uid
name: string
caption: string  // name to show in the UI
category: string  // for automatically generated UI
description: string  // shown in the tooltip, etc   
propertyType: string  // int, float, bool, string, datetime, list
semType: string
choices: list<string> // controlled dictionary of allowed values
format: string
min: float
max: float
validators: list<string>  // GrokScript expressions

The actual values would reside in the property_values table (plus a record in the entities table):

property_values tables

entity_id: uid
property_id: uid
value: dynamic

Often, more than one field is required. To address that, each property is associated with the schema (for developers, think of schemas as classes, and properties as fields). Schemas support inheritance that allows us to reuse common properties:

schemas table

id: uid
parent_schema_id: uid
name: string
appliesTo: json  // a set of column tags. Evaluated with the AND operator. 
                          // Example: { semType: “BMol”, campaign: “Hit Triage”}
entity_types table
id: string
name: string
description: string

entity_types_schemas table

entity_type_id: uid
schema_id: uid

Schema itself is an entity. The privileges (View / Edit) associated with it define whether the user can view the properties, or edit them.

Example: Mary is giving the “good” ratting for the “ccc” BMol molecule and “bad” rating for the “c” molecule. It would go to the following tables:

entity_types

Id: 0, name: “semType=BMol”
schemas
id: 1, name: “Mol Rating”
entity_types_schemas
entity_type_id: 0, schema_id: 1
properties
id: 2, schema_id:1, property_type: “string”, semType: “rating”, name: “rating”, choices: [“good”, “neutral”, “bad”]
id: 20, schema_id:1, name: “chemist”, propertyType: “string”
entities
id: 3, entity_type_id: 0, key: “ccc”
entities
id: 31, entity_type_id: 0, key: “c”
property_values
id: 40, entity_id: 3, property_id: 2, value: “good”
property_values
id: 41, entity_id: 3, property_id: 20, value: “Mary”
property_values
id: 42, entity_id: 31, property_id: 2, value: “bad”
property_values
id: 43, entity_id: 31, property_id: 20, value: “Mary”

JS API

In addition to self-managing, we need the ability to programmatically create, edit, delete, and retrieve properties and their values. For simplicity, JS API should work on the schema level. Schema class should be created:

class Schema { id; name; parentSchema; appliesTo }

grok.data.props.getSchemas(): Schema[]
grok.data.props.setValue(type: string, id: string, property: Property, value: dynamic)
grok.data.props.setAllValues(schema, keys: Column[], values: DataFrame)
grok.data.props.getAllValues(schema: Schema, keys: Column): DataFrame
grok.data.props.saveSchema(schema: Schema)
grok.data.props.deleteSchema(id: uid)

UI

Self-service UI will be provided for creating properties/schemas, associating them with

Context panel

If a cell has custom properties, they are shown in the context panel. Values might be editable as well.

Batch retrieval

If a column has associated schemas, a “Get properties” action is shown in the column context panel. If multiple schemas are associated, a dialog with the checkboxes shows up. This action adds a column for each property, and fills it with the retrieved values (see JS API: getAllValues). Depending on the privileges and settings, values could be edited right in the grid.

In-grid editing

If a grid already has db property columns, in-grid editing might be available, subject to privileges and settings. Two modes of editing could be supported: automatic sync whenever you change a value, or the “SAVE” button for batch saving that would appear on the ribbon panel when you edit a db property.

Tooltips and cell augmentation

Grid cells that have db properties will be rendered with a small triangle in the right top corner. Db properties will be shown in the tooltip as well.

Persistence

To avoid confusion, the values in the columns containing db properties should not be serialized as part of the project; instead, they should be re-queried each time a project is open.

Open Questions & Future Roadmap

Audit

If necessary, the history of modification will be provided using the built-in audit capability. This feature fits well with the proposed design and can be added after the MVP.

Multiple Key Columns

Cell-based db properties are quite powerful, simple and intuitive, but sometimes entity key is contained in multiple columns (example: entity type is “plate well”, key columns: “plate barcode” and “well position”. In this case, the entity type notation would follow special syntax like “[semType:plate barcode, semType: well position]”. This complicates both the UI and entity resolution a bit, but still fits with the proposed design, so it can be added after the MVP.

Entity Key Resolver

Sometimes, the natural key is not contained in the cell itself, but could be inferred from it (perhaps using other columns as well). An example is using canonical smiles as a key for molecules. To address this, we will use special “entity resolver” functions. This feature fits well with the proposed design and can be added after the MVP.

//input: list ids
//output: list keys
//meta.sourceSemType: BMol
//meta.targetEntityKey: BMol
keys = ids.map((id) => convert(id));
Special Property Types

Some properties can be quite universally useful, so we need to make them available in advance. In particular, File attachment, 5-star rating and a discussion. These will require a special UI:

tcarolan commented 7 months ago

This is going to be a very powerful feature. Most end-product project dashboards are not straight DB tables but constructed DB views. This feature allows to tag entities across multiple complex dashboards and share tags across users. Thanks Andrew, for the thorough requirements definition.

Also, I like the idea of some standard but complicated types: files, stars, discussion. Great idea. May I add one? advanced text editor called 'markup'. The user would receive an input dialog similar to the Github text box I am writing into with Bold/Italic/Bullets/etc.

skalkin commented 7 months ago

@tcarolan - I like the idea regarding the markdown editor! @LaykDimon - let's implement it as a JS-based "markdown" string input; it will be picked up automatically if the property is annotated with it. Look like there are plenty of small libraries like this one that you can use - https://github.com/sparksuite/simplemde-markdown-editor.

dnillovna commented 6 months ago

This issue has been mirrored in Jira: https://reddata.atlassian.net/browse/GROK-14900

LaykDimon commented 6 months ago

added markdown input

markdown-input-demo

dnillovna commented 2 months ago

This issue has been mirrored in Jira: https://reddata.atlassian.net/browse/GROK-16160