Open skalkin opened 7 months ago
This is going to be a very powerful feature. Most end-product project dashboards are not straight DB tables but constructed DB views. This feature allows to tag entities across multiple complex dashboards and share tags across users. Thanks Andrew, for the thorough requirements definition.
Also, I like the idea of some standard but complicated types: files, stars, discussion. Great idea. May I add one? advanced text editor called 'markup'. The user would receive an input dialog similar to the Github text box I am writing into with Bold/Italic/Bullets/etc.
@tcarolan - I like the idea regarding the markdown editor! @LaykDimon - let's implement it as a JS-based "markdown" string input; it will be picked up automatically if the property is annotated with it. Look like there are plenty of small libraries like this one that you can use - https://github.com/sparksuite/simplemde-markdown-editor.
This issue has been mirrored in Jira: https://reddata.atlassian.net/browse/GROK-14900
added markdown input
This issue has been mirrored in Jira: https://reddata.atlassian.net/browse/GROK-16160
This document outlines the rationale, requirements, and proposed technical implementation of the “DB Properties” feature.
Context and Goals
Datagrok has a concept of entities. Among other features, it enables each internal object (dataset, connection, query, function, user, group, etc) to be associated with the arbitrary metadata (at this moment the UI only supports strings). The goal of this initiative is to improve the existing functionality (more types, better UI, powerful search) and extend this functionality to objects that are stored externally. This would enable end users to quickly build lightweight solutions (with no coding required) to a certain kind of problems that are traditionally solved by the IT in a resource-intensive way (extending existing databases, providing the APIs, building the apps for the end users, maintaining it, etc).
Use case 1: Peter, the head of cheminformatics at BostonMol, wants to run a hit triage campaign. He wants BostonMol’s chemists to be able to provide “good/neutral/bad” ratings to any of the molecules in the proprietary database. Also, no matter where the dataset came from (db / file / web service), we want to be able to see a molecule's rating in the context panel, and retrieve the corresponding ratings for all molecules in the dataset. To do that, Peter, through the UI, creates a schema {name: “Hit Triage”, appliesTo: {semType: “BMol”}} with the following property: { name: “Rating”, choices: [“good”, “neutral”, “bad”] }. Since molecule ids are detected as the “BMol” semtype, this is enough for Datagrok to put everything together.
Use case 2: Peter runs a database query and wants to annotate rows in such a way that these annotations would be visible next time he or anyone else runs this query, or opens a dataset with the similar structure. From the results view, he clicks Data | Annotate…, and either selects an already existing schema, or creates a new one, creating the desired properties as well. Then, he chooses key columns along with the way they should be matched (multiple checkboxes: by name, by tag values). Once he clicks OK, corresponding columns are created, and the values are editable in the grid.
Implementation
Database representation
We would reuse and extend existing powerful implementations of entities, properties, dynamic inputs, and semantic types.
Property table defines metadata (not value!) associated with a property. It would be mapped to the existing class Property. Here are some of the fields:
properties table
The actual values would reside in the property_values table (plus a record in the entities table):
property_values tables
Often, more than one field is required. To address that, each property is associated with the schema (for developers, think of schemas as classes, and properties as fields). Schemas support inheritance that allows us to reuse common properties:
schemas table
entity_types_schemas table
Schema itself is an entity. The privileges (View / Edit) associated with it define whether the user can view the properties, or edit them.
Example: Mary is giving the “good” ratting for the “ccc” BMol molecule and “bad” rating for the “c” molecule. It would go to the following tables:
entity_types
JS API
In addition to self-managing, we need the ability to programmatically create, edit, delete, and retrieve properties and their values. For simplicity, JS API should work on the schema level. Schema class should be created:
class Schema { id; name; parentSchema; appliesTo }
UI
Self-service UI will be provided for creating properties/schemas, associating them with
Context panel
If a cell has custom properties, they are shown in the context panel. Values might be editable as well.
Batch retrieval
If a column has associated schemas, a “Get properties” action is shown in the column context panel. If multiple schemas are associated, a dialog with the checkboxes shows up. This action adds a column for each property, and fills it with the retrieved values (see JS API: getAllValues). Depending on the privileges and settings, values could be edited right in the grid.
In-grid editing
If a grid already has db property columns, in-grid editing might be available, subject to privileges and settings. Two modes of editing could be supported: automatic sync whenever you change a value, or the “SAVE” button for batch saving that would appear on the ribbon panel when you edit a db property.
Tooltips and cell augmentation
Grid cells that have db properties will be rendered with a small triangle in the right top corner. Db properties will be shown in the tooltip as well.
Persistence
To avoid confusion, the values in the columns containing db properties should not be serialized as part of the project; instead, they should be re-queried each time a project is open.
Open Questions & Future Roadmap
Audit
If necessary, the history of modification will be provided using the built-in audit capability. This feature fits well with the proposed design and can be added after the MVP.
Multiple Key Columns
Cell-based db properties are quite powerful, simple and intuitive, but sometimes entity key is contained in multiple columns (example: entity type is “plate well”, key columns: “plate barcode” and “well position”. In this case, the entity type notation would follow special syntax like “[semType:plate barcode, semType: well position]”. This complicates both the UI and entity resolution a bit, but still fits with the proposed design, so it can be added after the MVP.
Entity Key Resolver
Sometimes, the natural key is not contained in the cell itself, but could be inferred from it (perhaps using other columns as well). An example is using canonical smiles as a key for molecules. To address this, we will use special “entity resolver” functions. This feature fits well with the proposed design and can be added after the MVP.
Some properties can be quite universally useful, so we need to make them available in advance. In particular, File attachment, 5-star rating and a discussion. These will require a special UI: