VariantEffect / mavedb-api

MaveDB API
GNU Affero General Public License v3.0

Keyword overhaul #40

Closed (afrubin closed 3 weeks ago)

afrubin commented 3 years ago

Rationale

MaveDB currently uses a single keyword field that users can populate with free-text keywords. However, many entries are blank or contain keywords of little utility (e.g. the target gene name). To improve MaveDB’s searchability and utility for modellers, we will replace the existing optional keyword system with a new one.

Instead of one keyword field with multiple keywords, each score set or experiment will have fixed keyword categories that each take a single option. Instead of free-text keywords specified by the user, the user will select from a controlled vocabulary of terms.
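To make the "fixed categories, one option each" idea concrete, here is a minimal sketch of validating a user's selection against a controlled vocabulary. The category names, term names, and the `validate_keywords` helper are all placeholders invented for illustration; they are not the final MaveDB vocabulary or API.

```python
from typing import Dict, List, Set

# Hypothetical controlled vocabulary: category -> allowed terms.
# Both the categories and the terms are placeholders.
CONTROLLED_VOCABULARY: Dict[str, Set[str]] = {
    "example category A": {"term A1", "term A2"},
    "example category B": {"term B1", "term B2", "term B3"},
}


def validate_keywords(selection: Dict[str, str]) -> List[str]:
    """Check a {category: term} selection; return a list of error messages.

    Because the selection is a dict, each category can carry at most one
    term, which mirrors the "single option per category" rule.
    An empty list means the selection is valid.
    """
    errors = []
    for category, term in selection.items():
        allowed = CONTROLLED_VOCABULARY.get(category)
        if allowed is None:
            errors.append(f"unknown category: {category!r}")
        elif term not in allowed:
            errors.append(f"{term!r} is not a valid term for {category!r}")
    return errors
```

A free-text entry such as `{"example category A": "my gene"}` would be rejected with one error, while `{"example category A": "term A1"}` passes.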

Implementation

The existing keyword field will be deprecated and no longer included in the forms or views, but the keyword data will be retained (at least for now).

New keyword fields will be created (exact names and fields to be finalized later):

All but the last category are specific to experiments and the last is specific to score sets. However, to preserve future flexibility, these should be implemented at the DatasetModel class level rather than in the specific subclasses.

Although the forms will only support one keyword per entry initially, these should be implemented such that each DatasetModel can have many keywords.
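One way to read the two requirements above (keywords attached at the `DatasetModel` level, many keywords allowed in storage even though the forms pick one) is sketched below with plain dataclasses. This is only an illustration: the real `DatasetModel` is an ORM model, and the `urn` field, the `Keyword` class, and `set_keyword` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Keyword:
    """A (category, term) pair; hypothetical shape for illustration."""
    category: str
    term: str


@dataclass
class DatasetModel:
    """Stand-in for the shared base class; `urn` is an assumed identifier."""
    urn: str
    # Stored as a list so the data layer already supports many keywords.
    keywords: List[Keyword] = field(default_factory=list)

    def set_keyword(self, category: str, term: str, allow_multiple: bool = False) -> None:
        """Attach a keyword; by default replace any existing one in the category."""
        if not allow_multiple:
            # Form-level rule: one keyword per category.
            self.keywords = [k for k in self.keywords if k.category != category]
        new_keyword = Keyword(category, term)
        if new_keyword not in self.keywords:
            self.keywords.append(new_keyword)


class Experiment(DatasetModel):
    """Inherits keyword handling from the base class."""


class ScoreSet(DatasetModel):
    """Inherits keyword handling from the base class."""
```

Because the one-per-category rule lives in `set_keyword` rather than in the storage shape, relaxing it later would not require a change to how keywords are stored.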

Keywords should be defined in an editable JSON file, similar to the way that target reference genomes work now (see the ReferenceGenome class and its usage).
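As a rough sketch of what an editable JSON definition could look like, the snippet below parses a category-to-terms mapping and performs a few sanity checks. The JSON layout, the placeholder names, and `load_vocabulary` are assumptions; the actual format should follow whatever convention the ReferenceGenome data uses.

```python
import json
from typing import Dict, List

# Hypothetical file contents; in practice this would be read from disk.
EXAMPLE_JSON = """
{
  "example category A": ["term A1", "term A2"],
  "example category B": ["term B1"]
}
"""


def load_vocabulary(text: str) -> Dict[str, List[str]]:
    """Parse and validate a {category: [terms]} JSON document."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("top level must be an object mapping categories to terms")
    vocabulary: Dict[str, List[str]] = {}
    for category, terms in raw.items():
        if not isinstance(terms, list) or not all(isinstance(t, str) for t in terms):
            raise ValueError(f"terms for {category!r} must be a list of strings")
        if len(set(terms)) != len(terms):
            raise ValueError(f"duplicate terms in {category!r}")
        vocabulary[category] = list(terms)
    return vocabulary


VOCABULARY = load_vocabulary(EXAMPLE_JSON)
```

Validating at load time means a typo introduced while editing the file is caught at startup rather than when a user submits a form.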

afrubin commented 3 years ago

The first draft of variant keywords will be based on the MaveReferences table: https://github.com/varianteffect/mavereferences

afrubin commented 1 year ago

We now have a set of keywords and categories defined by AVE ETS. This work is being prepared for publication, so it's timely to move forward with implementing this feature.

EstelleDa commented 5 months ago

The new keyword list comes from https://github.com/ave-dcd/mave_vocabulary/blob/main/schema/experiment.yml

Four main keyword categories: Endogenous Locus Library Method; In Vitro Construct Library Method; Variant Library; Phenotypic Assay.

EstelleDa commented 5 months ago

I think we can get rid of the homepage keywords section until we decide what to do with it. The old keywords should not appear in the API or UI, but we should keep them in the DB for now (per Alan's comments).
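A minimal sketch of "keep in the DB, hide from the API": strip the legacy field when building the public representation and leave the stored record untouched. The field name `keywords`, the record shape, and `to_public_dict` are hypothetical.

```python
from typing import Any, Dict

# Hypothetical name(s) of the deprecated field(s) to hide from responses.
LEGACY_FIELDS = {"keywords"}


def to_public_dict(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without legacy fields.

    The input record is not modified, so the stored data is retained.
    """
    return {key: value for key, value in record.items() if key not in LEGACY_FIELDS}


# Example stored record (placeholder values).
stored = {"urn": "example-urn", "title": "Example", "keywords": ["old keyword"]}
public = to_public_dict(stored)
```

Here `public` no longer contains `keywords`, while `stored` still does.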