Closed by afrubin 3 weeks ago
The first draft of variant keywords will be based on the MaveReferences table: https://github.com/varianteffect/mavereferences
We now have a set of keywords and categories defined by AVE ETS. This work is being prepared for publication so it's timely to move forward with implementing this feature.
The new keyword list comes from https://github.com/ave-dcd/mave_vocabulary/blob/main/schema/experiment.yml
Four main keyword categories:
- Endogenous Locus Library Method
- In Vitro Construct Library Method
- Variant Library
- Phenotypic Assay
I think we can remove the homepage keywords section until we decide what to do with it. The old keywords should not appear in the API or UI, but we should keep them in the DB for now. (Alan's comments)
Rationale
MaveDB currently uses a single keyword field that can be populated with user-specified keywords. However, many of the entries are blank or have keywords of little utility (e.g. target gene name). To improve MaveDB’s searchability and its utility for modellers, we will replace the existing optional free-text keyword system with one based on a controlled vocabulary.
Instead of one keyword field with multiple keywords, each score set or experiment will have fixed keyword categories that each take a single option. Instead of free-text keywords specified by the user, the user will select from a controlled vocabulary of terms.
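A minimal sketch of what "one option per category, chosen from a controlled vocabulary" could look like as a validation step. The category names are taken from this issue; the terms are placeholders rather than the real vocabulary, and `CONTROLLED_VOCABULARY` and `validate_keywords` are hypothetical names, not existing MaveDB code.

```python
# Category names come from this issue; the terms are placeholders only.
CONTROLLED_VOCABULARY = {
    "Variant Library": {"placeholder term A", "placeholder term B"},
    "Phenotypic Assay": {"placeholder term C", "placeholder term D"},
}


def validate_keywords(selection: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the selection is valid.

    Using a dict keyed by category means each category can carry at most
    one selected term, which mirrors the single-option rule.
    """
    problems = []
    for category, term in selection.items():
        allowed = CONTROLLED_VOCABULARY.get(category)
        if allowed is None:
            problems.append(f"unknown category: {category}")
        elif term not in allowed:
            problems.append(f"{term!r} is not a valid term for {category}")
    return problems
```

A free-text value such as a target gene name would be rejected here because it is not in the allowed set for its category.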
Implementation
The existing keyword field will be deprecated and no longer included in the forms or views, but the keyword data will be retained (at least for now).
New keyword fields will be created (exact names and fields to be finalized later):
All but the last category are specific to experiments and the last is specific to score sets. However, to preserve future flexibility, these should be implemented at the DatasetModel class level rather than in the specific subclasses.
Although the forms will initially support only one keyword per category, the underlying fields should be implemented so that each DatasetModel can have many keywords.
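One possible shape for this, sketched with plain dataclasses rather than the project's actual ORM models. The field and method names (`keywords`, `legacy_keywords`, `set_keyword`, `to_api_dict`) are assumptions for illustration only. Storage allows many keywords per dataset, the form-level helper enforces one per category, and the legacy keywords are retained but left out of the API output.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Keyword:
    category: str
    term: str


@dataclass
class DatasetModel:
    title: str
    # Storage is a list, so a dataset can hold many keywords.
    keywords: list[Keyword] = field(default_factory=list)
    # Deprecated free-text keywords: kept in storage, hidden from the API.
    legacy_keywords: list[str] = field(default_factory=list)

    def set_keyword(self, category: str, term: str) -> None:
        """Form-level rule: replace any existing keyword in this category."""
        self.keywords = [k for k in self.keywords if k.category != category]
        self.keywords.append(Keyword(category, term))

    def to_api_dict(self) -> dict:
        """Serialize for the API without exposing legacy keywords."""
        return {
            "title": self.title,
            "keywords": [
                {"category": k.category, "term": k.term} for k in self.keywords
            ],
        }


# Both subclasses inherit the keyword fields from the shared base class.
class Experiment(DatasetModel):
    pass


class ScoreSet(DatasetModel):
    pass
```

Because the one-per-category rule lives in `set_keyword` rather than in the storage layout, it could later be relaxed without changing how keywords are stored.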
Keywords should be defined in an editable JSON file, similar to the way that target reference genomes work now (see the ReferenceGenome class and its usage).
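A rough sketch of loading such a file, assuming a JSON layout that maps each category name to a list of allowed terms. Both the layout and the `load_vocabulary` helper are hypothetical; this is not how the ReferenceGenome loading is known to work.

```python
import json
from pathlib import Path


def load_vocabulary(path) -> dict[str, frozenset[str]]:
    """Load a keyword vocabulary from an editable JSON file.

    Assumed layout (hypothetical):
        {"Variant Library": ["term A", "term B"], "Phenotypic Assay": ["term C"]}
    """
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(raw, dict):
        raise ValueError("vocabulary file must contain a JSON object")

    vocabulary = {}
    for category, terms in raw.items():
        # Each category must map to a list of strings.
        if not isinstance(terms, list) or not all(isinstance(t, str) for t in terms):
            raise ValueError(f"terms for {category!r} must be a list of strings")
        vocabulary[category] = frozenset(terms)
    return vocabulary
```

Validating the file's structure at load time means a bad hand edit fails immediately instead of surfacing later as a confusing form error.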