A crowdsourcing/expert curation platform for metadata categorization
The Metadata Categorization web application allows expert biological indexers to curate a sample set drawn from NCBI BioSample data. The purpose of the annotation tool is to identify terms in the user submission that can be mapped to controlled terms from the Cell Line Ontology. In the best case, the annotator can identify the correct cell line from the submitted data, and additional information about the sample, such as cell type, tissue, and disease, can then be populated from the Cell Line Ontology.
This project depends on two Solr cores generated for this purpose, which contain the relevant data from BioSample and the Cell Line Ontology. The Solr core generated from BioSample stores the user-submitted data extracted from BioSample as well as the annotations supplied by the curators.
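As a rough illustration of reading from such a core, the sketch below builds a URL for Solr's standard `select` handler. The host, the core name `biosample`, and the field name `cell_line` are assumptions made for this example and are not taken from the project.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, field, value, rows=10):
    """Build a Solr select URL that matches one exact field value.

    base_url, core, and field are placeholders; substitute the names
    used by your own Solr installation.
    """
    # Escape backslashes and double quotes so the value stays inside the phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'{field}:"{escaped}"',  # phrase query on a single field
        "rows": rows,                 # maximum number of documents returned
        "wt": "json",                 # ask Solr for a JSON response
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and core name, for illustration only.
    print(build_select_url("http://localhost:8983/solr", "biosample",
                           "cell_line", "HEK293T"))
```

The resulting URL can be fetched with any HTTP client; the function only assembles the request and does not contact Solr.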
The back-end uses Django, a high-level Python web framework, to handle URL routing, aggregate individual records from BioSample into summary records, and read from and write to Solr.
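The aggregation step can be pictured with a minimal sketch in plain Python. It is not the project's actual code: the record layout (a list of dicts) and the field names `cell_line` and `tissue` are assumptions chosen for illustration.

```python
from collections import defaultdict


def summarize(records, group_field="cell_line"):
    """Group records by group_field and collect the distinct values per field.

    Returns one summary dict per group, in order of first appearance.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get(group_field, "")].append(rec)

    summaries = []
    for key, members in groups.items():
        summary = {group_field: key, "count": len(members)}
        # Every field seen in the group, except the grouping field itself.
        fields = {f for rec in members for f in rec if f != group_field}
        for field in sorted(fields):
            # Distinct non-empty values, sorted for a stable display order.
            summary[field] = sorted({rec[field] for rec in members if rec.get(field)})
        summaries.append(summary)
    return summaries


if __name__ == "__main__":
    sample = [
        {"cell_line": "HEK293T", "tissue": "kidney"},
        {"cell_line": "HEK293T", "tissue": "Kidney"},
        {"cell_line": "HeLa", "tissue": "cervix"},
    ]
    for row in summarize(sample):
        print(row)
```

Listing the distinct values per field lets a curator see at a glance where submitters disagree within a group, for example "kidney" versus "Kidney".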
The front-end uses the free version of Handsontable, an Excel-like composite spreadsheet component. Handsontable provides built-in editing functionality for tabular data. Dialogs and certain other features use jQuery and jQuery UI.
Summary records, showing on-the-fly aggregations of individual BioSample records that have the same source cell line value.
Individual records for the "HEK293T" source cell line, shown upon clicking the "+" button at the left of the HEK293T summary record. Note the user-annotated value for the "Tissue" field in green. Unannotated (i.e. source) values are displayed in red.
Editing a field at the summary record level propagates the new value to all individual records in that group.
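A minimal sketch of this propagation, assuming each record is a dict and that curator annotations are kept in a separate `annotations` mapping so the submitted source values remain untouched. Both the layout and the field names are hypothetical and not taken from the project.

```python
def propagate_annotation(records, group_field, group_value, field, new_value):
    """Apply one annotation to every record in a summary group.

    Returns the number of records that were updated.
    """
    updated = 0
    for rec in records:
        if rec.get(group_field) == group_value:
            # Store the annotation next to, not over, the source value.
            rec.setdefault("annotations", {})[field] = new_value
            updated += 1
    return updated


if __name__ == "__main__":
    records = [
        {"cell_line": "HEK293T", "tissue": "kidney"},
        {"cell_line": "HEK293T", "tissue": ""},
        {"cell_line": "HeLa", "tissue": "cervix"},
    ]
    count = propagate_annotation(records, "cell_line", "HEK293T",
                                 "tissue", "embryonic kidney")
    print(count, records)
```

Keeping annotations apart from source values is one way to support displaying annotated and unannotated values differently, as in the screenshots described above.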