brain-bican / taxonomy-development-tools

Tools to build and edit Cell Annotation Schema taxonomies.
Apache License 2.0

Documentation about tables and basic editing #104

Closed hkir-dev closed 4 months ago

hkir-dev commented 4 months ago

Story: #100

Description

As well as lookup type documentation we should also have workflow docs with screenshots.

dosumis commented 4 months ago

We have some of this already in UI Guide - needs updating.

AvolaAmg commented 4 months ago

Here is the updated documentation on the tables. The documentation on basic editing, with screenshots, will be added in the next comment. The UI Guide will be updated once this comment is approved.

Taxonomy Development Tools User Interface Guide

Welcome to the Taxonomy Development Tools User Interface Guide. This document is designed to provide comprehensive details on navigating and utilizing the TDT interface efficiently. Whether you are looking to manage data tables, edit information, or leverage advanced features, this guide will assist you in making the most out of TDT.

  1. Tables
    1. Switch system tables
    2. User tables
  2. Table Management
    1. Adding new Records
    2. Editing Existing Data
    3. Sorting and Filtering Data
  3. Actions
    1. Save
    2. GitHub Controls
    3. Make a Release
    4. Publish PURL
    5. Export CAS JSON
    6. Export to AnnData
  4. Views

Tables

At the heart of the Taxonomy Development Tools is a robust internal database designed to streamline the management and curation of taxonomy-related data. Access to this database is facilitated through a user-friendly interface, with tables being a central component.

To view the available tables, navigate to the Tables dropdown menu at the top of the interface.


TDT categorizes tables into two main types, switch system tables and user tables, each serving distinct purposes:

Switch system tables: these tables are essential for the internal configuration of the TDT and cannot be modified by the users.


- message: this table contains all the messages present on every row of each table.


User tables

User tables are created when data is uploaded to the TDT using the load_data operation (https://brain-bican.github.io/taxonomy-development-tools/Curation/). This data is formatted according to the Cell Annotation Schema and organized into multiple interrelated tables.

Example: the nhp_basal_ganglia_taxonomy has an annotation table named AIT115_annotation_sheet. From this table, a series of user tables is generated and displayed in the TDT.

The user tables are the following:

Exp. AIT115_annotation_sheet.tsv

- author name: the name of the first author of the taxonomy.
- author contact: the author's email.
- author list: names of the secondary authors.
- matrix file ID: a resolvable ID for a cell by gene matrix file.
- cellannotation schema version: the version of the cell annotation schema.
- cellannotation timestamp: the date (yyyy-mm-dd) when the cell annotations are published.
- cellannotation url: a URL where all cell annotations are published for each dataset.
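As a rough illustration of the timestamp field above, the following Python sketch checks that a metadata record carries a date in yyyy-mm-dd form. The snake_case key names and the example values are assumptions made for this sketch; they are not taken from TDT or from the schema itself.

```python
import re
from datetime import datetime

# Hypothetical metadata record; keys and values are illustrative only.
metadata = {
    "author_name": "Jane Doe",
    "author_contact": "jane.doe@example.org",
    "cellannotation_timestamp": "2024-01-31",
}


def has_valid_timestamp(record):
    """Return True if the record has a real calendar date written as yyyy-mm-dd."""
    value = record.get("cellannotation_timestamp")
    # Enforce the exact zero-padded layout first, since strptime alone
    # would also accept values such as "2024-1-31".
    if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # Right layout but not a real date, e.g. "2024-02-30".
        return False
    return True
```

A check like this could be run on a sheet before it is loaded, so that malformed dates are caught early.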


- cell set accession: an identifier that can be used to consistently refer to the set of cells being annotated, even if the cell_label changes.
- cell label: the cell annotation provided by the author.
- cell fullname: the full-length term of the annotated cell set.
- parent cell set accession: similar to the cell set accession, this is the identifier for the set of cells one step higher in the hierarchical classification than the cells in the row.
- labelset: the type of cell annotation from the AnnData/Seurat file.
- cell ontology term id: the ontology term ID that defines the cell type. It has to be the closest term matching the cell label.
- cell ontology term: the ontology term name from the ontology term ID.
- rationale: the short name of the publications used to define the cell ontology term.
- rationale dois: the DOI of the paper mentioned in the rationale.
- marker gene evidence: list of names of genes whose expression in the cells being annotated is explicitly used as evidence for this cell annotation. Each gene MUST be included in the matrix of the AnnData/Seurat file.
- synonyms: synonyms of the cell label.
- Supertype : region.info Frequency :
- Cluster size: the number of cells present in that cluster.
- Gene counts: the number of genes detected in the cluster.
- UMI counts: the number of UMI detected in the cluster.
- AIT21 ABC atlas subclass homology: the homologous term to cell label present in the Allen Brain Cell Atlas.
- Binary genes: genes expressed in the cluster.
- NSForest markers combo: a set of genes obtained using the NS-Forest machine learning algorithm to identify clusters.
- NSForest F1 score:
- Curated markers:
- Comments:
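To make the relationship between cell set accession and parent cell set accession more concrete, here is a minimal Python sketch that walks up the hierarchy from one row to the root. The accessions, labels, and snake_case key names are hypothetical and chosen only for illustration; real taxonomies will use their own identifiers.

```python
# Hypothetical annotation rows; an empty parent marks the top of the hierarchy.
rows = [
    {"cell_set_accession": "X_1", "cell_label": "Neuron", "parent_cell_set_accession": ""},
    {"cell_set_accession": "X_2", "cell_label": "GABAergic", "parent_cell_set_accession": "X_1"},
    {"cell_set_accession": "X_3", "cell_label": "Sst", "parent_cell_set_accession": "X_2"},
]


def ancestor_labels(rows, accession):
    """Return the labels of all ancestors of `accession`, nearest parent first."""
    parent_of = {r["cell_set_accession"]: r["parent_cell_set_accession"] for r in rows}
    label_of = {r["cell_set_accession"]: r["cell_label"] for r in rows}

    chain = []
    seen = {accession}
    current = parent_of.get(accession, "")
    while current:
        if current in seen:
            # A row that is its own ancestor indicates a curation error.
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        # Fall back to the accession if the parent row is missing from the table.
        chain.append(label_of.get(current, current))
        current = parent_of.get(current, "")
    return chain


print(ancestor_labels(rows, "X_3"))  # ['GABAergic', 'Neuron']
```

Because the lookup uses accessions and not labels, renaming a cell label does not break the hierarchy, which is the purpose of the accession described above.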


AIT115_annotation_sheet_annotation_transfer

dosumis commented 4 months ago
  1. Can you turn this into a PR on https://github.com/brain-bican/taxonomy-development-tools/blob/main/docs/UserInterface.md ? I can then make comments in PR review.
  2. Screenshots will need to be here: https://github.com/brain-bican/taxonomy-development-tools/tree/main/docs/images/screenshots - we can swap them out as TDT evolves. Giving them clear names will help.
  3. You don't need to add indexes manually - the doc system builds them (although editing the index doc to add some description of the contents of various docs would be useful).
  4. These are all user specified fields, not part of the standard but specified in the informal taxonomy for basal ganglion. So - doc doesn't belong here (but it may be useful to document these specifically on the Basal Ganglion taxonomy repo, once that is finalised):

    Supertype : region.info Frequency : Cluster size : The number of cells present in that cluster. Gene counts : The number of genes detected in the cluster. UMI counts : The number of UMI detected in the cluster. AIT21 ABC atlas subclass homology : The homologous term to cell label present in the Allen Brain Cell Atlas. Binary genes : Genes expressed in the cluster. NSForest markers combo : A set of genes obtained using the NS-Forest machine learning algorithm to identify clusters. NSForest F1 score: Curated markers: Comments:

dosumis commented 4 months ago

@hkir-dev - is the doc build automatic or do we need to run it via a make command?

hkir-dev commented 4 months ago

We have a GitHub Action for this: Actions > Publish mkdocs documentation

The action can be triggered manually, and it also runs automatically with each release.