Support programmatic addition of author category annotation columns from CSV/Dataframe

cellannotation / cas-tools

Cell Annotation Schema Tools


Support programmatic addition of author category annotation columns from CSV/Dataframe #66

Closed dosumis closed 1 month ago

dosumis commented 3 months ago

Background:

One key advantage of maintaining annotations in a spreadsheet is that it can easily be converted to TSV & then used programmatically to alter or add content. It is essential that TDT support this functionality if we are to replace spreadsheets. The most basic operation we need to support is the programmatic addition of columns. Typically this would be via a join operation on dataframes.
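For reference, the spreadsheet-style join described above might look like the following in pandas. This is only a sketch: the accessions, labels and the `neurotransmitter` column are made-up illustration data, not taken from any real annotation table.

```python
import pandas as pd

# Hypothetical annotation table, as it might be exported from a spreadsheet.
annotations = pd.DataFrame({
    "cell_set_accession": ["CS1", "CS2", "CS3"],
    "cell_label": ["L2/3 IT", "L5 ET", "Pvalb"],
})

# Hypothetical new author category column, keyed on the same accession.
new_columns = pd.DataFrame({
    "cell_set_accession": ["CS1", "CS2", "CS3"],
    "neurotransmitter": ["Glut", "Glut", "GABA"],
})

# Left join keeps every annotation row; validate="one_to_one" makes pandas
# raise if the key is duplicated on either side.
merged = annotations.merge(
    new_columns, on="cell_set_accession", how="left", validate="one_to_one"
)
```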

Spec:

This operation should use CAS as an imported library. The function should take a pandas dataframe as input (although we could potentially add command line support with CSV input).
- Input: a dataframe with a key column for joining to annotations & one or more new author category columns
- Configuration:
  - arg to specify the key for joining to the annotation table (must be cell_set_accession or cell label - we may also need to support cluster ID)
  - optional arg to specify which columns to add (default is a simple table join, i.e. add all columns)
  - optional args to specify the data types of the additional columns

Function:

- Check that all keys in the input dataframe match an entry in the specified key field in CAS & fail if not.
- For each match, add new key:value pairs from the input dataframe.
- If a cell has no content, make the content null/NaN. Warn of any empty cells following loading.
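A minimal sketch of the behaviour specified above, for discussion only. It assumes the CAS content is available as a plain dict with an `annotations` list whose entries carry `cell_set_accession` and `cell_label`, and that new values are written directly onto each annotation entry. The function name `add_author_columns` and its arguments are hypothetical and are not the cas-tools API. The CAS dict is modified in place.

```python
import warnings

import pandas as pd

# Join keys allowed by the spec (cluster ID may need adding later).
ALLOWED_KEYS = ("cell_set_accession", "cell_label")


def add_author_columns(cas, df, key="cell_set_accession", columns=None, dtypes=None):
    """Hypothetical helper: add columns from df to matching CAS annotations."""
    if key not in ALLOWED_KEYS:
        raise ValueError(f"key must be one of {ALLOWED_KEYS}, got {key!r}")
    if key not in df.columns:
        raise ValueError(f"input dataframe has no {key!r} column")

    # Default: simple table join, i.e. every non-key column is added.
    columns = list(columns) if columns is not None else [c for c in df.columns if c != key]
    new = df[[key] + columns]
    if dtypes:
        # dtypes maps column name -> pandas dtype, e.g. {"n_cells": "Int64"}.
        new = new.astype(dtypes)
    if new[key].duplicated().any():
        raise ValueError(f"duplicate {key!r} values in input dataframe")

    # Index annotations by the chosen key; the key must be unique in CAS.
    index = {}
    for annotation in cas["annotations"]:
        value = annotation.get(key)
        if value in index:
            raise ValueError(f"{key!r} value {value!r} is not unique in CAS")
        index[value] = annotation

    # Fail before touching anything if an input key has no match in CAS.
    missing = [k for k in new[key] if k not in index]
    if missing:
        raise ValueError(f"no annotation found for {key!r} values: {missing}")

    empty = []
    for row in new.to_dict(orient="records"):
        annotation = index[row[key]]
        for col in columns:
            value = row[col]
            if pd.isna(value):
                value = None  # empty cell -> null
                empty.append((row[key], col))
            annotation[col] = value

    if empty:
        warnings.warn(f"{len(empty)} empty cell(s) after loading: {empty}")
    return cas
```

Usage would be along the lines of `add_author_columns(cas, df, key="cell_label", columns=["neurotransmitter"])`. Validating all keys before writing any values means a failed call leaves the annotations untouched.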