Closed allyhawkins closed 1 year ago
Now that we are incorporating the sample level metadata as part of the unfiltered SCE objects, this data will already be present in the AnnData.uns slot. We will need to take the list of metadata present and use it to populate the cell metadata columns that are required by CZI.
Before I start to implement these changes, I wanted to outline my ideas for implementation. Right now, the SCE objects each have sample_metadata stored in the SCE object as a single-row data frame. The contents of the sample_metadata need to be stored as columns in the cell-level metadata in the AnnData object. Additionally, we need to pull out the 10X kit and suspension type from the library_metadata to add to the cell-level metadata.
I think that means we need to add two steps to the script that converts SCE objects to AnnData. So the new sce_to_anndata.R would have the following outline:
[...] scpcaTools or we could just keep this within the script run in the workflow)
[...] assay and suspension_type [...] scpcaTools)
[...] sample_metadata and add them as new columns in colData
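The two metadata steps described above (spreading the single-row sample_metadata across every cell, then adding the 10X kit and suspension type from library_metadata) can be sketched in base R. This is only an illustration under stated assumptions: plain data frames stand in for the SCE's colData, and the field names `tech_version` and `suspension_type`, the helper `add_sample_columns()`, and all example values are hypothetical rather than taken from the actual objects.

```r
# Hypothetical stand-ins for metadata stored in an SCE object.
# All column names and values below are illustrative assumptions.
sample_metadata <- data.frame(
  sample_id = "sample01",
  sex = "F",
  tissue = "bone marrow",
  stringsAsFactors = FALSE
)
library_metadata <- list(
  library_id = "library01",
  tech_version = "10Xv3",    # assumed name of the field holding the 10X kit
  suspension_type = "cell"   # assumed field; the thread does not say how this is stored
)

# Stand-in for the cell-level metadata (colData): one row per cell barcode
cell_metadata <- data.frame(
  sum = c(1200, 850, 2300),
  detected = c(600, 410, 990),
  row.names = c("AAAC", "AAAG", "AAAT")
)

# Hypothetical helper: repeat the single sample row for every cell, then
# add the library-level values as constant columns
add_sample_columns <- function(cell_df, sample_df, library_list) {
  stopifnot(nrow(sample_df) == 1)
  # Repeat row 1 once per cell and align row names with the cell barcodes
  sample_cols <- sample_df[rep(1, nrow(cell_df)), , drop = FALSE]
  rownames(sample_cols) <- rownames(cell_df)
  out <- cbind(cell_df, sample_cols)
  # Scalars are recycled across all cells
  out$assay <- library_list$tech_version
  out$suspension_type <- library_list$suspension_type
  out
}

cell_metadata <- add_sample_columns(cell_metadata, sample_metadata, library_metadata)
print(colnames(cell_metadata))
```

Because the helper only touches cell-level columns, the same function could be applied to both the RNA object and any CITE-seq or cell hashing feature objects so that their metadata mirror each other.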
We also need to consider how we want the CITE-seq or cell hashing datasets to look. Do we want to also add the same contents to the cell-level metadata for those objects? I am thinking we want the contents of those files to mirror the RNA files so we will need to take the same steps for both RNA and feature objects.
Tagging @jashapiro for any feedback on this approach.
All of the items in the checklist have been added so I'm going to close this.
To make our AnnData output as compliant as possible with CZI, we need to update the existing cell metadata present in the AnnData objects to include the necessary entries. The full documentation can be found here: https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md#obs-cell-metadata
In brief we will need to add:
We also need to include cell type ontology, but that is a separate addition we are tracking.
One thing to think about as we do this is whether we want to add these items to the SCE objects before conversion or to the AnnData objects after conversion. I do think that when we generate merged SCE objects, we may want some of these things (like disease, tissue, sex) in our SCE objects. So maybe instead of including a process to do that in AnnData, we just apply it to the SCE objects.
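Whichever object ends up holding the columns, a quick check for missing CZI fields could look like the sketch below. The field names are my reading of the obs section of the linked 3.0.0 schema and should be verified against that document; the helper `missing_czi_fields()` and the example data are hypothetical.

```r
# Field names assumed from the obs section of the CZI schema 3.0.0;
# verify against the linked documentation before relying on this list
czi_obs_fields <- c(
  "assay_ontology_term_id",
  "cell_type_ontology_term_id",
  "development_stage_ontology_term_id",
  "disease_ontology_term_id",
  "donor_id",
  "is_primary_data",
  "organism_ontology_term_id",
  "self_reported_ethnicity_ontology_term_id",
  "sex_ontology_term_id",
  "suspension_type",
  "tissue_ontology_term_id"
)

# Hypothetical helper: return the required fields absent from the
# cell-level metadata columns
missing_czi_fields <- function(cell_df, required = czi_obs_fields) {
  setdiff(required, colnames(cell_df))
}

# Illustrative cell metadata with only a few of the fields filled in
example_cells <- data.frame(
  donor_id = c("sample01", "sample01"),
  suspension_type = c("cell", "cell"),
  is_primary_data = c(TRUE, TRUE)
)

missing_fields <- missing_czi_fields(example_cells)
print(missing_fields)
```

The same check works on the column names of colData before conversion or of the obs table after conversion, so it does not depend on which approach is chosen.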