Now that we are incorporating cell type annotation into scpca-nf, we want to do some level of preliminary exploration to evaluate the cell type annotations and create some outlines of plots we may want to include in the cell type section of the QC report for each library.
So far we have started with the Gawad project and have compiled a list of 4 references to use when annotating those libraries. We should start by running the cell type annotation workflow on a subset of the Gawad samples (probably the samples we have previously used for integration), and then use that subset as input to a notebook to start evaluating the annotations. This will probably get expanded into more issues, but to get us started, here are some things we have discussed that we should consider including in our analysis:
Comparing annotations to those from a negative control reference, such as a reference from a different tissue type.
Performing a permutation test: scramble the annotations X times, re-classify each time, and then compare the originally assigned classifications to the distribution obtained from the scrambled annotations.
What happens when we have NA cells? If a reference contains immune cells plus fibroblasts or another cell type, do those NA cells get automatically assigned to the other cell type just because they are not immune? This is probably something we can explore by comparing the results across references.
If we identify marker genes for common cell types (much easier for immune cell types than for other cells), are we able to validate the SingleR assignments?
Do we see agreement in cell type assignments across multiple references?
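To make the permutation idea above more concrete, here is a rough sketch under several assumptions. It is written in Python only to keep it self-contained; the real analysis would presumably live in the existing R project. The nearest-centroid classifier is a stand-in for SingleR and is not its API, the expression values are synthetic toy data, and the choice of statistic (mean margin between the best and second-best match) is only one hypothetical option for what we compare against the scrambled distribution.

```python
import random
from statistics import mean


def centroids(expr, labels):
    """Mean expression vector per label (toy stand-in for training on a reference)."""
    groups = {}
    for vec, lab in zip(expr, labels):
        groups.setdefault(lab, []).append(vec)
    return {lab: [mean(col) for col in zip(*vecs)] for lab, vecs in groups.items()}


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_margin(query, cents):
    """Average gap between the best and second-best centroid distance per cell."""
    margins = []
    for cell in query:
        d = sorted(sq_dist(cell, c) for c in cents.values())
        margins.append(d[1] - d[0])
    return mean(margins)


def permutation_test(ref_expr, ref_labels, query, n_perm=200, seed=0):
    """Scramble the reference labels n_perm times and re-classify each time."""
    rng = random.Random(seed)
    observed = mean_margin(query, centroids(ref_expr, ref_labels))
    null = []
    for _ in range(n_perm):
        shuffled = list(ref_labels)
        rng.shuffle(shuffled)  # breaks the link between expression and label
        null.append(mean_margin(query, centroids(ref_expr, shuffled)))
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    return observed, null, p_value


# Synthetic toy data (assumption): two genes, two well-separated cell types.
data_rng = random.Random(1)


def simulate(center, n):
    return [[data_rng.gauss(c, 1.0) for c in center] for _ in range(n)]


ref_expr = simulate([5.0, 0.0], 20) + simulate([0.0, 5.0], 20)
ref_labels = ["T cell"] * 20 + ["B cell"] * 20
query = simulate([5.0, 0.0], 10) + simulate([0.0, 5.0], 10)

observed, null, p_value = permutation_test(ref_expr, ref_labels, query)
print(f"observed margin: {observed:.2f}, permutation p-value: {p_value:.4f}")
```

If the observed statistic sits well outside the scrambled distribution, that suggests the reference labels carry real signal for the query cells; a negative control reference should look much closer to the scrambled case.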
We should start by performing cell type annotation with the listed references for the Gawad project plus a negative control reference, and then initiate a notebook where we read in the outputs and make a summary plot showing the results across each of the references. Then we should file additional issues to flesh out the other ideas and move forward with more analysis once we have set up the initial notebook.
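As a starting point for that summary, something like the following could tabulate the labels per reference and the pairwise agreement between references before we decide on a plot. This is only a sketch in Python for illustration: the reference names and per-cell labels are made up, `None` stands in for NA, and it assumes label names have already been harmonized across references, which will probably not be true out of the box.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-cell labels from three references for the same six cells.
# None represents a cell that was not assigned a label (NA).
annotations = {
    "ref_immune": ["T cell", "T cell", "B cell", None, "B cell", "T cell"],
    "ref_blood": ["T cell", "B cell", "B cell", "Fibroblast", "B cell", "T cell"],
    "ref_negative_control": ["Neuron", None, None, "Neuron", None, "Astrocyte"],
}


def label_counts(labels):
    """Count cells per label, reporting unassigned cells as 'NA'."""
    return Counter("NA" if lab is None else lab for lab in labels)


def agreement(a, b):
    """Fraction of cells labeled by both references that receive the same label."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return float("nan")
    return sum(x == y for x, y in pairs) / len(pairs)


summary = {ref: label_counts(labs) for ref, labs in annotations.items()}
pairwise = {
    (r1, r2): agreement(annotations[r1], annotations[r2])
    for r1, r2 in combinations(annotations, 2)
}

for ref, counts in summary.items():
    print(ref, dict(counts))
for (r1, r2), frac in pairwise.items():
    print(f"{r1} vs {r2}: {frac:.2f}")
```

The per-reference counts would feed a bar plot or heatmap, and the same table shows where cells that are NA in one reference pick up a label in another.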
Also, I'm filing this here for now because I think it will be easiest to get this work up and running in this repo. We already have the R project set up, and one of our eventual goals is to evaluate integration using cell type assignment. If we want to do any additional exploration of integrated datasets and cell type assignment, this repo would be set up to do that. If others have strong thoughts about starting a cell-type-specific repo to keep things separate, I am open to discussing that.