justinesjw closed this issue 5 years ago.
Hi Justine,
Sorry for the confusion. The number of candidate cells reported by check_markers differs from the number actually used in training because check_markers uses heuristics to estimate that number. Some details: to identify training cells, Garnett calculates an aggregate marker score for each cell, i.e. a score based on all of the marker genes for each cell type. It then chooses cells that are at or above the 75th percentile of aggregate score for only one cell type. check_markers does the first part of this: it calculates the aggregate marker score, both with and without each gene, as a heuristic measure of the effect of including each gene in the marker file. However, it does not do the final step of choosing cells that have uniquely high marker scores.
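As a rough illustration of the selection rule described above, here is a small Python sketch. It scores each cell per cell type, finds the cells at or above the 75th percentile for each type, and keeps only cells that pass for exactly one type. This is not Garnett's implementation (Garnett is an R package). The scoring function (a plain sum of marker expression), the helper names, and the toy data are all hypothetical. The per-type "candidates" count is only a loose stand-in for the kind of estimate check_markers shows, since it skips the uniqueness step.

```python
from statistics import quantiles

def aggregate_scores(cells, markers):
    """Hypothetical aggregate score: sum of a cell's expression over a cell type's marker genes."""
    return {ct: [sum(c.get(g, 0.0) for g in genes) for c in cells]
            for ct, genes in markers.items()}

def high_scoring(scores):
    """Per cell type, indices of cells at or above the 75th percentile of that type's scores."""
    high = {}
    for ct, vals in scores.items():
        cutoff = quantiles(vals, n=4, method="inclusive")[2]  # 75th percentile
        high[ct] = {i for i, v in enumerate(vals) if v >= cutoff}
    return high

def training_cells(high):
    """Keep only cells that are high-scoring for exactly one cell type."""
    n_types = {}
    for cells in high.values():
        for i in cells:
            n_types[i] = n_types.get(i, 0) + 1
    return {ct: {i for i in cells if n_types[i] == 1} for ct, cells in high.items()}

# Toy data: HEP_4 and HEP_7 share one (hypothetical) marker and each has one specific marker.
markers = {"HEP_4": ["shared", "g4"], "HEP_7": ["shared", "g7"]}
cells = [
    {"shared": 5, "g4": 1}, {"shared": 5, "g7": 1}, {"shared": 5}, {"shared": 4},
    {"g4": 2}, {"g7": 2}, {}, {},
]
high = high_scoring(aggregate_scores(cells, markers))
train = training_cells(high)
for ct in markers:
    print(ct, "candidates:", len(high[ct]), "training:", len(train[ct]))
```

With these toy values, each cell type has three high-scoring cells (cells 0 to 2), but all three are high for both types, so neither type ends up with any training cells.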
Often, when cell types are very similar, many cells have high aggregate scores for multiple cell types, so they are not chosen for training even though they look like candidates in the check_markers plot. If you believe there should be more good candidate cells for your smaller populations (HEP_4 and HEP_7), one potential solution is to look for additional marker genes to add for them. This will sometimes raise a cell's aggregate score for one cell type high enough for it to be included.
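To see why an extra marker can help, here is a minimal, self-contained Python sketch of the same simplified rule (75th-percentile cutoff per cell type, keep cells that pass for only one type). The function name, the aggregate scores, and the extra marker's expression values are all hypothetical, and the rule is a simplification, not Garnett's actual code.

```python
from statistics import quantiles

def unique_high_cells(scores):
    """Simplified rule: cells at or above the 75th percentile for exactly one cell type."""
    high = {}
    for ct, vals in scores.items():
        cutoff = quantiles(vals, n=4, method="inclusive")[2]  # 75th percentile
        high[ct] = {i for i, v in enumerate(vals) if v >= cutoff}
    # A cell qualifies only if it is high-scoring for this type and no other.
    return {ct: {i for i in cells if sum(i in other for other in high.values()) == 1}
            for ct, cells in high.items()}

# Hypothetical aggregate scores for 8 cells; cells 0-2 score highly for both types.
hep4 = [6, 5, 5, 4, 2, 0, 0, 0]
hep7 = [5, 6, 5, 4, 0, 2, 0, 0]
before = unique_high_cells({"HEP_4": hep4, "HEP_7": hep7})

# A hypothetical extra HEP_4 marker, expressed mainly in cell 4 (and a little in cell 0).
extra = [3, 0, 0, 0, 5, 0, 0, 0]
hep4_extra = [a + b for a, b in zip(hep4, extra)]
after = unique_high_cells({"HEP_4": hep4_extra, "HEP_7": hep7})

print(before)
print(after)
```

Before the extra marker, no cell is uniquely high for either type. Afterwards, the HEP_4 cutoff rises from 5 to 5.5, so cell 4 becomes a HEP_4-only cell and cells 1 and 2 drop out of the HEP_4 high-scoring set, leaving them uniquely high for HEP_7. Cell 0 remains ambiguous.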
Hope this helps! If you have further questions, feel free to reopen.
Hi!
Thanks for developing Garnett. It has been a great help to my research.
While using Garnett, I have been facing this error
I understand that this is caused by a low cell count, and it was easily fixed by merging some groups that have similar DE genes. However, my marker ambiguity plot shows that the number of cells captured by the marker list is larger than the number that passed into training, and the marker genes for these subclusters do not overlap with those of other subclusters.
Could you help me understand this problem better?
Any help will be much appreciated.
Thanks, Justine