Open suhuanhou opened 1 year ago
Thanks for your interest. A running demo is here. To construct a training set, choose a well-annotated dataset that fits your research needs, preprocess it with sc.pp.normalize_total, sc.pp.log1p, and sc.pp.highly_variable_genes, and save it as an AnnData object.
Thanks for developing the tool for automatic cell type annotation!
I also want to ask how to prepare the training set. Is the following code enough for preparation, supposing that train_adata originally contains 35699 cells with 18010 genes?
sc.pp.normalize_total(train_adata, target_sum=1e4)
sc.pp.log1p(train_adata)
sc.pp.highly_variable_genes(train_adata)
Or do I need to filter the train_adata to contain only highly_variable_genes?
Also, is it OK to run the three lines above if my train_adata is already normalized, for example data exported from the data layer of a Seurat object?
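One thing worth checking before running the three preprocessing calls is whether the matrix still holds raw counts. Seurat's data layer is log-normalized by default, so normalizing and log-transforming it again would transform the values twice. The helper below is only a rough heuristic (non-negative, integer-valued entries suggest raw counts). Its name is an illustrative assumption, and it is not a TOSICA or scanpy function.

```python
import numpy as np


def looks_like_raw_counts(x):
    """Heuristic: raw count matrices are non-negative and integer-valued."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= 0) and np.allclose(x, np.round(x)))


# Synthetic illustration: raw counts versus a log-normalized version of them.
rng = np.random.default_rng(0)
raw = rng.poisson(2.0, size=(50, 100)).astype(float)
lognorm = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 1e4)

raw_is_counts = looks_like_raw_counts(raw)
lognorm_is_counts = looks_like_raw_counts(lognorm)
```

If the check fails on an exported matrix, exporting the counts layer instead and preprocessing from there avoids the double transformation. For a sparse matrix, pass its stored values (for example `X.data`) rather than the matrix itself.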
Do you have any suggestions on how to choose epochs and gmt_path to get better training and prediction results? Which value should I watch to assess whether training is going well if I don't know the true cell types of the query data? Should I stop increasing the number of epochs once the accu value nearly flattens?
When I trained on my own reference dataset, I found that the initial accu value was quite low (shown in the following image). Is this normal? (train_adata originally contained 35699 cells with 13295 genes, and I ran the three lines above.)
I would appreciate your reply!
I would also be interested in answers to questions raised above.
I would like to use TOSICA for cell classification. Can you provide specific examples, especially of how to construct a training set?