Lan-lab / SIGNET


Comparison scRNA dataset #8

Open VUbels opened 1 year ago

VUbels commented 1 year ago

Dear Lan-lab,

Thank you so much for providing this amazing tool. I've run (almost) the entire script and am now running the GRN activity scoring with AUCell. Here, the vignette says to load the scRNA count data again, along with the associated metadata. Should this again be the raw data, or the normalized/annotated data? I ask because this step requires the metadata, which is ordinarily only available after normalization/scaling of the data. My apologies for the novice question, but any help is appreciated.

Additionally, the trained dataset, with roughly 18,000 human cells across ~20 cell types, returns a list of 238 genes with a co-paired list of 11 genes that is passed to the AUCell function. The vignette states that the number of initializations is set to 10 by default. Would increasing this hyperparameter increase the number of genes eventually available to map the intercellular communication? The current result is far smaller than expected. Many thanks for any insights.

*I've just noticed that the default QC threshold in the script is <5% MT, which is too low for a human dataset. I've changed this setting and am retraining the model in the hope that this will increase the number of returned genes.

Kind regards

luoqh17 commented 1 year ago

AUCell needs the counts matrix to compute the AUCell score. The score tells you how active a set of genes is in each sample, based on how those genes rank. Alternatively, you could use normalized data with ssGSEA to check gene set activity. As for getting more genes in the results, more initializations might help, but training the model will take longer. I'd suggest adjusting the --n_genes setting instead to use more genes in the rest of your analysis.
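
To make the "based on how they rank" part concrete, here is a minimal pure-Python sketch of a rank-based recovery score for a single cell. It is a simplified illustration, not AUCell's actual implementation: the function name `recovery_auc`, the `top_fraction` parameter, and the alphabetical tie-breaking are assumptions made for this example.

```python
def recovery_auc(expression, gene_set, top_fraction=0.05):
    """Toy rank-based activity score for one cell (illustrative only).

    expression:   dict mapping gene name -> count for a single cell
    gene_set:     iterable of gene names (e.g. one regulon)
    top_fraction: hypothetical parameter, share of top-ranked genes to scan
    """
    # Rank genes from highest to lowest expression.
    # Ties are broken alphabetically here so the result is deterministic.
    ranked = sorted(expression, key=lambda g: (-expression[g], g))
    max_rank = max(1, int(len(ranked) * top_fraction))

    targets = set(gene_set) & set(expression)
    if not targets:
        return 0.0

    # Recovery curve: number of gene-set members seen up to each rank.
    hits = 0
    area = 0
    for gene in ranked[:max_rank]:
        if gene in targets:
            hits += 1
        area += hits

    # Normalize by the best case, where all members sit at the very top.
    n_best = min(len(targets), max_rank)
    max_area = sum(min(i, n_best) for i in range(1, max_rank + 1))
    return area / max_area


# Ten genes with counts 10 (gene A) down to 1 (gene J).
cell = {g: c for g, c in zip("ABCDEFGHIJ", range(10, 0, -1))}
top_score = recovery_auc(cell, {"A", "B"}, top_fraction=0.5)
low_score = recovery_auc(cell, {"I", "J"}, top_fraction=0.5)

# Multiplying every count in the cell by a constant leaves the ranking unchanged.
scaled = {g: c * 3 for g, c in cell.items()}
scaled_score = recovery_auc(scaled, {"A", "B"}, top_fraction=0.5)
```

In this sketch the score depends only on the order of genes within the cell, so a gene set at the top of the ranking scores high and one at the bottom scores zero, regardless of the absolute count values.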