Closed adomingues closed 3 years ago
Hi António,
I haven't come across this issue before. mclust is used in both stage 2 and 3. Do you know in which stage this happens?
Are the data published? Would it be possible to share them with me so I can have a look and try to spot the error? I would only need the target_gem and the list of markers.
Also, just for simplicity, since you are providing a list of markers and you are calculating weights from the target you don't need a reference gem, so could simplify it to
scID_output_zhang <- scid_multiclass(
target_gem = human_zhong_raw,
markers = zhong_markers,
estimate_weights_from_target = TRUE
)
Cheers, Katerina
Hi Katerina,
First of all thank you for looking into this. I was also very surprised because until now, and I tested scID with a few published datasets, I have not encountered any issues. I also ran the command you suggested and the error persists.
Anyway, this is published data so I have put both objects in an Rdata
file which you can find here:
https://owncloud.gwdg.de/index.php/s/y1qMLgjDmaH5MOw
Let me know if the link doesn't work.
Cheers, António
Hi Katerina,
I forgot to answer this question:
mclust is used in both stage 2 and 3. Do you know in which stage this happens?
It seems to be at Stage 3. Stage 2 completes without errors:
Stage 2: Estimate weights of signature genes from target
__ ___________ __ _____________
/ |/ / ____/ / / / / / ___/_ __/
/ /|_/ / / / / / / / /\__ \ / /
/ / / / /___/ /___/ /_/ /___/ // /
/_/ /_/\____/_____/\____//____//_/ version 5.4.6
Type 'citation("mclust")' for citing this R package in publications.
Attaching package: ‘mclust’
The following object is masked from ‘package:purrr’:
map
Progress: 100% Done!Stage 3.1-2: Calculate scores and find matching cells
fitting ..
Quick update @kboufea:
I re-ran scid_multiclass
with CPM normalized counts, and this time mclust worked without an issue:
===========| 100%
Done!Stage 3.3: Resolve multiclass assignments
class(scID_output_zhang)
[1] "list"
head(scID_output_zhang$labels)
GW08_PFC1_sc1 GW08_PFC1_sc2 GW08_PFC1_sc3 GW08_PFC1_sc4
"Astrocytes-2" "Astrocytes-2" "Excitatory_neurons_WEEK-GW8" "Astrocytes-2"
GW08_PFC1_sc5 GW08_PFC1_sc6
"Excitatory_neurons_stage-GW8-12" "Excitatory_neurons_stage-GW8-12"
This is how I created the CPM matrix:
## use CPM
seo_zhong <- NormalizeData(
seo_zhong,
normalization.method = "RC",
scale.factor = 1e6
)
cpm <- GetAssayData(seo_zhong) %>%
as.data.frame()
head(cpm)[, 1:5]
In case you are still interested I have re-uploaded the data, this time including the CPM matrix: https://owncloud.gwdg.de/index.php/s/UmQlNyvCzkb2QSr
As for the cause of the issue, basically my mistake because I was using raw counts in the target_gem
? That said I have used raw counts for other test data and did not have an issue with scid_multiclass
. Could this somehow be related to this comment?
Cheers, António
Hi António,
Sorry, I didn't have time to look into it yet, but I'm glad you resolved it.
target_gem
should be indeed normalised. I have included a function counts_to_cpm
in this package that is a simple wrapper around scater
CPM normalisation function so you can get the normalised target data in a single line of code
cpm <- counts_to_cpm(human_zhong_raw)
Lack of variance in gene expression could cause issues with mclust... I saw one of the signatures consists of a single gene and that could be a problem with raw counts... I will look into it to try catch the error for the future though!
Cheers, Katerina
Hi again,
I was trying to run the scID to identify cluster in a dataset based on a published list of marker genes:
But I am running into this error, which appears to come from
mclust
:Any ideas what is going on?
Cheers, António
Package versions: