morganee261 closed this issue 4 years ago.
With an old version of `SingleR`, the notebook still runs. It needs to be updated for the Bioconductor version of `SingleR`. Can you post the column names of your `dat_train`?
Thanks for your quick response.
Here are the first few rows and columns of `dat_train` (`head(dat_train[1:10, 1:10])`):

| | pseudotime | PAN3 | SPATA22 | TAOK1 | MEG3 | MGP | MMP1 | CARTPT | CDHR4 | S100A8 |
|---|---|---|---|---|---|---|---|---|---|---|
| ctrl1_ips.1 | 0.08646903 | -0.3691676 | -0.4746009 | -0.6379263 | -1.597147 | -1.014465 | -0.1612485 | -0.2450320 | -0.2065135 | -0.1057044 |
| ctrl1_ips.3 | 0.00000000 | -0.3075292 | -0.4785825 | -0.6960796 | -1.608831 | -1.022623 | -0.1641750 | -0.2478715 | -0.2087436 | -0.1077782 |
| ctrl1_ips.4 | 0.94640118 | -0.3788888 | -0.4762593 | -0.7251252 | -1.602009 | -1.017860 | -0.1624653 | -0.2462147 | -0.2074416 | -0.1065675 |
| ctrl1_ips.6 | 0.00000000 | -0.3638948 | -0.4734457 | -0.6897712 | -1.593765 | -1.012103 | -0.1604027 | -0.2442082 | -0.2058676 | -0.1051038 |
| ctrl1_ips.7 | 0.00000000 | -0.3663764 | -0.4756155 | -0.7253021 | -1.600121 | -1.016541 | -0.1619926 | -0.2457556 | -0.2070812 | -0.1062324 |
| ctrl1_ips.8 | 0.00000000 | -0.3596309 | -0.4731918 | -0.7175870 | -1.593022 | -1.011584 | -0.1602170 | -0.2440271 | -0.2057257 | -0.1049718 |

Do you suggest I downgrade my `SingleR`?
No, this is not related to `SingleR`, which was used to annotate cell types based on a reference. The first 10 entries look fine. This error was probably caused by gene symbols that start with a number or contain "-", which makes them illegal variable names in R. Converting gene symbols to Ensembl gene IDs should solve this problem, since Ensembl gene IDs are legal R variable names.
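To see which column names are the likely culprits, one quick check (a sketch, assuming a name is legal exactly when `make.names()` leaves it unchanged) is to compare each name with its `make.names()` version. The symbols below are a made-up toy vector; with real data you would use `colnames(dat_train)` instead:

```r
# Made-up example names; replace with colnames(dat_train) for real data
genes <- c("pseudotime", "PAN3", "HLA-A", "7SK", "MT-CO1")

# make.names() rewrites anything that is not a syntactically valid R name
# ("-" becomes ".", a leading digit gets an "X" prefix), so any name it
# changes is one that may cause trouble in formula-based modeling code
is_legal <- make.names(genes) == genes
illegal <- genes[!is_legal]
print(illegal)  # "HLA-A" "7SK" "MT-CO1"
```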
Could you please let me know how I should do that?
Thanks again for your help!
You can get Ensembl gene IDs and their corresponding gene symbols with `biomaRt`: https://bioconductor.org/packages/release/bioc/html/biomaRt.html
If you have not used `biomaRt` before, it can be a bit intimidating. You can also use one of the `tr2g` functions in `BUSpaRse` to get Ensembl gene IDs and their corresponding gene symbols, though you will get the transcript IDs as well. You'll see the code chunk calling `tr2g_ensembl` earlier in this `slingshot` tutorial.
Once you have a data frame with a column for Ensembl gene IDs and another column for gene symbols (say the data frame is called `df`), you can convert gene symbols to Ensembl gene IDs with

`colnames(mat) <- df$gene_id[match(colnames(mat), df$gene_symbol)]`
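Note that this one-liner renames every column, so the `pseudotime` column, and any symbol missing from `df`, would end up named `NA`. A slightly more defensive sketch is below; the toy matrix and the placeholder IDs are made up (they are not real Ensembl IDs), and the column names `gene_id` and `gene_symbol` are assumed to match your `df`. If a symbol appears more than once in `df`, `match()` simply takes the first row.

```r
# Hypothetical lookup table with placeholder IDs (not real Ensembl IDs);
# in practice build it with biomaRt or BUSpaRse::tr2g_ensembl()
df <- data.frame(
  gene_id     = c("ENSG00000000001", "ENSG00000000002"),
  gene_symbol = c("PAN3", "HLA-A"),
  stringsAsFactors = FALSE
)

# Toy stand-in for dat_train: a pseudotime column plus three gene columns
mat <- matrix(1:12, nrow = 3,
              dimnames = list(paste0("cell", 1:3),
                              c("pseudotime", "PAN3", "HLA-A", "NOT_IN_DF")))

# Only translate the gene columns, so pseudotime keeps its name
gene_cols <- setdiff(colnames(mat), "pseudotime")
new_ids <- df$gene_id[match(gene_cols, df$gene_symbol)]

# match() returns NA for symbols missing from df; drop those columns
keep <- !is.na(new_ids)
mat <- mat[, c("pseudotime", gene_cols[keep])]
colnames(mat) <- c("pseudotime", new_ids[keep])
print(colnames(mat))
```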
Also note that since Ensembl is moving its servers, the archives will not be available until April 16, though the current version (99) is available. However, you can still access older versions of Ensembl from Bioconductor via `AnnotationHub`. See the RNA velocity notebook in this repo for an example.
If you don't want to convert the gene symbols into Ensembl IDs, there's another workaround: use `make.names` (in base R) to make all the column names legal.
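A minimal sketch of that workaround, on a made-up toy data frame rather than the tutorial's `dat_train`. Passing `unique = TRUE` guards against two different symbols being mangled into the same name, and keeping the original names around (the `name_map` lookup here is just an illustrative helper) lets you map results back to the real gene symbols later:

```r
# Toy stand-in for dat_train; "HLA-A" and "HLA.A" are hypothetical names that
# collide once "-" is turned into "."
dat <- data.frame(pseudotime = c(0, 0.5), `HLA-A` = 1:2, HLA.A = 3:4, `7SK` = 5:6,
                  check.names = FALSE)
old_names <- colnames(dat)

# unique = TRUE appends ".1", ".2", ... so no two columns end up with the same name
colnames(dat) <- make.names(old_names, unique = TRUE)
print(colnames(dat))  # "pseudotime" "HLA.A" "HLA.A.1" "X7SK"

# Lookup from the new (legal) names back to the original gene symbols
name_map <- setNames(old_names, colnames(dat))
print(name_map[["X7SK"]])  # "7SK"
```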
I tried using `make.names` and it seems to be working now. Do you know how long this step usually takes?
Thanks
It depends on how many genes and cells are in your dataset, how many cores you use, and the other parameters for `ranger`. In the tutorial it took about a minute with 3 cores (I didn't formally time it with `system.time`, but it didn't take long).
Hello,
Thanks for the great tutorials. I am currently following your slingshot tutorial and I run into an error when running the `rand_forest` line. Here is what I get:
model <- rand_forest(mtry = 200, trees = 1400, min_n = 15, mode = "regression") %>%
Could you please help with that?
Thank you,
Morgane