DanHanh / scLinear


Incorrect Output of adt_predict #4

Open rajuvee opened 1 month ago

rajuvee commented 1 month ago

After following the new installation guide and spending a number of hours in dependency hell, I got the package installed! Next, I converted the feature names in my Seurat object to human symbols, since adt_predict does not appear to have a species input. However, after doing all of that, the output from adt_predict only has a few genes (134), and I cannot figure out why.

rajuvee commented 1 month ago

My original object has approximately 2k genes, which are converted to human symbols correctly as far as I can tell. Here is the code I ran and the genes I am getting back:

code:

AllCells_clusteringRev5 <- readRDS('AllCells_clusteringRev5.RDS')
AllCells_clusteringRev5[['Protein']] <- NULL
AllCells_clusteringRev5[['SCT']] <- NULL
AllCells_clusteringRev5_toMessWith <- AllCells_clusteringRev5

AllCells_clusteringRev5_toMessWith[["RNA3"]] <- as(object = AllCells_clusteringRev5_toMessWith[["RNA"]], Class = "Assay")
DefaultAssay(AllCells_clusteringRev5_toMessWith) <- "RNA3"
AllCells_clusteringRev5_toMessWith[["RNA"]] <- NULL

# Converting Mouse to Human Gene Names

convert_symbols_by_species <- function(src_genes, src_species) {
  if (src_species == "human") {
    dest_species <- "mouse"

    dest_symbols <- src_genes %>%
      tibble::enframe("gene_index", "HGNC.symbol") %>%
      dplyr::left_join(human_to_mouse_homologs, by = "HGNC.symbol") %>%
      dplyr::distinct(HGNC.symbol, .keep_all = TRUE) %>%
      dplyr::mutate(MGI.symbol = dplyr::case_when(
        is.na(MGI.symbol) ~ stringr::str_to_sentence(HGNC.symbol),
        TRUE ~ MGI.symbol
      )) %>%
      dplyr::select(-gene_index) %>%
      identity()
  } else if (src_species == "mouse") {
    dest_species <- "human"

    dest_symbols <- src_genes %>%
      tibble::enframe("gene_index", "MGI.symbol") %>%
      dplyr::left_join(human_to_mouse_homologs, by = "MGI.symbol") %>%
      dplyr::distinct(MGI.symbol, .keep_all = TRUE) %>%
      dplyr::mutate(HGNC.symbol = dplyr::case_when(
        is.na(HGNC.symbol) ~ stringr::str_to_upper(MGI.symbol),
        TRUE ~ HGNC.symbol
      )) %>%
      dplyr::select(-gene_index) %>%
      # dplyr::mutate(HGNC.symbol = make.unique(HGNC.symbol)) %>%
      identity()
  }

  return(make.unique(dest_symbols[[2]]))
}

rownames(AllCells_clusteringRev5_toMessWith@assays[["RNA3"]]@counts) <- convert_symbols_by_species(rownames(AllCells_clusteringRev5_toMessWith), 'mouse')
rownames(AllCells_clusteringRev5_toMessWith@assays[["RNA3"]]@data) <- convert_symbols_by_species(rownames(AllCells_clusteringRev5_toMessWith), 'mouse')
rownames(AllCells_clusteringRev5_toMessWith@assays[["RNA3"]]@scale.data) <- convert_symbols_by_species(rownames(AllCells_clusteringRev5_toMessWith), 'mouse')

pipe <- create_adt_predictor()
pipe <- load_pretrained_model(pipe, model = "all")

AllCells_clusteringRev5@assays["predicted_ADT"]

testOfPredicted_ADT <- adt_predict(pipe = pipe, gexp = as.matrix(AllCells_clusteringRev5_toMessWith@assays[["RNA3"]]@counts), normalize = TRUE)

Genes:

[1] "CD86" "CD274" "CD270" "CD155" "CD112" "CD47" "CD48" "CD40" "CD154" "CD52" "CD3" "CD8" "CD56" "CD19"
[15] "CD33" "CD11c" "HLA-A-B-C" "CD45RA" "CD123" "CD7" "CD105" "CD49f" "CD194" "CD4" "CD44" "CD14" "CD16" "CD25"
[29] "CD45RO" "CD279" "TIGIT" "CD20" "CD335" "CD31" "Podoplanin" "CD146" "IgM" "CD5" "CD195" "CD32" "CD196" "CD185"
[43] "CD103" "CD69" "CD62L" "CD161" "CD152" "CD223" "KLRG1" "CD27" "CD107a" "CD95" "CD134" "HLA-DR" "CD1c" "CD11b"
[57] "CD64" "CD141" "CD1d" "CD314" "CD35" "CD57" "CD272" "CD278" "CD58" "CD39" "CX3CR1" "CD24" "CD21" "CD11a"
[71] "CD79b" "CD244" "CD169" "integrinB7" "CD268" "CD42b" "CD54" "CD62P" "CD119" "TCR" "CD192" "CD122" "FceRIa" "CD41"
[85] "CD137" "CD163" "CD83" "CD124" "CD13" "CD2" "CD226" "CD29" "CD303" "CD49b" "CD81" "IgD" "CD18" "CD28"
[99] "CD38" "CD127" "CD45" "CD22" "CD71" "CD26" "CD115" "CD63" "CD304" "CD36" "CD172a" "CD72" "CD158" "CD93"
[113] "CD49a" "CD49d" "CD73" "CD9" "TCRVa7.2" "TCRVd2" "LOX-1" "CD158b" "CD158e1" "CD142" "CD319" "CD352" "CD94" "CD162"
[127] "CD85j" "CD23" "CD328" "HLA-E" "CD82" "CD101" "CD88" "CD224"
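
For anyone trying to reproduce this, a quick sanity check on converted symbols can be sketched in base R. This is only an illustration under stated assumptions: the `converted` vector is toy data, and `reference` is a hypothetical stand-in for whatever gene vocabulary the pretrained model expects, which is not shown in this thread. It flags symbols that `make.unique()` had to suffix and reports how many converted names match the reference.

```r
# Toy stand-in for the output of the conversion step (made-up symbols).
converted <- make.unique(c("CD3E", "CD4", "GM12345", "CD4", "MS4A1"))

# Hypothetical reference vocabulary; replace with the real list of
# expected gene symbols if one is available.
reference <- c("CD3E", "CD4", "MS4A1", "CD8A")

# Names ending in ".<number>" are candidates for make.unique() suffixes.
# Note: genuine symbols that end this way would also be flagged.
suffixed <- converted[grepl("\\.[0-9]+$", converted)]

# Fraction of converted symbols that appear in the reference vocabulary.
overlap <- mean(converted %in% reference)

print(suffixed)
print(overlap)
```

On a real object, `converted` would be the row names of the assay after renaming. A low overlap would suggest the renaming did not produce the symbols the model was trained on.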