Open zhangjl-work opened 2 years ago
Hi, thanks for your interest in our work. Could you please provide more detail to allow us to understand your question more clearly? At a minimum, a reproducible example of the input data, the specific code that you are running and the output would be needed to help further. Thanks
hi,thank you for your reply!
I use your trained file NatGen_Supplementary_table_S4.csv,as input file。 and use my own rds file。the following is my code.
library(Seurat)
sigdat <- read.csv("NatGen_Supplementary_table_S4.csv") temp_allgenes <- c(as.vector(sigdat[,"Basal_SC"]), as.vector(sigdat[,"Her2E_SC"]), as.vector(sigdat[,"LumA_SC"]), as.vector(sigdat[,"LumB_SC"])) temp_allgenes <- unique(temp_allgenes[!temp_allgenes == ""])
allfile <- list.files("/PROJ2/development/zhangjunling/project/breast_cancer/scSubtype/data/", pattern = 'rds') rds_list <- list() for (i in allfile) { print (i) file_path <- paste0("./data/",i) Mydata <- readRDS(file_path)
#Mydata <- readRDS("./B_epi_filter_T.rds")
Mydata <- ScaleData(Mydata, features=temp_allgenes)
***@***.******@***.***)
#calculate mean scsubtype scores
outdat <- matrix(0,
nrow=ncol(sigdat),
ncol=ncol(tocalc),
dimnames=list(colnames(sigdat),
colnames(tocalc)))
for(i in 1:ncol(sigdat)){
row <- as.character(sigdat[,i])
row<-unique(row[row != ""])
genes<-which(rownames(tocalc) %in% row)
temp<-apply(tocalc[genes,],2,function(x){mean(as.numeric(x),na.rm=TRUE)})
outdat[i,]<-as.numeric(temp)
}
final<-outdat[which(rowSums(outdat,na.rm=TRUE)!=0),]
final<-as.data.frame(final)
is.num <- sapply(final, is.numeric);final[is.num] <- lapply(final[is.num], round, 4)
finalm<-as.matrix(final)
##Scaling scores function before calling the highest Call
center_sweep <- function(x, row.w = rep(1, nrow(x))/nrow(x)) {
get_average <- function(v) sum(v * row.w)/sum(row.w)
average <- apply(x, 2, get_average)
sweep(x, 2, average)
}
##Obtaining the highest call
finalmt<-as.data.frame(t(finalm))
#print (head(finalmt))
finalm.sweep.t<-center_sweep(finalmt)
Finalnames<-colnames(finalm.sweep.t)[max.col(finalm.sweep.t,ties.method="first")]
finalm.sweep.t$SCSubtypeCall <- Finalnames
result <- as.data.frame(table(Finalnames))
print (result)
}
Below is my output:
[1] "G_epi_filter_LN1.rds" Centering and scaling data matrix
Finalnames Freq 1 Basal_SC 1184 2 Her2E_SC 1116 3 LumA_SC 1464 4 LumB_SC 1559 [1] "G_epi_filter_LN2.rds" Centering and scaling data matrix
Finalnames Freq 1 Basal_SC 906 2 Her2E_SC 848 3 LumA_SC 1073 4 LumB_SC 1111 [1] "G_epi_filter_T.rds" Centering and scaling data matrix
Finalnames Freq 1 Basal_SC 754 2 Her2E_SC 766 3 LumA_SC 923 4 LumB_SC 936
Seems to split my data equally across the four subtypes,why is that?
2022年6月27日 下午6:54,Daniel Roden @.***> 写道:
Hi, thanks for your interest in our work. Could you please provide more detail to allow us to understand your question more clearly? At a minimum, a reproducible example of the input data, the specific code that you are running and the output would be needed to help further. Thanks
— Reply to this email directly, view it on GitHub https://github.com/Swarbricklab-code/BrCa_cell_atlas/issues/16#issuecomment-1167201758, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP747ND4PRHSX4YJIGDINZDVRGB7HANCNFSM5X6TTC4Q. You are receiving this because you authored the thread.
Hi @zhangjl-work,
You actually have to integrate your data with their training data first. Then you apply the scSubtyping script to the rescaled integrated expression data (your data + their training data).
Applying the scSubtyping script directly to your data alone will result in a random or even distribution of subtype calls.
Thank you very much for your reply, but there is one more thing I don't understand.
Question 1: Do I have to have my own training set? If I don't have my own training set data, can I use the training set data from your article?
Question 2: What is the file format of the training set? The format of the TNBCmerged_Training_SwarbrickInferCNV.txt file, can you take a few lines to see it?
Question 3: What does the SinglecellMolecularSubtypesignaturesonlytumor_SEPTEMBER2019.tx file in Calculatingscoresandplotting.R do? Can you take a few lines and take a look?
Looking forward to your reply, thanks!
2022年6月28日 上午1:08,Matt Regner @.***> 写道:
Hi @zhangjl-work https://github.com/zhangjl-work,
You actually have to integrate your data with their training data first. Then you apply the scSubtyping script to the rescaled integrated expression data (your data + their training data).
Applying the scSubtyping script directly to your data alone will result in a random or even distribution of subtype calls.
— Reply to this email directly, view it on GitHub https://github.com/Swarbricklab-code/BrCa_cell_atlas/issues/16#issuecomment-1167585655, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP747NCKZO4XTUB4CTWGPATVRHNXXANCNFSM5X6TTC4Q. You are receiving this because you were mentioned.
Hi @zhangjl-work,
Question 1: Yes, use the training set data from the article.
Question 2: The format of the training set should be four Seurat objects (saved as RDS files). There should be one rds file for each subtype (LumA, LumB, Her2, and Basal). I am not familiar with the file TNBCmerged_Training_SwarbrickInferCNV.txt, perhaps @dlroden can clarify.
Question 3: I think SinglecellMolecularSubtypesignaturesonlytumor_SEPTEMBER2019.txt holds the gene signatures for each molecular subtype and Calculatingscoresandplotting.R computes the signature enrichment scores for these gene signatures.
I hope @dlroden and @sunnyzwu can clarify these.
Thank you very much for your reply, I will communicate with the person you recommended! Thanks again!
Hi @zhangjl-work, Is your problem solved? I also want to know the format of the input data for scSubtype and where to get the training set file.
Looking forward to your reply!