Open Tim0thy1 opened 5 years ago
Hi there, Thank you for letting me know.
You ask "How can it be?" Oh, anything is possible. Is this your own data? Is it something I can download and try?
During the classification, the ensemble of classifiers makes a subtype call for each sample, then there's a step where all the calls must be merged. I think there's some randomness there, if a sample is on the border between clusters, it can bounce back and forth. I used the clue R package for that part.
Or it's a bug. I'll check it out. -dave
On Tue, Apr 16, 2019 at 7:43 PM Tim0thy1 notifications@github.com wrote:
Hi,
I detected the immune subtypes using iAtlas Tools and immune subtypes notebook. But the finalResults are different . How can it be?
result of notebook
1 2 3 4 5 6 Call XY1 0.0000000 0.0000000 0.0312500 0.7265625 0.1562500 0.0859375 4 XY2 0.0000000 0.0000000 0.9375000 0.0000000 0.0234375 0.0390625 3 XY3 0.0078125 0.0000000 0.5781250 0.2578125 0.0312500 0.1250000 3 XY4 0.0078125 0.0156250 0.0000000 0.7500000 0.0390625 0.1875000 4 XY5 0.0000000 0.0078125 0.0625000 0.7109375 0.0937500 0.1250000 4 XY6 0.0078125 0.0234375 0.0000000 0.7500000 0.0000000 0.2187500 4 XY7 0.3046875 0.0625000 0.0000000 0.2109375 0.0000000 0.4218750 6 XY8 0.0234375 0.0000000 0.7968750 0.0156250 0.0390625 0.1250000 3 XY9 0.0156250 0.0234375 0.0000000 0.7812500 0.0234375 0.1562500 4 XY10 0.0156250 0.8984375 0.0000000 0.0156250 0.0000000 0.0703125 2 XY11 0.0078125 0.0156250 0.0000000 0.7500000 0.0078125 0.2187500 4 XY12 0.0000000 0.0000000 0.5000000 0.3828125 0.0781250 0.0390625 3 XY13 0.0000000 0.0000000 0.0000000 0.9531250 0.0078125 0.0390625 4 XY14 0.0078125 0.0156250 0.0000000 0.7656250 0.0156250 0.1953125 4 XY15 0.0000000 0.0000000 0.0000000 0.9453125 0.0156250 0.0390625 4 XY16 0.0078125 0.0000000 0.9453125 0.0000000 0.0078125 0.0390625 3 XY17 0.0390625 0.1875000 0.0000000 0.4687500 0.0000000 0.3046875 4 XY18 0.0156250 0.1171875 0.6250000 0.0546875 0.0000000 0.1875000 3 XY19 0.0078125 0.0156250 0.0000000 0.7812500 0.0390625 0.1562500 4 XY20 0.0468750 0.0234375 0.0000000 0.7031250 0.0000000 0.2265625 4
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Gibbsdavidl/Immune-Subtype-Clustering/issues/3, or mute the thread https://github.com/notifications/unsubscribe-auth/AAiSyQOyKUNRDmtEiTLY4ic5DRYHXdcUks5vhopLgaJpZM4c0E_M .
Hi,
No, this is example data. ( https://isb-cgc.shinyapps.io/shiny-iatlas/ )
or
https://github.com/Gibbsdavidl/Immune-Subtype-Clustering/blob/master/ExtraData/ebpp_test1_1to20.tsv
I upload my R script there. I copy that code from there
############################R code#########################
library(ggplot2) library(clue) library(mclust) library(parallel) library(stringr)
tcgaSubset <- read.table('/home/data/ebppSubset.tsv.bz2', header = T, sep = '\t', stringsAsFactors = F)
newDat <- read.table("/home/data/Immune-Subtype-Clustering-master/ExtraData/ebpp_test1_1to20.tsv", sep='\t', header=T, stringsAsFactors = F)
reportedScores <- read.table('~/data/five_signature_mclust_ensemble_results.tsv.gz', sep='\t', header=T, stringsAsFactors = F) rownames(reportedScores) <- reportedScores$AliquotBarcode
load('/home/data/comparative_immuneSigs_geneLists4.rda') source('/home/data/ImmuneSigs68_function.R') source('/home/data/signature_mclust_ensemble.R') load('/home/data/wolf_set_slim1.rda')
zscore.cols2<-function(x){ return((apply(x, 2, function(x) (x - median(na.omit(x)))/sd(na.omit(x))))) }
tcgaSubset <- log2(tcgaSubset + 1)
rownames(newDat) <- newDat[,1] didx <- !duplicated(as.character(rownames(newDat))) dat <- newDat[didx,-1]
data.quantileExpressed <- apply(dat, 2, function(x){quantile(x[x>0], 0.75)}) datnorm <- as.data.frame(t( t(dat) / data.quantileExpressed ) ) * 1000
datlog2 <- log2(datnorm+1)
sharedGenes <- intersect(rownames(tcgaSubset), rownames(datlog2))
newDatSub <- datlog2[sharedGenes,]
tcgaSubsetSub <- tcgaSubset[sharedGenes,]
joinDat <- cbind(newDatSub, tcgaSubsetSub)
newdatSamples <- colnames(newDatSub) dat2idx <- 1:(ncol(newDatSub)-1) tcgaidx <- setdiff( (1: (ncol(joinDat)-1)), dat2idx)
sampleIdx <- c(dat2idx, sample(tcgaidx, size=200, replace = F)) preCombat <- joinDat[,sampleIdx] preCombatMelt <- reshape2::melt(preCombat) preCombatMelt$SampleSource <- ifelse(test = preCombatMelt$variable %in% newdatSamples, yes = "New Data", no="TCGA Data")
library(sva) combatflag <- 0
if (combatflag) {
batch <- c(rep(1,ncol(newDatSub)), rep(2,ncol(tcgaSubsetSub)))
modcombat = model.matrix(~1, data=as.data.frame(t(joinDat)))
combat_edata = ComBat(dat=joinDat, batch=batch, mod=modcombat,
par.prior=TRUE, prior.plots=FALSE, ref.batch = 2)
} else {
combat_edata = joinDat
}
postCombat <- combat_edata[,sampleIdx] postCombatMelt <- reshape2::melt(postCombat) postCombatMelt$SampleSource <- ifelse(test = postCombatMelt$variable %in% newdatSamples, yes = "New Data", no="TCGA Data")
joinMeds <- apply(combat_edata, 1, median, na.rm=T)
join_med_scaled <- sweep(combat_edata,1,joinMeds,'-')
ls() rm(dat, dat2idx, data.quantileExpressed, datlog2, datnorm, didx, newDat, newdatSamples, postCombat, postCombatMelt, preCombat, preCombatMelt, sampleIdx, sharedGenes, tcgaidx, tcgaSubsetSub) gc()
library(pryr)
scores <- ImmuneSigs_function(join_med_scaled, sigs1_2_eg2,sigs12_weighted_means, sigs12_module_weights,sigs1_2_names2,sigs1_2_type2)
idx <- c("LIexpression_score", "CSF1_response", "TGFB_score_21050467", "Module3_IFN_score", "CHANG_CORE_SERUM_RESPONSE_UP") scores <- t(scores[idx,]) zscores <- zscore.cols2(scores)
calls <- consensusEnsemble(mods2, zscores, 4, 128)
maxcalls <- apply(calls$.Data, 1, function(a) which(a == max(a))[1]) names(maxcalls) <- rownames(scores)
sharedIDs <- intersect(reportedScores$AliquotBarcode, rownames(scores)) t1 <-table(Reported=as.numeric(reportedScores[sharedIDs, 'ClusterModel1']), NewCalls=as.numeric(maxcalls[sharedIDs]))
reported <- 1:6 optcalls <- 1:6
for (i in reported) {
##for subtype i, where did most of the samples end up?
j <- which(as.numeric(t1[i,]) == max(as.numeric(t1[i,])))
##rename maxcall j <- i
optcalls[i] <- j
}
alignedCalls <- sapply(maxcalls, function(a) which(a == optcalls)[1])
t2 <-table(Reported=as.numeric(reportedScores[sharedIDs, 'ClusterModel1']), NewCalls=as.numeric(alignedCalls[sharedIDs]))
jdx <- match(table=rownames(scores), x=colnames(newDatSub)) # index to new data scores pcalls <- calls$.Data[jdx,] # get that table rownames(pcalls) <- colnames(newDatSub) # name it from the new data pcalls <- pcalls[,optcalls]
pcalls <- cbind(pcalls, data.frame(Call=alignedCalls[jdx])) # bring in the aligned calls pcalls <- cbind(pcalls, zscores[jdx,]) # and the scores
finalResults <- list(AlignedCalls=alignedCalls[jdx], Table=t2, ProbCalls=pcalls)
load('/home/data/colnames_1to20_ebpp.rda') rownames(reportedScores) <- str_replace_all(reportedScores$AliquotBarcode, pattern = '\.', replacement = '-')
manuscriptCalls <- reportedScores[cx, 'ClusterModel1']
newCalls <- finalResults$AlignedCalls
save(finalResults,file="tmp.Robj")
Hi there, Thank you for letting me know. You ask "How can it be?" Oh, anything is possible. Is this your own data? Is it something I can download and try? During the classification, the ensemble of classifiers makes a subtype call for each sample, then there's a step where all the calls must be merged. I think there's some randomness there, if a sample is on the border between clusters, it can bounce back and forth. I used the clue R package for that part. Or it's a bug. I'll check it out. -dave … On Tue, Apr 16, 2019 at 7:43 PM Tim0thy1 @.**> wrote: Hi, I detected the immune subtypes using iAtlas Tools and immune subtypes notebook. But the finalResults are different . How can it be? #####result of notebook###### 1 2 3 4 5 6 Call XY1 0.0000000 0.0000000 0.0312500 0.7265625 0.1562500 0.0859375 4 XY2 0.0000000 0.0000000 0.9375000 0.0000000 0.0234375 0.0390625 3 XY3 0.0078125 0.0000000 0.5781250 0.2578125 0.0312500 0.1250000 3* XY4 0.0078125 0.0156250 0.0000000 0.7500000 0.0390625 0.1875000 4 XY5 0.0000000 0.0078125 0.0625000 0.7109375 0.0937500 0.1250000 4 XY6 0.0078125 0.0234375 0.0000000 0.7500000 0.0000000 0.2187500 4 XY7 0.3046875 0.0625000 0.0000000 0.2109375 0.0000000 0.4218750 6 XY8 0.0234375 0.0000000 0.7968750 0.0156250 0.0390625 0.1250000 3 XY9 0.0156250 0.0234375 0.0000000 0.7812500 0.0234375 0.1562500 4 XY10 0.0156250 0.8984375 0.0000000 0.0156250 0.0000000 0.0703125 2 XY11 0.0078125 0.0156250 0.0000000 0.7500000 0.0078125 0.2187500 4 XY12 0.0000000 0.0000000 0.5000000 0.3828125 0.0781250 0.0390625 3 XY13 0.0000000 0.0000000 0.0000000 0.9531250 0.0078125 0.0390625 4 XY14 0.0078125 0.0156250 0.0000000 0.7656250 0.0156250 0.1953125 4 XY15 0.0000000 0.0000000 0.0000000 0.9453125 0.0156250 0.0390625 4 XY16 0.0078125 0.0000000 0.9453125 0.0000000 0.0078125 0.0390625 3 XY17 0.0390625 0.1875000 0.0000000 0.4687500 0.0000000 0.3046875 4 XY18 0.0156250 0.1171875 0.6250000 0.0546875 0.0000000 0.1875000 3 XY19 0.0078125 0.0156250 0.0000000 0.7812500 0.0390625 0.1562500 4 XY20 0.0468750 0.0234375 0.0000000 0.7031250 0.0000000 0.2265625 4 — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#3>, or mute the thread https://github.com/notifications/unsubscribe-auth/AAiSyQOyKUNRDmtEiTLY4ic5DRYHXdcUks5vhopLgaJpZM4c0E_M .
Sorry, my English is not very good. I must have mistaken 'How can it be' meaning.
Your english is good. I read your question as “how is that possible?” ... which is fine. They should give mostly the same answer.
But this project has been very difficult for me.
:-)
On Wed, Apr 17, 2019 at 6:33 PM Tim0thy1 notifications@github.com wrote:
Hi there, Thank you for letting me know. You ask "How can it be?" Oh, anything is possible. Is this your own data? Is it something I can download and try? During the classification, the ensemble of classifiers makes a subtype call for each sample, then there's a step where all the calls must be merged. I think there's some randomness there, if a sample is on the border between clusters, it can bounce back and forth. I used the clue R package for that part. Or it's a bug. I'll check it out. -dave … <#m-4400710527188761948> On Tue, Apr 16, 2019 at 7:43 PM Tim0thy1 @.**> wrote: Hi, I detected the immune subtypes using iAtlas Tools and immune subtypes notebook. But the finalResults are different . How can it be? #####result of notebook###### 1 2 3 4 5 6 Call XY1 0.0000000 0.0000000 0.0312500 0.7265625 0.1562500 0.0859375 4 XY2 0.0000000 0.0000000 0.9375000 0.0000000 0.0234375 0.0390625 3 XY3 0.0078125 0.0000000 0.5781250 0.2578125 0.0312500 0.1250000 3* XY4 0.0078125 0.0156250 0.0000000 0.7500000 0.0390625 0.1875000 4 XY5 0.0000000 0.0078125 0.0625000 0.7109375 0.0937500 0.1250000 4 XY6 0.0078125 0.0234375 0.0000000 0.7500000 0.0000000 0.2187500 4 XY7 0.3046875 0.0625000 0.0000000 0.2109375 0.0000000 0.4218750 6 XY8 0.0234375 0.0000000 0.7968750 0.0156250 0.0390625 0.1250000 3 XY9 0.0156250 0.0234375 0.0000000 0.7812500 0.0234375 0.1562500 4 XY10 0.0156250 0.8984375 0.0000000 0.0156250 0.0000000 0.0703125 2 XY11 0.0078125 0.0156250 0.0000000 0.7500000 0.0078125 0.2187500 4 XY12 0.0000000 0.0000000 0.5000000 0.3828125 0.0781250 0.0390625 3 XY13 0.0000000 0.0000000 0.0000000 0.9531250 0.0078125 0.0390625 4 XY14 0.0078125 0.0156250 0.0000000 0.7656250 0.0156250 0.1953125 4 XY15 0.0000000 0.0000000 0.0000000 0.9453125 0.0156250 0.0390625 4 XY16 0.0078125 0.0000000 0.9453125 0.0000000 0.0078125 0.0390625 3 XY17 0.0390625 0.1875000 0.0000000 0.4687500 0.0000000 0.3046875 4 XY18 0.0156250 0.1171875 0.6250000 0.0546875 0.0000000 0.1875000 3 XY19 0.0078125 0.0156250 0.0000000 0.7812500 0.0390625 0.1562500 4 XY20 0.0468750 0.0234375 0.0000000 0.7031250 0.0000000 0.2265625 4 — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#3 https://github.com/Gibbsdavidl/Immune-Subtype-Clustering/issues/3>, or mute the thread https://github.com/notifications/unsubscribe-auth/AAiSyQOyKUNRDmtEiTLY4ic5DRYHXdcUks5vhopLgaJpZM4c0E_M .
Sorry, my English is not very good. I must have mistaken 'How can it be' meaning.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/Gibbsdavidl/Immune-Subtype-Clustering/issues/3#issuecomment-484321767, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJFSOFE6TZ7XHQRJYCSCTPQ7FWHANCNFSM4HGQJ7GA .
@Gibbsdavidl Thanks for your replay. The website tools and code on R notebook.
Which one do you suggest and why?
I just tried it and got only one difference in cluster label between the notebook and webapp (XY20).
One place that can cause differences is in the selection of 'ensemble size'. The clustering is done using a 'model based method'. And in this case, it was an ensemble of models. That can change the results.
They use the same code... shouldn't be many differences except those due to the random nature of combining across the ensemble.
If you want to run it many times, running it locally would probably be faster.
Thanks!
@Gibbsdavidl Can you send your code to me. I want to campare your codes to my. By the way, I can't connect the webapp server all day...
@Gibbsdavidl Can you get my message?
Hi there, Sorry about the delay. I just used the notebook in the repo with no changes... "Call_immune_subtypes_on_new_data.ipynb https://github.com/Gibbsdavidl/Immune-Subtype-Clustering/blob/master/Notebooks/Call_immune_subtypes_on_new_data.ipynb "
-dave
On Sun, Apr 28, 2019 at 5:59 PM Tim0thy1 notifications@github.com wrote:
@Gibbsdavidl https://github.com/Gibbsdavidl Can you get my message?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Gibbsdavidl/Immune-Subtype-Clustering/issues/3#issuecomment-487430435, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJFSK3YEX6TAOHCEIGBA3PSZB53ANCNFSM4HGQJ7GA .
Webapp seems to be working OK today. Thanks for letting us know. -dave
On Tue, Apr 23, 2019 at 12:18 AM Tim0thy1 notifications@github.com wrote:
@Gibbsdavidl https://github.com/Gibbsdavidl Can you send your code to me. I want to campare your codes to my. By the way, I can't connect the webapp server all day...
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Gibbsdavidl/Immune-Subtype-Clustering/issues/3#issuecomment-485674280, or mute the thread https://github.com/notifications/unsubscribe-auth/AAEJFSP5SZ6IKWCFUXJVPIDPR2Z3BANCNFSM4HGQJ7GA .
That is very strange. I will try the code on notebook again.
We can communicate with E-mail next time.
Hi,
I detected the immune subtypes using iAtlas Tools and immune subtypes notebook for same data(ebpp_test1_1to20.tsv). But the finalResults are different . How can it be?
result of notebook
XY1 0.0000000 0.0000000 0.0312500 0.7265625 0.1562500 0.0859375 4 XY2 0.0000000 0.0000000 0.9375000 0.0000000 0.0234375 0.0390625 3 XY3 0.0078125 0.0000000 0.5781250 0.2578125 0.0312500 0.1250000 3 XY4 0.0078125 0.0156250 0.0000000 0.7500000 0.0390625 0.1875000 4 XY5 0.0000000 0.0078125 0.0625000 0.7109375 0.0937500 0.1250000 4 XY6 0.0078125 0.0234375 0.0000000 0.7500000 0.0000000 0.2187500 4 XY7 0.3046875 0.0625000 0.0000000 0.2109375 0.0000000 0.4218750 6 XY8 0.0234375 0.0000000 0.7968750 0.0156250 0.0390625 0.1250000 3 XY9 0.0156250 0.0234375 0.0000000 0.7812500 0.0234375 0.1562500 4 XY10 0.0156250 0.8984375 0.0000000 0.0156250 0.0000000 0.0703125 2 XY11 0.0078125 0.0156250 0.0000000 0.7500000 0.0078125 0.2187500 4 XY12 0.0000000 0.0000000 0.5000000 0.3828125 0.0781250 0.0390625 3 XY13 0.0000000 0.0000000 0.0000000 0.9531250 0.0078125 0.0390625 4 XY14 0.0078125 0.0156250 0.0000000 0.7656250 0.0156250 0.1953125 4 XY15 0.0000000 0.0000000 0.0000000 0.9453125 0.0156250 0.0390625 4 XY16 0.0078125 0.0000000 0.9453125 0.0000000 0.0078125 0.0390625 3 XY17 0.0390625 0.1875000 0.0000000 0.4687500 0.0000000 0.3046875 4 XY18 0.0156250 0.1171875 0.6250000 0.0546875 0.0000000 0.1875000 3 XY19 0.0078125 0.0156250 0.0000000 0.7812500 0.0390625 0.1562500 4 XY20 0.0468750 0.0234375 0.0000000 0.7031250 0.0000000 0.2265625 4
result of tools