federicogiorgi / corto

corto (Correlation Tool): an R package to generate correlation-based DPI networks

row.names error in corto function #6

Open tboen1 opened 2 years ago

tboen1 commented 2 years ago

I'm trying to use corto to run the ARACNE algorithm on my own dataset. I'm only interested in one transcription factor ("ENSMUSG00000035799"), and my variance-stabilized data is in the matrix called 'vst_control'. I run the code as follows:

regulon <- corto(vst_control, centroids = c("ENSMUSG00000035799"), nbootstraps = 10, p = 1e-30, nthreads = 2, verbose = TRUE)

and I keep running into this error:

Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

I ran a traceback, and it appears that the error occurs during the centroid/triplet processing stage.


Has anyone seen or resolved an error like this before? I compared my own data matrix and centroid list against the examples provided, and they seem to be the same data type and format.

federicogiorgi commented 2 years ago

Hi, sorry for the inexcusably late reply!

I have reproduced your error. There are two issues: you are providing only one centroid, and your input matrix has very few samples.

For example, the failing row-name assignment goes away when you provide more centroids and a bigger input matrix. Here I built one by cbind-ing your original input matrix 4 times, for a total of 12 samples. (Of course, you should never repeat samples in a real analysis; this is just to show a working example.)

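# Demonstration only: repeating the same samples is not valid in a real analysis
input <- cbind(vst_control, vst_control, vst_control, vst_control)
# Use the first five genes as centroids instead of a single one
regulon <- corto(input, centroids = rownames(vst_control)[1:5],
                 nbootstraps = 10, p = 1e-30, nthreads = 2, verbose = TRUE)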
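Given the two issues above, a quick check of the input before calling corto() can flag this situation early. The sketch below uses a hypothetical helper, `check_corto_input()`, which is not part of corto; its minimum sample and centroid counts are illustrative assumptions, not documented corto requirements.

```r
# Hypothetical helper, NOT part of corto: flag the two problems discussed above
# (too few centroids, too few samples) before calling corto().
# The default thresholds are illustrative assumptions, not corto requirements.
check_corto_input <- function(inmat, centroids, min_samples = 10, min_centroids = 2) {
  problems <- character(0)
  missing <- setdiff(centroids, rownames(inmat))
  if (length(missing) > 0) {
    problems <- c(problems, paste("centroids not in rownames:",
                                  paste(missing, collapse = ", ")))
  }
  if (ncol(inmat) < min_samples) {
    problems <- c(problems, sprintf("only %d samples (fewer than %d)",
                                    ncol(inmat), min_samples))
  }
  n_found <- length(intersect(centroids, rownames(inmat)))
  if (n_found < min_centroids) {
    problems <- c(problems, sprintf("only %d centroid(s) found (fewer than %d)",
                                    n_found, min_centroids))
  }
  problems
}

# Toy data: a small matrix (6 samples) with a single centroid
set.seed(1)
toy <- matrix(rnorm(60), nrow = 10,
              dimnames = list(paste0("gene", 1:10), paste0("s", 1:6)))
problems <- check_corto_input(toy, centroids = "gene1")
print(problems)  # reports both too few samples and a single centroid
```

An empty result only means the input passes these assumed thresholds; it does not guarantee that corto will find significant edges.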

In your case, the best thing you can do is calculate the correlation between your only centroid (Twist1, ENSMUSG00000035799) and the rest of the dataset. I see that your original data has 6 samples, which is still not enough for a proper correlation analysis; but if you manage to find a bigger dataset, you can do something like this:

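# Simple corto: rank all genes by their correlation with a single centroid
library(DESeq2)  # provides varianceStabilizingTransformation()
# 'matrix' is your raw count matrix (genes in rows, samples in columns)
vst <- varianceStabilizingTransformation(as.matrix(matrix))
centroid <- "ENSMUSG00000035799"
# Pearson correlation of the centroid profile with every gene
cors <- cor(vst[centroid, ], t(vst))
cors <- setNames(as.vector(cors), rownames(vst))
cors <- sort(cors, decreasing = TRUE)
write.table(cors[1:20], col.names = FALSE, quote = FALSE)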

This gives the following result (again, 6 samples are not enough for significant inference, and the top correlator is Twist1 itself, since the centroid is always perfectly correlated with itself):

ENSMUSG00000035799  1
ENSMUSG00000022817  0.999356333801304
ENSMUSG00000009216  0.998757794167832
ENSMUSG00000020099  0.99864702684399
ENSMUSG00000105096  0.998507556485142
ENSMUSG00000018001  0.997602146984105
ENSMUSG00000027848  0.997449013931447
ENSMUSG00000022799  0.997322956721468
ENSMUSG00000025135  0.996866572771553
ENSMUSG00000039542  0.995138703018899
ENSMUSG00000056174  0.995096963476483
ENSMUSG00000074736  0.994770949809045
ENSMUSG00000020674  0.994356562847849
ENSMUSG00000029718  0.993000542465269
ENSMUSG00000021614  0.992998710010364
ENSMUSG00000061436  0.992956512792065
ENSMUSG00000003154  0.992783016934144
ENSMUSG00000029086  0.992662493474623
ENSMUSG00000085939  0.992651321385568
ENSMUSG00000061878  0.992589774115791
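One way to make the "not enough samples" caveat concrete is to attach a p-value to each coefficient in a ranking like this. The sketch below uses the standard t-transformation of Pearson's r, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, which is the same test base R's cor.test() performs. `cor_pvalues()` is a hypothetical helper rather than a corto function, and the data are simulated noise standing in for a VST matrix.

```r
# Hypothetical helper, NOT part of corto: two-sided p-values for Pearson
# correlations using t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom.
cor_pvalues <- function(r, n) {
  r <- pmin(pmax(r, -1), 1)  # guard against rounding slightly outside [-1, 1]
  tstat <- r * sqrt((n - 2) / pmax(1 - r^2, .Machine$double.eps))
  2 * pt(-abs(tstat), df = n - 2)
}

# Simulated stand-in for a VST matrix: 200 genes x 6 samples of pure noise
set.seed(42)
n_samples <- 6
vst <- matrix(rnorm(200 * n_samples), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("s", 1:n_samples)))
centroid <- "gene1"

# Same ranking as the "Simple corto" snippet, minus the centroid's self-correlation
cors <- setNames(as.vector(cor(vst[centroid, ], t(vst))), rownames(vst))
cors <- cors[names(cors) != centroid]
pvals <- cor_pvalues(cors, n_samples)
# Bonferroni adjustment, since every gene is tested against the centroid
res <- data.frame(r = cors, p = pvals,
                  p_adj = p.adjust(pvals, method = "bonferroni"))
res <- res[order(res$r, decreasing = TRUE), ]
head(res)
```

This only adds a significance estimate to the simple correlation ranking; it does not reproduce corto's bootstrapping or DPI steps.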
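The effect of sample size can also be seen by inverting the standard t-test for Pearson's r: the smallest |r| that reaches a two-sided significance level alpha with n samples is r_crit = t_crit / sqrt(n - 2 + t_crit^2), where t_crit is the upper alpha/2 quantile of a t distribution with n - 2 degrees of freedom. The sketch below tabulates this for a few sample sizes; `critical_r()` is a hypothetical helper, and the 20,000-gene Bonferroni correction is an assumed, illustrative figure.

```r
# Hypothetical helper: minimum |r| needed for a two-sided Pearson correlation
# test to reach significance level alpha with n samples.
critical_r <- function(n, alpha) {
  # lower.tail = FALSE keeps precision for very small alpha
  t_crit <- qt(alpha / 2, df = n - 2, lower.tail = FALSE)
  t_crit / sqrt(n - 2 + t_crit^2)
}

n <- c(6, 12, 24, 50, 100)
# Assumed thresholds: nominal 0.05, and 0.05 Bonferroni-corrected for a
# hypothetical 20,000 genes tested against one centroid
tab <- data.frame(
  n = n,
  r_nominal = critical_r(n, 0.05),
  r_bonferroni = critical_r(n, 0.05 / 20000)
)
print(round(tab, 3))
```

With 6 samples, even the nominal 0.05 level requires |r| of about 0.81, and the multiple-testing threshold pushes the requirement above 0.99.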