SofieVG / FlowSOM

Using self-organizing maps for visualization and interpretation of cytometry data

Calling MapDataToCodes produces long vectors error #54

Open jonhsussman opened 1 year ago

jonhsussman commented 1 year ago

Hello,

I am running FlowSOM:::MapDataToCodes(someWeights, as.matrix(fovPixelData)) which produces the following error:

Error in FlowSOM:::MapDataToCodes(someWeights, as.matrix(fovPixelData)) : 
  long vectors (argument 1) are not supported in .C

someWeights is a 100 x 10 numeric matrix, and fovPixelData is a large table (297923080 obs. of 10 variables). fovPixelData is produced by reading a .feather file with data.table(arrow::read_feather(file)).

Do you have any thoughts on what is causing this error?

Thanks!

SamGG commented 1 year ago

Did you try as.matrix(someWeights)? Did you check that both someWeights and fovPixelData have colnames() set to some values? As a reminder, the call to the C function in charge of mapping uses the colnames of the codes to select the columns of the newdata matrix. @SofieVG I think we already discussed this at some point, but no colnames check is currently implemented. https://github.com/SofieVG/FlowSOM/blob/31bf74c131e0c1bee2584a060e99ceff8b74b5af/R/2_buildSOM.R#L246-L259

jonhsussman commented 1 year ago

Thanks for your reply.

I just tried as.matrix(someWeights) and also checked the colnames() of someWeights and fovPixelData. Both are set to text values and are identical to each other. Unfortunately, I am still encountering the same error.

SamGG commented 1 year ago

typeof(someWeights) and class(someWeights)?

jonhsussman commented 1 year ago

See below:

[image: screenshot of the typeof() and class() output]

jonhsussman commented 1 year ago

Of note, when I reduce fovPixelData to a much smaller subset as a trial, fovPixelData_less <- fovPixelData[1:1000000, ], I no longer encounter this error. The call also runs through with an example data set of comparatively very small images. So I suspect that larger inputs simply cause a problem in the C code at this step, but I am not certain.
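The size hypothesis can be checked with a little arithmetic. In R, a vector (and therefore a matrix) with more than 2^31 - 1 elements is a long vector, and .C does not accept long vectors. The sketch below only uses the dimensions reported in this thread; the variable names are illustrative.

```r
# Dimensions reported in the issue (rows x columns of fovPixelData)
n_rows <- 297923080
n_cols <- 10

# Numeric literals are doubles in R, so this product does not overflow
n_elements <- n_rows * n_cols

# A vector is "long" once it has more than .Machine$integer.max elements
is_long <- n_elements > .Machine$integer.max

# Largest number of rows per block that keeps a 10-column block below the limit
max_rows_per_block <- floor(.Machine$integer.max / n_cols)
```

If is_long is TRUE, the matrix passed to .C is a long vector, which would be consistent with the error message above.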

SamGG commented 1 year ago

The type sounds OK. In fact, I missed that the 1st argument of the C call is newdata, not codes. So, as you clearly showed, this is a size-dependent problem. I think that long vectors are used when the matrix becomes too large to be indexed by a classical 32-bit integer, that is, when it holds more than 2^31 - 1 elements; your 297923080 x 10 matrix has about 2.98 billion. The easiest workaround is to split newdata and call MapDataToCodes block by block. Here is the code I typically use.

# result from FlowSOM, 1 representative point per group
codes = as.matrix(iris[c(25,75,125),-5])
# data to map
nwdata = as.matrix(iris[,-5])

# map by block
block_size = 20  # to be defined, test 20 and 50
result = matrix(0.0, nrow(nwdata), 2)  # MapDataToCodes returns 2 columns
block_end = nrow(nwdata)
for (block_i in 0:((block_end-1) %/% block_size)) {
  i_start = 1 + block_i*block_size
  i_end = min((block_i+1)*block_size, block_end)
  cat(i_start, i_end, "\n")
  # drop = FALSE keeps a one-row block as a matrix
  result[i_start:i_end,] = FlowSOM:::MapDataToCodes(codes, nwdata[i_start:i_end, , drop = FALSE])
}
#> 1 20 
#> 21 40 
#> 41 60 
#> 61 80 
#> 81 100 
#> 101 120 
#> 121 140 
#> 141 150

# for fun
table(result[,1], iris[,5])
#>    
#>     setosa versicolor virginica
#>   1     50          1         0
#>   2      0         48        13
#>   3      0          1        37

Created on 2022-09-21 by the reprex package (v2.0.1)
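For reuse, the block loop above could be wrapped in a small helper. This is only a sketch: map_in_blocks and nearest_code are hypothetical names, not part of FlowSOM, and nearest_code is a base R stand-in (nearest code by Euclidean distance, returning two columns) so that the example runs without the package. It assumes newdata has at least one row.

```r
# Hypothetical helper: apply a row-wise mapping function block by block.
# map_fun(codes, block) must return one row per row of `block`.
map_in_blocks <- function(codes, newdata, block_size, map_fun) {
  n <- nrow(newdata)
  starts <- seq(1, n, by = block_size)
  pieces <- lapply(starts, function(i_start) {
    i_end <- min(i_start + block_size - 1, n)
    # drop = FALSE keeps a one-row block as a matrix
    map_fun(codes, newdata[i_start:i_end, , drop = FALSE])
  })
  do.call(rbind, pieces)
}

# Base R stand-in for the mapping step (an assumption, not FlowSOM's code):
# returns the index of the nearest code and the Euclidean distance to it.
nearest_code <- function(codes, block) {
  d <- sapply(seq_len(nrow(codes)), function(k) {
    sqrt(rowSums(sweep(block, 2, codes[k, ])^2))
  })
  d <- matrix(d, nrow = nrow(block))
  cbind(max.col(-d, ties.method = "first"), apply(d, 1, min))
}

codes <- as.matrix(iris[c(25, 75, 125), -5])
nwdata <- as.matrix(iris[, -5])

# Mapping in blocks of 20 rows should match mapping everything at once
blocked <- map_in_blocks(codes, nwdata, block_size = 20, map_fun = nearest_code)
whole <- nearest_code(codes, nwdata)
```

With FlowSOM installed, FlowSOM:::MapDataToCodes can be passed as map_fun in place of the stand-in, as in the loop above. Collecting the pieces with rbind avoids hard-coding the number of result columns, at the cost of holding all pieces in memory until the end.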

jonhsussman commented 1 year ago

Thanks for this very excellent solution! This works perfectly to solve the problem and runs very efficiently, and has enabled us to use the package.
