HenrikBengtsson / R.matlab

R package: R.matlab
https://cran.r-project.org/package=R.matlab

readMat cannot read sparse matrices with 0 rows (and 1 column) #14

Closed: jefferis closed this issue 9 years ago

jefferis commented 9 years ago

I am analysing some MATLAB data provided with a scientific publication (in neuroscience). The supplied MAT file includes a struct with a field cyclicalEdges, which is a sparse matrix. In many cases this field is empty, in which case MATLAB reports its size as 0 1, i.e. 0 rows and 1 column. These dimensions cause an error in readMat at:

https://github.com/HenrikBengtsson/R.matlab/blob/20c153fcb53f5d87ad778fdb4bcb07c04a9c46b7/R/readMat.R#L2211-L2213
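The range [0,-1] in the error message is what one would expect if the zero-based row indices 'ir' are validated against [0, nrow-1], which is an empty interval when nrow is 0. The following is only a sketch of that situation, not the actual R.matlab code: the helper names ir_out_of_range() and ir_out_of_range_guarded() are hypothetical, and it assumes the file stores a single placeholder index even though the matrix holds no elements.

```r
# Hypothetical helper, not the actual R.matlab code: flags zero-based row
# indices that fall outside [0, nrow - 1].
ir_out_of_range <- function(ir, nrow) {
  any(ir < 0 | ir > nrow - 1)
}

# Hypothetical guarded variant: a matrix with no rows cannot hold any
# element, so there is nothing to validate.
ir_out_of_range_guarded <- function(ir, nrow) {
  if (nrow == 0) return(FALSE)
  ir_out_of_range(ir, nrow)
}

ir <- 0L  # assumed placeholder entry stored for an empty sparse matrix

ir_out_of_range(ir, nrow = 0)          # TRUE: 0 > -1, so the check trips
ir_out_of_range_guarded(ir, nrow = 0)  # FALSE
ir_out_of_range(c(0L, 2L), nrow = 3)   # FALSE: both indices lie in [0, 2]
```

Whether a guard like this or a different approach is the right fix depends on what the file actually stores in 'ir' for an empty matrix.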

The following toy example writes a MATLAB (v5) MAT file containing such a sparse matrix and then tries to read it.


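library(R.matlab)  # provides readMat()

# Write the raw bytes of a small MAT v5 file to a temporary file
tf <- tempfile()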
writeBin(as.raw(c(77, 65, 84, 76, 65, 66, 32, 53, 46, 48, 32, 77, 
65, 84, 45, 102, 105, 108, 101, 44, 32, 80, 108, 97, 
116, 102, 111, 114, 109, 58, 32, 77, 65, 67, 73, 54, 
52, 44, 32, 67, 114, 101, 97, 116, 101, 100, 32, 111, 
110, 58, 32, 84, 104, 117, 32, 74, 117, 108, 32, 50, 
51, 32, 49, 52, 58, 52, 55, 58, 51, 53, 32, 50, 48, 
49, 53, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 
32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 0, 
1, 73, 77, 15, 0, 0, 0, 58, 0, 0, 0, 120, 156, 227, 
99, 96, 96, 136, 0, 98, 54, 32, 230, 0, 98, 86, 1, 
6, 6, 70, 16, 13, 229, 131, 0, 35, 20, 243, 2, 113, 
114, 101, 114, 78, 102, 114, 98, 142, 107, 74, 122, 
106, 49, 88, 29, 11, 88, 13, 178, 122, 16, 224, 132, 
210, 0, 19, 102, 5, 242)), tf)

readMat(tf)

gives the error:

Error in mat5ReadMiMATRIX(this, tag) : 
  MAT v5 file format error: Some elements in row vector 'ir' (sparse arrays) are out of range [0,-1]. 
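As a sanity check that the toy file itself is well formed, a few of its bytes can be decoded by hand. This is a small sketch using only base R. The byte values are copied from the vector above; the variable names are illustrative, and the byte positions assume the standard 128-byte MAT v5 header.

```r
# First 10 bytes of the toy file: the start of the descriptive header text.
magic <- as.raw(c(77, 65, 84, 76, 65, 66, 32, 53, 46, 48))
rawToChar(magic)  # "MATLAB 5.0"

# Bytes 125-128 of the header: version (0x0100) and endian indicator.
version_endian <- as.raw(c(0, 1, 73, 77))
rawToChar(version_endian[3:4])  # "IM", i.e. little-endian byte order

# Bytes 129-136: tag of the first data element (type, then size in bytes).
tag <- as.raw(c(15, 0, 0, 0, 58, 0, 0, 0))
tag_fields <- readBin(tag, what = "integer", n = 2L, size = 4L,
                      endian = "little")
tag_fields  # 15 58: type 15 is miCOMPRESSED, followed by 58 bytes of data
```

The 58 bytes after the tag start with 120, 156, the usual zlib stream header, so the sparse matrix is stored as a single compressed data element.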

HenrikBengtsson commented 9 years ago

Thank you for the report. I can reproduce this:

> readMat(tf, verbose=-100)
Opens binary file: C:\Users\hb\AppData\Local\Temp\Rtmpykdk7X\file14d03f566995
Trying to read MAT v5 file stream.
Reading data element...
 Reading Tag...
  Reading Tag...
  Reading Tag...done
 Reading Tag...done
 Reading (outer) miMATRIX...
  Reading miMATRIX...
   Argument 'tag':
   List of 7
    $ type      : chr "miMATRIX"
    $ signed    : Named logi NA
     ..- attr(*, "names")= chr "miMATRIX"
    $ sizeOf    : Named int NA
     ..- attr(*, "names")= chr "miMATRIX"
    $ what      : logi NA
    $ nbrOfBytes: int 88
    $ padding   : int 0
    $ compressed: logi FALSE
   Reading Tag...
   Reading Tag...done
   Reading Array Flags...
   Reading Array Flags...done
   List of 6
    $ logical  : logi FALSE
    $ global   : logi FALSE
    $ complex  : logi FALSE
    $ class    : chr "mxSPARSE_CLASS"
    $ classSize: Named num NA
     ..- attr(*, "names")= chr "mxSPARSE_CLASS"
    $ nzmax    : int 1
   Reading Dimensions Array...
    Reading Tag...
    Reading Tag...done
   Reading Dimensions Array...done
   Reading Array Name...
    Reading Tag...
    Reading Tag...done
    Name: 'cyclicalEdges'
   Reading Array Name...done
   Array name: 'cyclicalEdges'
   Reading mxSPARSE_CLASS 0x1 matrix....
    Reading Values...
     Reading Tag...
     Reading Tag...done
    Reading Values...done
Error in mat5ReadMiMATRIX(this, tag) :
  MAT v5 file format error: Some elements in row vector 'ir' (sparse arrays) are out of range [0,-1].
   Reading mxSPARSE_CLASS 0x1 matrix....done
   Binary file closed.
>

This looks like a bug to me. Unfortunately, I won't have time to get to this until the end of August. As an alternative, I welcome patches (preferably as a pull request). There's no need to rebuild the package while troubleshooting; library(R.methodsS3); source("R/readMat.R") should be enough.

> sessionInfo()
R version 3.2.1 Patched (2015-07-11 r68646)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] R.matlab_3.2.0-9000

loaded via a namespace (and not attached):
[1] tools_3.2.1            R.methodsS3_1.7.0-9000 R.utils_2.1.0
[4] R.oo_1.19.0
>

HenrikBengtsson commented 9 years ago

Thanks for PR #15. I've added package tests and more. The easiest way to install this 'develop' version is to run:

source("http://callr.org/install#HenrikBengtsson/R.matlab@develop")