HenrikBengtsson / R.matlab

R package: R.matlab
https://cran.r-project.org/package=R.matlab

"Unknown array type" for .mat containing functions #28

Closed anilatx closed 8 years ago

anilatx commented 9 years ago

Error in mat5ReadArrayFlags(this) : Unknown array type (class). Not in [1,15]: 16

Nobody expects readMat to load functions correctly, but I suggest that this should be just a warning (with the names listed), not an error, at least for functions.

Minimal example:

$ matlab -r "f=@(x) x; y=2; save('f.mat','f','y'); quit"
$ Rscript -e 'library(R.matlab); readMat("f.mat")'

It would be nice if readMat could recover the content of known types.

HenrikBengtsson commented 9 years ago

Agree - however, it needs to be able to parse the block in order to skip it, or at least parse it far enough in order to know how many bytes to skip. I'll add it to the wish list.
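To illustrate what "parse it far enough to know how many bytes to skip" could involve: in the Level 5 MAT-file format, each data element starts with a tag holding its data type and byte count. The following is a minimal base-R sketch, not the code R.matlab actually uses. The helper names `read_element_tag` and `skip_element` are hypothetical, and it assumes a little-endian file and that every top-level element except a compressed one (type 15) is padded to an 8-byte boundary.

```r
# Hypothetical helper: read the tag of one MAT v5 data element.
# 'con' is a binary connection positioned at the start of the element.
read_element_tag <- function(con, endian = "little") {
  word <- readBin(con, what = "integer", n = 1L, size = 4L, endian = endian)
  # In the "small data element" layout the upper 16 bits hold the byte count
  small_bytes <- bitwAnd(bitwShiftR(word, 16L), 0xFFFFL)
  if (small_bytes != 0L) {
    list(type = bitwAnd(word, 0xFFFFL), nbytes = small_bytes, compact = TRUE)
  } else {
    nbytes <- readBin(con, what = "integer", n = 1L, size = 4L, endian = endian)
    list(type = word, nbytes = nbytes, compact = FALSE)
  }
}

# Hypothetical helper: read the tag, then discard the payload.
skip_element <- function(con, endian = "little") {
  tag <- read_element_tag(con, endian)
  if (tag$compact) {
    payload <- 4L                      # tag and data share one 8-byte slot
  } else if (tag$type == 15L) {
    payload <- tag$nbytes              # assumption: miCOMPRESSED is not padded
  } else {
    payload <- tag$nbytes + (-tag$nbytes) %% 8L   # round up to 8 bytes
  }
  readBin(con, what = "raw", n = payload)         # read and throw away
  tag
}

# Usage on a hand-built element: type 14 (miMATRIX) with a 16-byte payload
bytes <- c(writeBin(c(14L, 16L), raw(), size = 4L, endian = "little"),
           as.raw(1:16))
con <- rawConnection(bytes, "rb")
tag <- skip_element(con)
close(con)
```

The payload is read and discarded rather than skipped with `seek()`, because not every connection type supports seeking.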

HenrikBengtsson commented 8 years ago

I've updated the code (develop branch; commit b751c67) so that we now get:

> R.matlab::readMat("f.mat")
Error in mat5ReadArrayFlags(this) :
  Unsupported array type (class): 16 ('mxFUN_CLASS')

It's a start.

HenrikBengtsson commented 8 years ago

In order to parse and skip such elements (with a warning), I need to know the format. Interestingly, the "array type (class)" with value 16 is not officially documented. In Table 1-3. 'MATLAB Array Types (Classes)' of 'MAT-File Format - R2015b' (MathWorks, Sept 2015) only types 1-15 are documented.

HenrikBengtsson commented 8 years ago

I think I figured out how to parse the function object stored in these MAT files, e.g.

> data <- R.matlab::readMat("f.mat")
> str(data)
List of 3
 $ f: raw [1:776] 06 00 00 00 ...
 $ y: num [1, 1] 2
 $  : int [1, 1:968] 0 1 73 77 0 0 0 0 14 0 ...
 - attr(*, "header")=List of 3
  ..$ description: chr "MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Tue Dec 29 16:24:41 2015                                        Í\001"
  ..$ version    : chr "5"
  ..$ endian     : chr "little"
>

I chose to return the function object as raw bytes (the format is undocumented, so I picked raw):

> rawToChar(data$f)
Error in rawToChar(data$f) :
  embedded nul in string: '\006\0\0\0\b\0\0\0\002\0\0\0\0\0\0\0\005\0\0\0\b\0\0\
0\001\0\0\0\001\0\0\0\001\0\0\0\0\0\0\0\005\0\004\0\020\0\0\0\001\0\0\0@\0\0\0ma
tlabroot\0\0\0\0\0\0separator\0\0\0\0\0\0\0sentinel\0\0\0\0\0\0\0\0function_hand
le\0\016\0\0\0H\0\0\0\006\0\0\0\b\0\0\0\004\0\0\0\0\0\0\0\005\0\0\0\b\0\0\0\001\
0\0\0\030\0\0\0\001\0\0\0\0\0\0\0\020\0\0\0\030\0\0\0/opt/local/MATLAB/R2012a\01
6\0\0\00\0\0\0\006\0\0\0\b\0\0\0\004\0\0\0\0\0\0\0\005\0\0\0\b\0\0\0\001\0\0\0\0
01\0\0\0\001\0\0\0\0\0\0\0\020\0\001\0/\0\0\0\016\0\0\00\0\0\0\006\0\0\0\b\0\0\0
\004\0\0\0\0\0\0\0\005\0\0\0\b\0\0\0\001\0\0\0\001\0\0\0\001\0\0\0\0\0\0\0\020\0
\001\0@\0\0\0\016\0\0\0È\001\0\0\006\0\0\0\b\0\0\0\002\0\0\0\0\0\0\0\005\0\0\0\b
\0\0\0\001\0\0\0\001\0\0\0\001\0\0\0\0\0\0\0\005\0\004\0\n\0\0\0\001\0\0\0(\0\0\
0function\0\0type\0\0\0\0\0\0file\0\0\0\0\0\0workspace\0\016\0\0\0@\0\0\0\006\0\
0\0\b\0\0\0\004\0\0\0\0\0\0\0\005\0\0\0\b\0\0\0\001\0\0\0\t\0\0\0\001\0\0\0\0\0\
0\0\020\0\0\0\t\0\0\0sf%0@(x)x\0

It's interesting to see that the MATLAB expression in plain text seems to be part of this object (at the end of the byte stream).
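Since `rawToChar()` refuses vectors with embedded nuls, one way to inspect such a byte stream is to pull out only the runs of printable ASCII, much like the Unix `strings` tool. This is a rough base-R sketch; `extract_strings` and its `min_len` argument are hypothetical names, not part of R.matlab.

```r
# Hypothetical helper: return runs of printable ASCII found in a raw vector.
extract_strings <- function(x, min_len = 4L) {
  stopifnot(is.raw(x))
  codes <- as.integer(x)
  printable <- codes >= 32L & codes <= 126L   # space through tilde
  runs <- rle(printable)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1L
  keep <- which(runs$values & runs$lengths >= min_len)
  vapply(keep, function(i) rawToChar(x[starts[i]:ends[i]]), character(1))
}

# Usage on a small hand-built vector that mimics the tail of the dump above
x <- c(charToRaw("type"), raw(6), charToRaw("sf%0@(x)x"), raw(1))
extract_strings(x)
# [1] "type"      "sf%0@(x)x"
```

Applied to `data$f`, this should list the field names and the function text without tripping over the nul bytes, although short fragments of binary data that happen to be printable can show up as well.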

I don't understand what that unnamed 3rd element data[[3]] contains, but it looks like it is some binary information (stored as integers for unknown reasons). It could be some session-specific information, e.g. promises etc.

> rawToChar(as.raw(data[[3]]))
Error in rawToChar(as.raw(data[[3]])) :
  embedded nul in string: '\0\001IM\0\0\0\0\016\0\0\0(\003\0\0\006\0\0\0\b\0\0\0
\002\0\0\0\0\0\0\0\005\0\0\0\b\0\0\0\001\0\0\0\001\0\0\0\001\0\0\0\0\0\0\0\005\0
\004\0\005\0\0\0\001\0\0\0\005\0\0\0MCOS\0\0\0\0\016\0\0\0à\002\0\0\006\0\0\0\b\
0\0\0\021\0\0\0\0\0\0\0\001\0\0\0\0\0\0\0\001\0\004\0MCOS\001\0\0\0\r\0\0\0FileW
rapper__\0\0\0\016\0\0\0 \002\0\0\006\0\0\0\b\0\0\0\001\0\0\0\0\0\0\0\005\0\0\0\
b\0\0\0\004\0\0\0\001\0\0\0\001\0\0\0\0\0\0\0\016\0\0\0ø\0\0\0\006\0\0\0\b\0\0\0
\t\0\0\0\0\0\0\0\005\0\0\0\b\0\0\0È\0\0\0\001\0\0\0\001\0\0\0\0\0\0\0\002\0\0\0È
\0\0\0\002\0\0\0\002\0\0\0H\0\0\0h\0\0\0€\0\0\0°\0\0\0¸\0\0\0È\0\0\0\0\0\0\0\0\0
\0\0any\0function_handle_workspace\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\002\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\001\0\0\0\001\0\0\0\001\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\001\0\0\0\0\0\0\0\0\0\0\0\001\0
\0\0\0\0\0\0\001\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\016\0\0\0
\0\0\0\0\016\0\0\0¸\0\0\0\006\0\

Either way, unless there is some information of value in there that can be parsed further and we can find documentation of how to parse it, I think this is good enough.
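For callers who only want the variables of known types, the returned list can be split after the fact. The sketch below assumes the list looks like the `str(data)` output shown earlier, with raw vectors and unnamed elements treated as opaque. `split_opaque` is a hypothetical helper, and the example list is hand-built rather than read from a file.

```r
# Hypothetical helper: separate parsed variables from opaque ones.
split_opaque <- function(data) {
  nms <- names(data)
  if (is.null(nms)) nms <- rep("", length(data))
  # Opaque: stored as raw bytes, or carrying no variable name
  opaque <- vapply(data, is.raw, logical(1)) | is.na(nms) | nms == ""
  list(known = data[!opaque], opaque = data[opaque])
}

# Hand-built stand-in for the result of readMat("f.mat")
data <- list(f = as.raw(c(6, 0, 0, 0)),
             y = matrix(2, nrow = 1, ncol = 1),
             matrix(0:3, nrow = 1))
parts <- split_opaque(data)
names(parts$known)
# [1] "y"
```

Subsetting a list drops attributes such as "header", so anything needed from there should be copied out of the original object first.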

@anilatx, please test the develop version by:

source("http://callr.org/install#HenrikBengtsson/R.matlab@develop")

HenrikBengtsson commented 8 years ago

I'll consider this one resolved. If any issues, please reopen.