Open renanmcosta opened 1 year ago
For now I've managed to fetch with the temporary fix below. I don't think it's very robust, but I'm copying it here in case it's informative.
def read_cell_array(self):
    """deserialize MATLAB cell array"""
    n_dims = self.read_value()
    shape = self.read_value(count=n_dims)
    n_elem = int(np.prod(shape))
    result = [self.read_blob(n_bytes=self.read_value()) for _ in range(n_elem)]
    if n_elem != len(np.ravel(result, order="F")):  # if not all elements are scalars. shouldn't work for ragged arrays
        shape = (-1,) + tuple(shape[1:n_dims])
    return (
        self.squeeze(
            np.array(result).reshape(shape, order="F"), convert_to_scalar=False
        )
    ).view(MatCell)
Greetings,
I have just encountered the same problem, and the temporary fix seems to work (thanks a lot, @renanmcosta).
The temporary fix returns an array, but with shape = (537000, 2).
In MATLAB it is a 1×2 cell array {10×5370×10 single} {10×5370×10 single}.
type(temp_fixed) --> datajoint.blob.MatCell
Am I able to retrieve the original dimensions, or is this a robustness problem of the temporary fix?
Thanks in advance
Hi @Paschas, could you update us on this? We are looking to resolve this.
The temp fix is responsible for the shape differences there. Lately, I have been using a simpler fix, which shouldn't collapse any dimensions. This one should always work, though it's possible that it can lead to awkward array nesting at times.
import datajoint as dj
import numpy as np


def fix_cell_array_fetch():
    """Fixes bug that prevents cell arrays from being fetched in python in certain
    cases. Replaces cell array unpacking method in the datajoint module with working
    version.
    """

    class Blob(dj.blob.Blob):
        def read_cell_array(self):
            """deserialize MATLAB cell array"""
            n_dims = self.read_value()
            shape = self.read_value(count=n_dims)
            n_elem = int(np.prod(shape))
            result = [self.read_blob(n_bytes=self.read_value()) for _ in range(n_elem)]
            return (
                self.squeeze(np.array(result, dtype="object"), convert_to_scalar=False)
            ).view(dj.blob.MatCell)

    # Replace the class in the datajoint module with the patched subclass.
    dj.blob.Blob = Blob
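For anyone following along, the (537000, 2) shape mentioned earlier in the thread can be reproduced with plain NumPy. This is only a minimal sketch, assuming the two cell elements have identical shapes; it uses small stand-in arrays rather than the real data, and `cell_shape`, `elem_shape`, `stacked`, `collapsed`, and `as_object` are names made up for illustration.

```python
import numpy as np

# Small stand-in for a 1x2 cell whose two elements have the same shape
# (the real case in this thread was two 10x5370x10 single arrays).
cell_shape = (1, 2)
elem_shape = (2, 3, 2)
result = [np.zeros(elem_shape, dtype=np.float32) for _ in range(2)]

# First workaround: stack the elements, then reshape to (-1, *cell_shape[1:]).
stacked = np.array(result)  # shape (2, 2, 3, 2)
collapsed = stacked.reshape((-1,) + tuple(cell_shape[1:]), order="F")
print(collapsed.shape)  # (12, 2): the element dimensions are flattened away

# Simpler workaround: build an object array and skip the reshape.
as_object = np.array(result, dtype="object")
print(as_object.shape)  # (2, 2, 3, 2): dimensions kept, but nested in one array
```

With the real sizes, 2 × 10 × 5370 × 10 = 1,074,000 values reshaped to (-1, 2) gives 537000 rows, which matches the reported shape. The object-array version keeps every dimension, but for equal-shaped elements NumPy still merges them into one 4-D array instead of two separate cells, which is presumably the "awkward nesting" mentioned above.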
Let's see if we can incorporate this in this coming release.
Greetings @dimitri-yatsenko & @renanmcosta
Without @renanmcosta's fixes I used to get 2 types of error:
in Blob.read_cell_array(self)
[493] n_elem = int(np.prod(shape))
[494] result = [self.read_blob(n_bytes=self.read_value()) for _ in range(n_elem)]
[495] return (
[496] self.squeeze(
[497] #np.array(result).reshape(shape, order="F"), convert_to_scalar=False
[498] #np.array(result).reshape(shape, order="C"), convert_to_scalar=False
--> [499] np.array(result).reshape(shape, order="A"), convert_to_scalar=False
[500]
[501])
[502].view(MatCell)
ValueError: cannot reshape array of size 2560 into shape (1,10)
or
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.
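Both messages can be triggered outside DataJoint with plain NumPy. The sketch below is an assumption about what the blob contained, not a reconstruction of the actual data: it supposes ten equal-length elements of 256 values for the first error and four elements of different lengths for the second. All variable names are illustrative.

```python
import numpy as np

# Case 1 (assumed): 10 equal-length elements get stacked into a (10, 256)
# numeric array, which cannot be reshaped to the cell shape (1, 10).
cell_shape = (1, 10)
equal_elems = [np.zeros(256) for _ in range(10)]
stacked = np.array(equal_elems)  # size 2560
try:
    stacked.reshape(cell_shape, order="F")
    reshape_error = None
except ValueError as err:
    reshape_error = str(err)
print(reshape_error)

# Case 2 (assumed): elements of different lengths cannot be stacked at all.
# NumPy 1.24 and later raise ValueError here; older releases only warned
# and returned an object array, so ragged_error may be None on old versions.
ragged_elems = [np.zeros(n) for n in (2, 3, 4, 5)]
try:
    np.array(ragged_elems)
    ragged_error = None
except ValueError as err:
    ragged_error = str(err)
print(ragged_error)
```

In both cases the root cause is the same: `np.array(result)` tries to merge the cell elements into one regular array before the reshape to the cell's own shape.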
The fix_cell_array_fetch() is working, but I would be cautious (thanks again, @renanmcosta).
In a different but similar case, the arrays had the correct shape but the data were shuffled. Eventually I changed the following:
# line 243 of blob.py
def read_array(self):
    .....
    # Changed Nothing
    .....
    return self.squeeze(data.reshape(shape, order="C"))  # It was F
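To make the effect of the `order` argument concrete, here is a small NumPy-only sketch (not DataJoint code; the variable names are illustrative). The same flat buffer yields the same shape under both orders, but the values land in different positions, which would look like "shuffled" data.

```python
import numpy as np

flat = np.arange(6)  # stand-in for the flat buffer read from a blob
shape = (2, 3)

as_f = flat.reshape(shape, order="F")  # column-major: fill columns first
as_c = flat.reshape(shape, order="C")  # row-major: fill rows first

print(as_f)  # [[0 2 4]
             #  [1 3 5]]
print(as_c)  # [[0 1 2]
             #  [3 4 5]]
```

MATLAB stores arrays in column-major order, so which setting is right depends on how the blob was written. Switching to "C" would be expected to fix data serialized in row-major order and to scramble data serialized from MATLAB, so it seems worth checking a known array round trip before relying on this change.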
We just found a new case where the latest approach I posted above still raises a ValueError, e.g.:
ValueError: could not broadcast input array from shape (3,) into shape (1,)
It happens when the first dimension of each entry is the same, and appears to be a limitation of numpy (discussion).
Ultimately the problem is that MATLAB cell arrays and numpy arrays are intended as different types of objects, and as a result MATLAB cell arrays can be ragged in ways that numpy is unwilling to support.
Here's my current solution, which should hopefully retain the structure of each entry:
class fixed_Blob(dj.blob.Blob):
    def read_cell_array(self):
        """deserialize MATLAB cell array"""
        n_dims = self.read_value()
        shape = self.read_value(count=n_dims)
        n_elem = int(np.prod(shape))
        result = [self.read_blob(n_bytes=self.read_value()) for _ in range(n_elem)]
        arr = np.empty(n_elem, dtype="object")
        arr[:] = result
        return (self.squeeze(arr, convert_to_scalar=False)).view(dj.blob.MatCell)
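The idea of pre-allocating a 1-D object array can be tried in isolation. The sketch below is a conservative variant and not the code above: `to_object_vector` is a hypothetical helper that fills the array one element at a time instead of using slice assignment, so NumPy never gets a chance to merge the elements. The example elements share their first dimension, as in the failing case described above.

```python
import numpy as np


def to_object_vector(items):
    """Hypothetical helper: store each item as one element of a 1-D object array."""
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        arr[i] = item  # integer indexing stores the item itself, shape untouched
    return arr


# Elements with the same first dimension but different second dimensions.
elems = [np.zeros((1, 3)), np.zeros((1, 2))]

# Direct construction may raise a broadcast ValueError for input like this;
# the outcome is recorded rather than assumed.
try:
    np.array(elems, dtype=object)
    direct_error = None
except ValueError as err:
    direct_error = str(err)
print(direct_error)

cells = to_object_vector(elems)
print(cells.shape, cells[0].shape, cells[1].shape)  # (2,) (1, 3) (1, 2)
```

Each entry keeps its own shape, at the cost of the result always being a flat vector of cells, so the original cell array dimensions (for example 1×2 versus 2×1) would have to be restored separately if they matter.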
Bug Report
Description
Fetching fails in python when each entry for a given attribute (defined in matlab) is a cell array, and each element of the cell array is an array of doubles. Fetching in matlab works as expected.
Reproducibility
Windows, Python 3.9.13, DataJoint 0.13.8
Steps:
epoch_pos_range=null : blob # list of y position ranges corresponding to n epochs in epoch_list, (e.g., {[y_on y_off],[y_on y_off]} for epoch_list {'epoch1','epoch2'})
Error stack: