datajoint / datajoint-matlab

Relational data pipelines for the science lab
MIT License
42 stars 37 forks source link

Empty matrices are converted to NULL #414

Open dglee3 opened 2 years ago

dglee3 commented 2 years ago

Bug Report

Description

In a newer version of DataJoint, empty matrices ([]) are inserted into a table as NULL rather than a longblob representing the empty set. This results in discrepancies when fetching data uploaded by different versions.

Reproducibility

  1. Create a table with a longblob attribute that is non-nullable. In this example the attribute is defined as puff: longblob
  2. Insert [] into this attribute

Error in dj.Relvar/insert (line 353) [v, placeholder] = makePlaceholder(i, tuple.(header.attributes(i).name));

Error in experiment.TrialSummary/makeTuples (line 144) insert(self, key)

Error in dj.internal.AutoPopulate/taskCore (line 215) self.makeTuples(key)

Error in dj.internal.AutoPopulate>@(key,fun,args)fun(args{:}) (line 129) rels{i}.executionEngine = @(key, fun, args) fun(args{:});

Error in dj.internal.AutoPopulate/populate_ (line 327) self.executionEngine(key, @taskCore, {self, key})

Error in dj.internal.AutoPopulate/populate (line 130) [varargout{1:nargout}] = rels{i}.populate_(varargin{:});



### Expected Behavior
When a [] is inserted, we expect it to be stored as a longblob which reads as [] when the data is fetched. This is the behavior of another machine with the following specifications:

- OS: Windows 10 Pro
- MATLAB Version: R2019a
- MySQL Version: 8.0.29
- MySQL Deployment Strategy: Uncertain
- DataJoint Version: 3.3.1
jverswijver commented 2 years ago

Adding some documentation on reproducing this issue:

getSchema.m:

function obj = getSchema
persistent schemaObject
if isempty(schemaObject)
    schemaObject = dj.Schema(dj.conn, 'test', 'test');
end
obj = schemaObject;
end

Person.m:

%{
person: varchar(30)
-----
myblob: longblob
%}

classdef Person < dj.Manual
end

test_insert_empty_array.m:

clc
clear functions
dj.version
dj.conn
dj.createSchema('test', '/src/test', 'test')
test.Person
insert(test.Person, {'jeff', [1,2,3,4]})
insert(test.Person, {'jeff2', []})
schemaObject = dj.Schema(dj.conn, 'test', 'test');
fetch1(test.Person & 'person="jeff"', 'myblob')
fetch1(test.Person & 'person="jeff2"', 'myblob')
dropQuick(schemaObject)
When you run test_insert_empty_array.m on DJ Matlab version 3.3.1 and mym version 2.7.0 you get this output: ```matlab DataJoint version 3.3.1 Setting up mym... No connections open mym is now ready for use. 0: fakeservices.datajoint.io via TCP/IP Server version 5.7.33 (encrypted) connection_id() +---------------+ 84 ans = Connection with properties: host: 'fakeservices.datajoint.io' user: 'root' initQuery: '' use_tls: 'none' inTransaction: 0 connId: 0 packages: [0×1 containers.Map] foreignKeys: [0×0 struct] isConnected: 1 database created Warning: Directory already exists. > In dj.createSchema (line 52) In test_insert_empty_array (line 5) test.getSchema.m already exists ans = Object test.Person CREATE TABLE `test`.`person` ( `person` varchar(30) NOT NULL COMMENT "", `myblob` longblob NOT NULL COMMENT "", PRIMARY KEY (`person`) ) ENGINE = InnoDB, COMMENT "" :: :: 0 tuples (0.162 s) closing DataJoint connection #0 ans = 1 2 3 4 ans = [] done ```
When you run test_insert_empty_array.m on DJ Matlab version 3.5.0 and latest mym version you get this output: ```matlab DataJoint version 3.5.0 0: fakeservices.datajoint.io via TCP/IP Server version 5.7.33 (encrypted) database connection id: 19 ans = Connection with properties: host: 'fakeservices.datajoint.io:3306' user: 'root' initQuery: [] use_tls: 'none' inTransaction: 0 connId: 0 serverId: 19 packages: [0×1 containers.Map] schemas: [1×1 struct] foreignKeys: [0×0 struct] isConnected: 1 database created Warning: Directory already exists. > In dj.createSchema (line 56) In test_insert_empty_array (line 5) test.getSchema.m already exists ans = Object test.Person :: :: 0 tuples (0.193 s) Error using dj.Relvar/insert/makePlaceholder (line 240) attribute `myblob` is not nullable. Error in dj.Relvar/insert (line 353) [v, placeholder] = makePlaceholder(i, tuple.(header.attributes(i).name)); Error in test_insert_empty_array (line 8) insert(test.Person, {'jeff2', []}) ```

So that would confirm that there is a regression issue between the two version as you should be able to store and retrieve []

guzman-raphael commented 2 years ago

Thanks @jverswijver for testing and collecting these details! This does confirm a regression issue which elevates the priority.

@dglee3 We are wrapping up a few features on datajoint/datajoint-python so we'll be able to address this soon thereafter. We'll have more updates on planning this activity next week.