LorenFrankLab / spyglass

Neuroscience data analysis framework for reproducible research built by Loren Frank Lab at UCSF
https://lorenfranklab.github.io/spyglass/
MIT License

Keys too large for MySQL defaults #630

Open CBroz1 opened 1 year ago

CBroz1 commented 1 year ago

common_backup has some unnecessarily long primary keys with varchar(500). @akgillespie8 found this to be an issue when declaring tables for her MySQL instance. We should either...

  1. Add instructions for how to circumvent this issue during database setup.
  2. Reduce these varchars to accommodate default settings, and adjust prod accordingly.

Given that the current max length of entries is 60, I think we can safely do the latter. I suggest varchar(128).

```python
> from spyglass import common
---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
Cell In[13], line 1
----> 1 from spyglass import common

File ~/Documents/anna/spyglass/src/spyglass/common/__init__.py:14
      3 import spyglass as sg
      5 from ..utils.nwb_helper_fn import (
      6     close_nwb_files,
      7     estimate_sampling_rate,
   (...)
     12     get_valid_intervals,
     13 )
---> 14 from .common_backup import CuratedSpikeSortingBackUp, SpikeSortingBackUp
     15 from .common_behav import (
     16     PositionSource,
     17     RawPosition,
   (...)
     21     convert_epoch_interval_name_to_position_interval_name,
     22 )
     23 from .common_device import (
     24     CameraDevice,
     25     DataAcquisitionDevice,
   (...)
     29     ProbeType,
     30 )

File ~/Documents/anna/spyglass/src/spyglass/common/common_backup.py:8
      2 import numpy as np
      4 schema = dj.schema("common_backup")
      7 @schema
----> 8 class SpikeSortingBackUp(dj.Manual):
      9     definition = """
     10     nwb_file_name: varchar(500)
     11     sort_group_id: int
   (...)
     20     units_object_id: varchar(100)
     21     """
     23     def insert_from_backup(self, backup_file):

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/datajoint/schemas.py:177, in Schema.__call__(self, cls, context)
    173     raise DataJointError(
    174         "The schema decorator should not be applied to Part tables."
    175     )
    176 if self.is_activated():
--> 177     self._decorate_master(cls, context)
    178 else:
    179     self.declare_list.append((cls, context))

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/datajoint/schemas.py:188, in Schema._decorate_master(self, cls, context)
    182 def _decorate_master(self, cls, context):
    183     """
    184
    185     :param cls: the master class to process
    186     :param context: the class' declaration context
    187     """
--> 188     self._decorate_table(
    189         cls, context=dict(context, self=cls, **{cls.__name__: cls})
    190     )
    191     # Process part tables
    192     for part in ordered_dir(cls):

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/datajoint/schemas.py:226, in Schema._decorate_table(self, table_class, context, assert_declared)
    224 is_declared = instance.is_declared
    225 if not is_declared and not assert_declared and self.create_tables:
--> 226     instance.declare(context)
    227 self.connection.dependencies.clear()
    228 is_declared = is_declared or instance.is_declared

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/datajoint/table.py:102, in Table.declare(self, context)
    100     for store in external_stores:
    101         self.connection.schemas[self.database].external[store]
--> 102     self.connection.query(sql)
    103 except AccessError:
    104     # skip if no create privilege
    105     pass

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/datajoint/connection.py:340, in Connection.query(self, query, args, as_dict, suppress_warnings, reconnect)
    338 cursor = self._conn.cursor(cursor=cursor_class)
    339 try:
--> 340     self._execute_query(cursor, query, args, suppress_warnings)
    341 except errors.LostConnectionError:
    342     if not reconnect:

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/datajoint/connection.py:296, in Connection._execute_query(cursor, query, args, suppress_warnings)
    294     cursor.execute(query, args)
    295 except client.err.Error as err:
--> 296     raise translate_query_error(err, query)

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/datajoint/connection.py:294, in Connection._execute_query(cursor, query, args, suppress_warnings)
    291 if suppress_warnings:
    292     # suppress all warnings arising from underlying SQL library
    293     warnings.simplefilter("ignore")
--> 294     cursor.execute(query, args)
    295 except client.err.Error as err:
    296     raise translate_query_error(err, query)

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/pymysql/cursors.py:153, in Cursor.execute(self, query, args)
    149     pass
    151 query = self.mogrify(query, args)
--> 153 result = self._query(query)
    154 self._executed = query
    155 return result

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/pymysql/cursors.py:322, in Cursor._query(self, q)
    320 conn = self._get_db()
    321 self._clear_result()
--> 322 conn.query(q)
    323 self._do_get_result()
    324 return self.rowcount

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/pymysql/connections.py:558, in Connection.query(self, sql, unbuffered)
    556     sql = sql.encode(self.encoding, "surrogateescape")
    557 self._execute_command(COMMAND.COM_QUERY, sql)
--> 558 self._affected_rows = self._read_query_result(unbuffered=unbuffered)
    559 return self._affected_rows

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/pymysql/connections.py:822, in Connection._read_query_result(self, unbuffered)
    820 else:
    821     result = MySQLResult(self)
--> 822     result.read()
    823 self._result = result
    824 if result.server_status is not None:

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/pymysql/connections.py:1200, in MySQLResult.read(self)
   1198 def read(self):
   1199     try:
-> 1200         first_packet = self.connection._read_packet()
   1202     if first_packet.is_ok_packet():
   1203         self._read_ok_packet(first_packet)

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/pymysql/connections.py:772, in Connection._read_packet(self, packet_type)
    770 if self._result is not None and self._result.unbuffered_active is True:
    771     self._result.unbuffered_active = False
--> 772 packet.raise_for_error()
    773 return packet

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/pymysql/protocol.py:221, in MysqlPacket.raise_for_error(self)
    219 if DEBUG:
    220     print("errno =", errno)
--> 221 err.raise_mysql_exception(self._data)

File ~/mambaforge/envs/anna_spyglass_env2/lib/python3.9/site-packages/pymysql/err.py:143, in raise_mysql_exception(data)
    141 if errorclass is None:
    142     errorclass = InternalError if errno < 1000 else OperationalError
--> 143 raise errorclass(errno, errval)

OperationalError: (1071, 'Specified key was too long; max key length is 3072 bytes')
```
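The 1071 error in the traceback above comes from InnoDB's cap on index size. A rough back-of-the-envelope sketch (ignoring the extra length-prefix bytes MySQL also stores per varchar) of why varchar(500) primary keys overflow the 3072-byte limit under a 4-byte charset like utf8mb4:

```python
# InnoDB counts an index's length in bytes, not characters, and with
# utf8mb4 each character can cost up to 4 bytes. The whole composite
# primary key must fit in one index of at most 3072 bytes.
BYTES_PER_CHAR = 4   # utf8mb4 worst case
MAX_KEY_BYTES = 3072  # InnoDB limit reported in error 1071

def key_bytes(*varchar_lengths):
    """Approximate index footprint of a composite key of varchar columns."""
    return sum(n * BYTES_PER_CHAR for n in varchar_lengths)

# Two varchar(500) columns in one primary key: 4000 bytes > 3072
print(key_bytes(500, 500), key_bytes(500, 500) <= MAX_KEY_BYTES)   # 4000 False
# The proposed varchar(128) leaves headroom even with several key columns
print(key_bytes(128, 128, 128), key_bytes(128, 128, 128) <= MAX_KEY_BYTES)  # 1536 True
```

A single varchar(500) column (2000 bytes) still fits under the cap, which would explain why only tables combining several long varchars in one composite key fail.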
akgillespie8 commented 1 year ago

Related to this issue: https://github.com/LorenFrankLab/spyglass/issues/453

I think I mistakenly moved this to a discussion, but we should definitely fix this and agree with option 2 if it can be done safely.

akgillespie8 commented 1 year ago

Just noticed that Loren suggested just removing common_backup.py. I think we can at least safely leave it out of the init, but we could also consider removing it altogether.

khl02007 commented 1 year ago

@CBroz1 @akgillespie8 I was the one who made common_backup when I was migrating some tables. I don't think we need those tables anymore, and I agree that we should drop them and get rid of common_backup.py

akgillespie8 commented 1 year ago

Same problem is happening in:

  • LFPArtifactDetectionParameters in LFPV1

  • DLCPosSelection

  • ArtifactDetectionSelection

CBroz1 commented 1 year ago

Looks like each of these has long varchars in its primary key, but that isn't unique to these tables, so I don't know why these in particular cause issues. nwb_file_name: varchar(255) occurs throughout the database.

Given that alter only works on non-primary attributes, we would need to declare intermediate tables upstream of each to inherit as secondary keys on an index, and do database surgery to implement the change.
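The intermediate-table idea could look something like the sketch below (table and attribute names are hypothetical, and real surgery would also need to migrate existing rows): an upstream lookup table stores the long string once under a short surrogate key, and downstream tables inherit that key instead of the long varchar.

```python
# Hypothetical DataJoint-style definition strings illustrating the idea.
# Upstream lookup: the long varchar lives *outside* the primary key.
nwb_file_lookup_definition = """
nwb_file_id: int              # short surrogate primary key
---
nwb_file_name: varchar(255)   # long value, stored once, non-primary
"""

# Downstream selection tables then inherit only nwb_file_id, keeping
# their composite primary keys far below the 3072-byte index limit.
selection_definition = """
-> NwbFileLookup              # pulls in nwb_file_id, not the varchar
artifact_params_name: varchar(64)
---
"""
```

The trade-off is an extra join whenever the human-readable name is needed, which is why shrinking the varchars directly may be the simpler fix.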

CBroz1 commented 1 year ago

Paths forward include

  1. Making adjustments to spyglass to not hit this error, reducing the size of various varchars and the number of primary key attributes within offending tables
  2. Adding setup instructions for customizing the datajoint docker container
  3. Releasing our own container on Docker Hub

I'm in favor of making adjustments to spyglass itself, as this would make it more user-friendly.
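For option 2, the customization would land in the MySQL config baked into the container. A hedged sketch (exact setting names vary across MySQL versions, and changing the charset of an existing database requires care): since InnoDB counts key length in bytes, declaring tables under a 1-byte charset makes a composite key of several varchar(500) columns cost a quarter of what it does under the default 4-byte utf8mb4.

```ini
# Hypothetical my.cnf fragment for a customized datajoint MySQL container.
# The 3072-byte index cap is counted in bytes, so a 1-byte charset lets
# varchar(500) key columns cost 500 bytes each instead of 2000.
[mysqld]
character-set-server = latin1
collation-server     = latin1_swedish_ci
```

This only helps for ASCII-safe values such as NWB file names; any field that must hold arbitrary Unicode would still need a wider charset.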

CBroz1 commented 1 year ago

This remains an issue with the production database but not spyglass itself. I hit some roadblocks in a Python tree search. The next phase will be to try to edit backup files and restore from there.