dimitri / pgloader

Migrate to PostgreSQL in a single command!
http://pgloader.io
Other
5.27k stars 536 forks source link

ENUM conversion issue with spaces, commas and parentheses #120

Closed victor987 closed 9 years ago

victor987 commented 9 years ago

In the example detailed later, there are incorrect conversions from a MySQL ENUM column to a PostgreSQL CREATE TYPE ... AS ENUM. I've identified 3 types of wrong conversions: ' 8058' becomes '8058' 'ASCII(TAR)' becomes 'ASCII(TAR' 'ASCII,EXCE' becomes 'ASCII', 'EXCE' I did not get a warning, I just got a error because it would result in trying to create a duplicate key (ASCII) so I suppose this issues could go unnoticed. I don't know if there is the same issue when copying the data, I will try to test.

I have this ENUM column definition in MySQL (certainly dirty, sorry): ENUM(' 1509', ' 8058', ' BACKUP', ' EBCDIC', ' SEGY', '"""ACSII,E', '"""WORD,EX', '.....', '16B', '3480', '556', '80 BYTES', '8015', '8052', '???', 'ACSII', 'ADT', 'ARC', 'ARCHIVE', 'ARCHIVI', 'ARCO', 'ASC', 'ASCCI,WOR', 'ASCCI.EXCE', 'ASCI', 'ASCII', 'ASCII EXCE', 'ASCII(TAR)', 'ASCII,EXCE', 'ASCII,MS', 'ASCII,WOR', 'ASCII-EXCE', 'ASCII-LAS', 'ASCII-TAR', 'ASCII-XLS', 'ASCII.EXCE', 'ASCII.EXCL', 'BACKUP', 'BCK', 'BIT', 'BIT/TAP', 'CAFDRS', 'CGG', 'CGM', 'CGM,SPS', 'CLI', 'CLI-ZIP', 'COP4', 'CPI0', 'CPIO', 'CSV', 'DAT', 'DES', 'DGN', 'DISC', 'DLIS', 'DOC', 'DOC,PPT', 'DOC-LAS', 'DOC-XLS', 'DWG', 'DWG-DXF', 'DXF', 'EBCDIC', 'EPR', 'ETF', 'EXC-ASCII', 'EXC-WORD', 'EXCEL,MS', 'EXCEL-SPS', 'EXCEL-UKOA', 'EXCEL/ASCI', 'EXE', 'FOCUS', 'LAS', 'LAS /ASCII', 'LAS-ASCII', 'LAS-PDS', 'LAS-TIF', 'LAS-TIFF', 'LAS-ZIP', 'LAS/ASCII', 'LAS/BACKUP', 'LAS/TIF', 'LIS', 'LIS /LAS', 'LIS & PDS', 'LIS /TAR', 'LIS-LAS', 'LIS/TAP', 'LIS/TAR', 'LSP-TXT', 'MACT', 'MEM', 'MEM-LAS', 'MS EXCEL', 'NTI', 'ONY', 'P190-TXT', 'P294-P190', 'P294-TXT', 'PCX', 'PDF&TXT', 'PDS', 'PDS-FMA', 'PHOENIX', 'RESULTS', 'ROWGEN', 'SAY', 'SED-D', 'SED-G', 'SEG D-8058', 'SEG-A', 'SEG-B', 'SEG-C', 'SEG-D', 'SEG-D 3480', 'SEG-D 8015', 'SEG-D 8024', 'SEG-D 8048', 'SEG-D 8058', 'SEG-D IEEE', 'SEG-D IGGG', 'SEG-D-8015', 'SEG-D-8048', 'SEG-D-8058', 'SEG-D0015', 'SEG-D388', 'SEG-D80', 'SEG-D8015', 'SEG-D8024', 'SEG-D8038', 'SEG-D8048', 'SEG-D8058', 'SEG-X', 'SEG-Y', 'SEG-Y 4744', 'SEG-Y-TAR', 'SEG-Y/TIFF', 'SEG/P1', 'SEGB', 'SEGC', 'SEGD', 'SEGD 8058', 'SEGD -8058', 'SEGD 8015', 'SEGD 8048', 'SEGD 8058', 'SEGD-8015', 'SEGD-8032', 'SEGD-8058', 'SEGD8015', 'SEGD8024', 'SEGD8048', 'SEGD8058', 'SEGP3', 'SEGX', 'SEGY', 'SG-B', 'SG-C', 'SHIVA', 'SP1', 'SP3', 'SPS', 'SPS,UKOOA', 'SUN', 'SVY-SAY', 'SYS-LAS', 'TAP', 'TAPE IMAGE', 'TAR', 'TAR -P1/90', 'TAR /LAS', 'TAR-UKOOA', 'TELEMETRY', 'TFS', 'TIF', 'TIF /LAS', 'TIF-LAS', 'TIF/LAS', 'TIF/TAR', 'TIFF', 'TIFF-LIS', 'TIFF-TAR', 'TIFF-UK4', 'TIFF/LAS', 'TXF', 'TXT', 'TXT-LAS', 'TXT/DAT', 'TXT/XLS', 'UK4', 'UKOA', 'UKOOA', 'UKOOA P-76', 'UKOOA-SPS', 'UNIX-TAR', 'UPHOLE', 'VAR', 'VARIOS', 'VISS', 'WC-4', 'WC1', 'WC4', 'WCI', 'WES4', 'WESCOD1', 'WESCOD4', 'WESCOD?', 'WGC4', 'WISDOM', 'WORD,UKOOA', 'XLS', 'XLS,ASCII', 'XLS,DWG', 'XLS-SAY', 'XTF', 'XTT', 'ZIP', 'ZIP-CLI', 'ZIP-XLS', 'ZIP/CLI', 'ZIP/TIF')

And when connecting directly to the MySQL database (I did not try other methods) pgloader tries to create this type: CREATE TYPE its_format AS ENUM ('1509', '8058', 'BACKUP', 'EBCDIC', 'SEGY', '"""ACSII', 'E', '"""WORD', 'EX', '.....', '16B', '3480', '556', '80 BYTES', '8015', '8052', '???', 'ACSII', 'ADT', 'ARC', 'ARCHIVE', 'ARCHIVI', 'ARCO', 'ASC', 'ASCCI', 'WOR', 'ASCCI.EXCE', 'ASCI', 'ASCII', 'ASCII EXCE', 'ASCII(TAR', 'ASCII', 'EXCE', 'ASCII', 'MS', 'ASCII', 'WOR', 'ASCII-EXCE', 'ASCII-LAS', 'ASCII-TAR', 'ASCII-XLS', 'ASCII.EXCE', 'ASCII.EXCL', 'BACKUP', 'BCK', 'BIT', 'BIT/TAP', 'CAFDRS', 'CGG', 'CGM', 'CGM', 'SPS', 'CLI', 'CLI-ZIP', 'COP4', 'CPI0', 'CPIO', 'CSV', 'DAT', 'DES', 'DGN', 'DISC', 'DLIS', 'DOC', 'DOC', 'PPT', 'DOC-LAS', 'DOC-XLS', 'DWG', 'DWG-DXF', 'DXF', 'EBCDIC', 'EPR', 'ETF', 'EXC-ASCII', 'EXC-WORD', 'EXCEL', 'MS', 'EXCEL-SPS', 'EXCEL-UKOA', 'EXCEL/ASCI', 'EXE', 'FOCUS', 'LAS', 'LAS /ASCII', 'LAS-ASCII', 'LAS-PDS', 'LAS-TIF', 'LAS-TIFF', 'LAS-ZIP', 'LAS/ASCII', 'LAS/BACKUP', 'LAS/TIF', 'LIS', 'LIS /LAS', 'LIS & PDS', 'LIS /TAR', 'LIS-LAS', 'LIS/TAP', 'LIS/TAR', 'LSP-TXT', 'MACT', 'MEM', 'MEM-LAS', 'MS EXCEL', 'NTI', 'ONY', 'P190-TXT', 'P294-P190', 'P294-TXT', 'PCX', 'PDF&TXT', 'PDS', 'PDS-FMA', 'PHOENIX', 'RESULTS', 'ROWGEN', 'SAY', 'SED-D', 'SED-G', 'SEG D-8058', 'SEG-A', 'SEG-B', 'SEG-C', 'SEG-D', 'SEG-D 3480', 'SEG-D 8015', 'SEG-D 8024', 'SEG-D 8048', 'SEG-D 8058', 'SEG-D IEEE', 'SEG-D IGGG', 'SEG-D-8015', 'SEG-D-8048', 'SEG-D-8058', 'SEG-D0015', 'SEG-D388', 'SEG-D80', 'SEG-D8015', 'SEG-D8024', 'SEG-D8038', 'SEG-D8048', 'SEG-D8058', 'SEG-X', 'SEG-Y', 'SEG-Y 4744', 'SEG-Y-TAR', 'SEG-Y/TIFF', 'SEG/P1', 'SEGB', 'SEGC', 'SEGD', 'SEGD 8058', 'SEGD -8058', 'SEGD 8015', 'SEGD 8048', 'SEGD 8058', 'SEGD-8015', 'SEGD-8032', 'SEGD-8058', 'SEGD8015', 'SEGD8024', 'SEGD8048', 'SEGD8058', 'SEGP3', 'SEGX', 'SEGY', 'SG-B', 'SG-C', 'SHIVA', 'SP1', 'SP3', 'SPS', 'SPS', 'UKOOA', 'SUN', 'SVY-SAY', 'SYS-LAS', 'TAP', 'TAPE IMAGE', 'TAR', 'TAR -P1/90', 'TAR /LAS', 'TAR-UKOOA', 'TELEMETRY', 'TFS', 'TIF', 'TIF /LAS', 'TIF-LAS', 'TIF/LAS', 'TIF/TAR', 'TIFF', 'TIFF-LIS', 'TIFF-TAR', 'TIFF-UK4', 'TIFF/LAS', 'TXF', 'TXT', 'TXT-LAS', 'TXT/DAT', 'TXT/XLS', 'UK4', 'UKOA', 'UKOOA', 'UKOOA P-76', 'UKOOA-SPS', 'UNIX-TAR', 'UPHOLE', 'VAR', 'VARIOS', 'VISS', 'WC-4', 'WC1', 'WC4', 'WCI', 'WES4', 'WESCOD1', 'WESCOD4', 'WESCOD?', 'WGC4', 'WISDOM', 'WORD', 'UKOOA', 'XLS', 'XLS', 'ASCII', 'XLS', 'DWG', 'XLS-SAY', 'XTF', 'XTT', 'ZIP', 'ZIP-CLI', 'ZIP-XLS', 'ZIP/CLI', 'ZIP/TIF');

I'm using: pgloader version "3.1.0" compiled with SBCL 1.2.2.debian

victor987 commented 9 years ago

Would the issue with the comma be related to the MySQL SET type, where commas are not allowed?

dimitri commented 9 years ago

ACK: this is a bug, I will work on fixing it soon, will reproduce the problem with your ENUM definition.

Note: I think that function needs to grow some smarts now:

(defun explode-mysql-enum (ctype)
  "Convert MySQL ENUM expression into a list of labels."
  ;; from: "ENUM('small', 'medium', 'large')"
  ;;   to: ("small" "medium" "large")
  (mapcar (lambda (x) (string-trim "' )" x))
      (sq:split-sequence #\, ctype  :start (position #\' ctype))))
dimitri commented 9 years ago

Please try a new fresh compile from sources, it should just work for you now.