dbeaver / dbeaver

Free universal database tool and SQL client
https://dbeaver.io
Apache License 2.0
38.52k stars 3.33k forks source link

Netezza Fast import #23138

Open KrzysztofDusko opened 4 months ago

KrzysztofDusko commented 4 months ago

Is your feature request related to a problem? Please describe. Import from xlsx/csv to netezza system should use external tabels not insert row by row because of performance reasons. Describe the solution you'd like Ntezaaa specific implementation of import is needed

Describe alternatives you've considered In my opinion there is no alternative. nzload should also be fast but is not accessible to most

Additional context I can help with implemetation details - if needed

LonwoLonwo commented 3 months ago

Hello @KrzysztofDusko

Thanks for the request.

Quick research didn't return specific information about fast/bulk load in Netezza. So, if you have any helpful links - please add them to the ticket body.

KrzysztofDusko commented 3 months ago

https://www.ibm.com/docs/en/netezza?topic=tables-create-external-table-command https://www.ibm.com/docs/en/netezza?topic=tables-loading-data-from-remote-client-system https://www.ibm.com/docs/en/netezza?topic=tables-additional-examples https://www.ibm.com/docs/en/netezza?topic=options-option-summary https://dwgeek.com/netezza-unload-table-using-external-tables-examples.html/

open option is to create external table (metadata "link" to file /named pipe = emp_backup = external table name) /tmp/emp.dat = real file or pipe path

CREATE EXTERNAL TABLE emp_backup SAMEAS emp USING (
DATAOBJECT ('/tmp/emp.dat')
REMOTESOURCE 'ODBC');
INSERT INTO emp SELECT * FROM emp_backup;

SAMEAS emp = data file have structure suited to existing emp table

another option is to use this syntax

INSERT INTO EXISTING_TABLE_NAME SELECT * FROM EXTERNAL '\\.\pipe\NAMED_PIPE.NAME'  -- pipe or real file
(
COL1 BIGINT,
COL2 NVARCHAR(50)
)
USING(
    REMOTESOURCE 'DOTNET' -- jdbc, odbc, ...
    DELIMITER '\t'
    SKIPROWS 1
    NULLVALUE ''
    ENCODING 'utf-8'
    ESCAPECHAR '\'
    TIMESTYLE '24HOUR'
    MAXERRORS 0
    LOGDIR 'C:\log_path'
);

both options are way way faster than insert row by row (or inserty by select... union all select...). It's important to be careful with \r, \n, colu delimiter values and how to escape then (netezza specific format is simmilar to csv but its not RFC 4180 compilant)

LonwoLonwo commented 3 months ago

Thanks for the explanation.