embulk / embulk-output-jdbc

MySQL, PostgreSQL, Redshift and generic JDBC output plugins for Embulk
Other
88 stars 86 forks source link

Should be able to create more than 1000 temporary tables #299

Closed civitaspo closed 2 years ago

civitaspo commented 2 years ago

Hi maintainers,

When we use modes other than INSERT_DIRECT and MERGE_DIRECT in embulk-output-jdbc (or plugins that inherit from embulk-output-jdbc), there is a restriction that we cannot create more than 1000 temporary tables as follows.

ref. https://github.com/embulk/embulk-output-jdbc/blob/7a433dc/embulk-output-jdbc/src/main/java/org/embulk/output/jdbc/AbstractJdbcOutputPlugin.java#L698-L704

This will cause problems when the number of tasks exceeds 1000, such as when there are more than 1000 input files.

I think this restriction was designed to prevent the maximum number of table names from being exceeded, but I think the problem can be solved if the number of suffix digits can be specified as an option. What do you think?

civitaspo commented 2 years ago

This PR https://github.com/embulk/embulk-output-jdbc/pull/289 maybe approaches this issue.

civitaspo commented 2 years ago

I think it would be better to create an option temp_table_suffix_length than the PR approach.

hito4t commented 2 years ago

@civitaspo Thank you for reporting the issue!

I think a temporary table name suffix length, now fixed as 3, should be decided by the number of digits of (taskCount - 1).

hiroyuki-sato commented 2 years ago

JFYI: table name length limits. We need to care those limitations.

DB Name limit Reference
MySQL 64 characters 9.2.1 Identifier Length Limits
PostgreSQL 63 bytes Appendix K. PostgreSQL Limits
SQL Server 128 bytes? SQL SERVER – Maximum Allowable
Redshift 127 bytes CREATE TABLE
Oracle
DB2
dmikurube commented 2 years ago

@hito4t's idea sounds good as :

  1. Existing users has no impacts without any configuration change.
  2. Users who were not able to use this because of 1000+ tables will be able to start using it (although the table name extends).

The point that I was worried about #289 was the table name length limits (thanks @hiroyuki-sato), and existing users could be affected.

civitaspo commented 2 years ago

@hito4t Thanks for the reply. I think your idea is a good one too. I'll go ahead with the implementation.