Smile-SA / gdpr-dump

Utility that creates anonymized database dumps (MySQL only). Provides default config templates for Magento, Drupal and Shopware.
GNU General Public License v3.0
184 stars 47 forks source link

Improve default `net_buffer_length` #118

Closed staabm closed 8 months ago

staabm commented 8 months ago

Is your feature request related to a problem? Please describe. I am researching ways to make the gdpr-dump run faster for big databases. (running https://github.com/Smile-SA/gdpr-dump/commit/e4770ce210be8420ab1d420453ed86406775bcc8)

I tried different net_buffer_length settings. the default is 1000000 (1 Megabyte) and I tried with 10000000 (10 Megabyte), which seem to improve performance quite a bit.

when using 10 Megabyte it looks like the progressbar roughly updates once every 2 seconds, which I think is still fast enough

Describe the solution you'd like I feel the default of 1000000 is pretty conservative and a higher default could improve dump performance in a default setting quite a bit

guvra commented 8 months ago

Strange, I did the same test last week on a Magento database and the dump time was exactly the same (~1m16 with 1000000 and 10000000). What results did you get?

staabm commented 8 months ago

my export runs a few hours. lets see how it goes when doing a full project dump

staabm commented 8 months ago

the following tests create a dump of ~2,6GB file-size (bz2 compressed)

10 Mbyte

mstaab@mst22:/www/www/gdpr-dump$ php bin/gdpr-dump config.yaml -vvv > dump.sql.bz2
Database settings
  - Dsn: mysql:host=192.168.54.10:33065;dbname=mydb
  - Using password: yes

Dump settings
  - output: php://stdout
  - add_drop_database: false
  - add_drop_table: true
  - add_drop_trigger: true
  - add_locks: true
  - complete_insert: false
  - compress: bzip2
  - default_character_set: utf8
  - disable_keys: true
  - events: false
  - extended_insert: true
  - hex_blob: false
  - init_commands: []
  - insert_ignore: false
  - lock_tables: false
  **- net_buffer_length: 10000000**
  - no_autocommit: true
  - no_create_info: false
  - routines: true
  - single_transaction: true
  - skip_comments: false
  - skip_definer: false
  - skip_dump_date: false
  - skip_triggers: false
  - skip_tz_utc: false

 405/405 [============================] 100% - z_102_9235_return_shipment_log (598 rows) - 1 hr, 14 mins - 44.0 MiB

vs.

1 Mbyte

mstaab@mst22:/www/www/gdpr-dump$ php bin/gdpr-dump config.yaml -vvv > dump.sql.bz2
Database settings
  - Dsn: mysql:host=192.168.54.10:33065;dbname=mydb
  - Using password: yes

Dump settings
  - output: php://stdout
  - add_drop_database: false
  - add_drop_table: true
  - add_drop_trigger: true
  - add_locks: true
  - complete_insert: false
  - compress: bzip2
  - default_character_set: utf8
  - disable_keys: true
  - events: false
  - extended_insert: true
  - hex_blob: false
  - init_commands: []
  - insert_ignore: false
  - lock_tables: false
  **- net_buffer_length: 1000000**
  - no_autocommit: true
  - no_create_info: false
  - routines: true
  - single_transaction: true
  - skip_comments: false
  - skip_definer: false
  - skip_dump_date: false
  - skip_triggers: false
  - skip_tz_utc: false

Dump progress
 405/405 [=========] 100% - z_102_9235_return_shipment_log (598 rows) - 1 hr, 14 mins - 46.0 MiB

I had initially a feeling it is faster, but my testing has the same result as yours.