github / gh-ost

GitHub's Online Schema-migration Tool for MySQL
MIT License
12.26k stars 1.25k forks source link

Missing --binary-mode alike parameter? Tool fails with FATAL Error 1366: Incorrect string value. #342

Open nethalo opened 7 years ago

nethalo commented 7 years ago

The tool fails trying to run one of the REPLACE statements that have an URL with a character

Error message: FATAL Error 1366: Incorrect string value: '\xEF\xBF\xBDEve...' for column 'url' at row 1

Values: args=[1480391318431_7062357157387106246_ACT_B_3522379_S_1005_EUIDS_NIL_EUIDE_EXCHG_3 2016-11-29 03:48:38 +0000 UTC <nil> RHEoWesPcC2cKQJ2KeSoMFyz http://www.viralvinny.com/21-places-earth-youre-not-allowed-visit/?utm_source=outbrain&utm_medium=LAZ_ViralVinny.com-Mobile-US-ForbiddenB&utm_content=hole&utm_campaign=21 Places Youre Not Allowed To Visit�Ever&utm_term=5333981 612475 US US-FL KISSIMMEE 534 14 -1 -1 0.5 -1 -1 4857156 1005 3522379 S_FEE 389 <nil> <nil> ip_address 172.58.169.187 1 0 3 -1 <nil> 5246 3226465175 <nil> <nil> <nil> <nil> <nil> <nil> 7010 7477 FWC0 SVM0 SVM_-1000 -1 -1 <nil> dc4ad-rtbbidder08.int.dealer.com <nil> 34744 <nil> <nil> -300 1610 en -1 -1 -1 false_NIL false_0 NIL_NIL_NIL_0.0.0 <nil> <nil> 612475 3 1]

shlomi-noach commented 7 years ago

Can you please share:

thank you

nethalo commented 7 years ago

Command

./gh-ost \
  --user="percona" \
  --password="XXXXX" \
  --host=localhost \
  --test-on-replica \
  --database="esm_impression_history_prod" \
  --table="rtb_impressions_and_clicks" \
  --alter="ROW_FORMAT = COMPRESSED" \
  --initially-drop-ghost-table \
  --initially-drop-old-table \
  --max-load=Threads_running=30 \
  --chunk-size=500 \
  --cut-over=default \
  --exact-rowcount \
  --concurrent-rowcount \
  --serve-socket-file=/tmp/gh-ost.test.sock \
  --panic-flag-file=/tmp/gh-ost.panic.flag \
  --postpone-cut-over-flag-file=/root/wait-cutover.file \
  --execute 

Table structure:

Create Table: CREATE TABLE `rtb_impressions_and_clicks` (
  `exchange_request_id` varchar(200) NOT NULL,
  `datetime` timestamp NULL DEFAULT NULL,
  `googleUserID` varchar(45) DEFAULT NULL,
  `esmUserID` varchar(50) DEFAULT NULL,
  `url` varchar(600) DEFAULT NULL,
  `anonymousId` varchar(50) DEFAULT NULL,
  `country` varchar(45) DEFAULT NULL,
  `region` varchar(45) DEFAULT NULL,
  `city` varchar(45) DEFAULT NULL,
  `metro` int(11) DEFAULT NULL,
  `vertical1` int(11) DEFAULT NULL,
  `vertical2` int(11) DEFAULT NULL,
  `vertical3` int(11) DEFAULT NULL,
  `vertical1Weight` float DEFAULT NULL,
  `vertical2Weight` float DEFAULT NULL,
  `vertical3Weight` float DEFAULT NULL,
  `bid_amount` int(10) DEFAULT NULL,
  `strategy_id` smallint(6) DEFAULT NULL,
  `bannerid` mediumint(9) DEFAULT NULL,
  `strategy_param1` varchar(50) DEFAULT NULL,
  `strategy_value1` varchar(50) DEFAULT NULL,
  `strategy_param2` varchar(50) DEFAULT NULL,
  `strategy_value2` varchar(50) DEFAULT NULL,
  `strategy_param3` varchar(50) DEFAULT NULL,
  `strategy_value3` varchar(50) DEFAULT NULL,
  `processing_time` mediumint(9) DEFAULT NULL,
  `analyzed` binary(1) NOT NULL DEFAULT '0',
  `slot_visibility` smallint(6) DEFAULT '-1',
  `hours_since_last_visit` int(11) DEFAULT '-1',
  `advertiser_categories` varchar(100) DEFAULT NULL,
  `decrypted_winning_price` int(11) DEFAULT NULL,
  `auto_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `click1datetime` timestamp NULL DEFAULT NULL,
  `click2datetime` timestamp NULL DEFAULT NULL,
  `click3datetime` timestamp NULL DEFAULT NULL,
  `click4datetime` timestamp NULL DEFAULT NULL,
  `click5datetime` timestamp NULL DEFAULT NULL,
  `first_return_datetime` timestamp NULL DEFAULT NULL,
  `advertiserid` int(11) NOT NULL DEFAULT '0',
  `campaignid` int(11) NOT NULL DEFAULT '0',
  `classifiers_triggered` varchar(80) DEFAULT NULL,
  `num_pages_browsed_last_k_days` int(11) NOT NULL DEFAULT '-1',
  `time_spent_last_k_days` int(11) NOT NULL DEFAULT '-1',
  `original_pixel_datasource_id_used` varchar(80) DEFAULT NULL,
  `hostname` varchar(50) DEFAULT NULL,
  `config_label` varchar(80) DEFAULT NULL,
  `postal_code` varchar(25) DEFAULT NULL,
  `postal_code_prefix` varchar(25) DEFAULT NULL,
  `advertiser_categories_intent` varchar(50) DEFAULT NULL,
  `timezone_offset` int(11) DEFAULT '0',
  `seller_network` varchar(50) DEFAULT NULL,
  `detected_language` varchar(2) DEFAULT NULL,
  `detected_content_label_1` int(11) DEFAULT '-1',
  `detected_content_label_2` int(11) DEFAULT '-1',
  `detected_content_label_3` int(11) DEFAULT '-1',
  `mobile_app_name` varchar(50) DEFAULT NULL,
  `mobile_company_name` varchar(50) DEFAULT NULL,
  `mobile_platform_name` varchar(50) DEFAULT NULL,
  `mobile_carrier_name` varchar(50) DEFAULT NULL,
  `mobile_app_country` varchar(50) DEFAULT NULL,
  `target_channel` varchar(50) DEFAULT NULL,
  `browser_id` int(11) DEFAULT '-1',
  `target_platform_type` int(11) DEFAULT '-1',
  PRIMARY KEY (`auto_id`),
  KEY `advertiserid_auto_id` (`advertiserid`,`auto_id`),
  KEY `advertiserid_datetime` (`advertiserid`,`datetime`),
  KEY `analyzed_auto_id` (`analyzed`,`auto_id`),
  KEY `auto_idadvertiserid` (`auto_id`,`advertiserid`),
  KEY `datetime_advertiserid` (`datetime`,`advertiserid`),
  KEY `datetime_googleUserID` (`datetime`,`googleUserID`),
  KEY `first_return_datetime_googleUserID_auto_id` (`first_return_datetime`,`googleUserID`,`auto_id`),
  KEY `ind_exchg_req_id` (`exchange_request_id`)
) ENGINE=InnoDB AUTO_INCREMENT=3272644143 DEFAULT CHARSET=latin1 ROW_FORMAT=COMPACT

gh-ost version:

/root/gh-ost --version 1.0.28

Output: I don't have access to it now but is can recall several rows with the "ERROR Error 1366: Incorrect string value" line

shlomi-noach commented 7 years ago

Potentially duplicate of https://github.com/github/gh-ost/issues/290

shlomi-noach commented 7 years ago

It seems like url is a latin1 column. Is said character a utf8 character? Please also refer to https://github.com/github/gh-ost/issues/290#issuecomment-257436498.

The URL at hand is

http://www.viralvinny.com/21-places-earth-youre-not-allowed-visit/?utm_source=outbrain&utm_medium=LAZ_ViralVinny.com-Mobile-US-ForbiddenB&utm_content=hole&utm_campaign=21 Places Youre Not Allowed To Visit�Ever&utm_term=5333981

This shouldn't be stored in a latin1 column in the first place. gh-ost gets upset about this because it tries to write this value onto the latin1 column and gets rejected.

While this may sound like a sorry excuse, I think gh-ost is doing the right thing.

However also noteworthy is that planned work on https://github.com/github/gh-ost/issues/230 may make this behavior go away, as long as migration runs through PRIMARY KEY.

To sum it up, my recommendation to you is that if you're storing utf8 characters, you should be using utf8/utf8mb4 columns.

shlomi-noach commented 7 years ago

230 is moving into a decoupled approach, where the same problem would apply.

I don't have an immediate solution for this, other than suggest converting the url column to utf8mb4. This should work for gh-ost. I wonder if you can test this to confirm:

--alter="change url varchar(600) charset utf8mb4 DEFAULT NULL"