Closed GoogleCodeExporter closed 9 years ago
This feature is now implemented. Here is a description including notes on
testing.
1. Filter implementations. This feature adds two filters as described in the
feature description.
schemachange - A Java filter to detect schema changes by parsing statements and
create permanent annotations on the statements. This filter also detects
truncate statements. Here is the general behavior.
a.) Any statement that affects the logical schema of the table, such as
CREATE/DROP SCHEMA, CREATE/DROP TABLE, or ALTER causes the annotation
##schema_change to be added to the event metadata.
b.) Similarly any statement that is a table TRUNCATE operate causes the
annotation ##truncate to be added to the event metadata.
c.) In both cases the operation type, schema name, and table name (if
appropriate) are added to the statement metadata.
monitorschemachange - A Javascript filter to log changes detected by the
schemachange filter and optionally force a commit when such a change is
processed.
2. Installation. Filters can be installed anywhere that is convenient with the
caveat that schemachange must run before monitorschemachange for the latter to
work. Here is a sample installation.
a.) Install schemachange on master as in the following example. This way the
schema annotations are visible in the log.
tools/tpm install batch1 \
...
--svc-extractor-filters=colnames,pkey,schemachange \
...
--start-and-report
b.) Install monitorschemachange on the slave applier stage. This logs the
changes on the slave and causes a commit only in the final stage.
tools/tpm install batch1 \
...
--svc-applier-filters=monitorschemachange \
...
--start-and-report
3. To test schemachange filter operation, enter the following types of
statements:
CREATE SCHEMA bar
DROP SCHEMA bar
CREATE TABLE foo
DROP TABLE foo
ALTER TABLE foo ADD COLUMN data1 varchar(30)
ALTER TABLE foo DROP COLUMN data1
All of these should generate annotations on transactions stored in the THL like
the following example for ALTER TABLE. Note the 'schema_change' annotation on
the transaction and the ##operation, ##schema, and ##table on the statement.
SEQ# = 2 / FRAG# = 0 (last frag)
- TIME = 2014-05-05 21:57:24.0
- EPOCH# = 0
- EVENTID = mysql-bin.000287:0000000000000639;70
- SOURCEID = logos1
- METADATA =
[mysql_server_id=1;unsafe_for_block_commit;dbms_type=mysql;service=batch;shard=t
est;schema_change]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = UTF-8, autocommit = 1, sql_auto_is_null = 0,
foreign_key_checks = 1, unique_checks = 1, sql_mode = 'NO_ENGINE_SUBSTITUTION',
character_set_client = 33, collation_connection = 33, collation_server = 33,
##schema = test, ##table = foo, ##operation = ALTER TABLE]
- SCHEMA =
- SQL(0) = alter table test.foo add column data1 varchar(30)
You can also enter a truncate table command, which should generate similar
annotations:
SEQ# = 3 / FRAG# = 0 (last frag)
- TIME = 2014-05-05 21:59:35.0
- EPOCH# = 0
- EVENTID = mysql-bin.000287:0000000000000719;71
- SOURCEID = logos1
- METADATA =
[mysql_server_id=1;dbms_type=mysql;service=batch;shard=test;truncate]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = UTF-8, autocommit = 1, sql_auto_is_null = 0,
foreign_key_checks = 1, unique_checks = 1, sql_mode = 'NO_ENGINE_SUBSTITUTION',
character_set_client = 33, collation_connection = 33, collation_server = 33,
##schema = test, ##table = foo, ##operation = TRUNCATE]
- SCHEMA =
- SQL(0) = truncate test.foo
4.) To test monitorschemachange operation, look for the following:
a.) The slave replicator log should have messages like the following for schema
change and truncate commands:
INFO | jvm 1 | 2014/05/05 22:01:06 | 2014-05-06 05:01:06,422 [batch -
q-to-dbms-0] INFO filter.JavaScriptFilter SCHEMA CHANGE: seqno=2 schema=test
table=foo operation=ALTER TABLE sql=[alter table test.foo add column data1
varchar(30)]
The default behavior of the monitorschemachange filter is to force a schema
change, which results in the following message:
INFO | jvm 1 | 2014/05/05 22:09:30 | 2014-05-06 05:09:30,551 [batch -
q-to-dbms-0] INFO filter.JavaScriptFilter Forcing commit due to schema change:
seqno=2
Original comment by robert.h...@continuent.com
on 6 May 2014 at 5:13
This issue was closed by revision r2219.
Original comment by robert.h...@continuent.com
on 9 May 2014 at 12:31
Original comment by robert.h...@continuent.com
on 9 May 2014 at 1:47
To be more useful this implementation should include the following capability:
1.) For each schema change that may require data to be refreshed from the
source, generate a file in a designated directory. This file should have the
following properties.
a.) The name has the form 'schema.table'
b.) It is in a subdirectory that matches the replicator service name.
2.) The file is generated for truncate, drop/create table. It may need to be
generated for alter as well. (This should be parameterizable?)
For Hadoop implementations we can then write a script that checks these files
and re-sqoops source data automatically, after which the file can be cleared.
Original comment by robert.h...@continuent.com
on 16 May 2014 at 3:39
The Oracle applier needs to be updated to at least replicate TRUNCATE
statements. Ideally any DDL could be replicated. There could be an option to
enable that portion.
Original comment by jeff.m...@continuent.com
on 16 May 2014 at 1:06
It might be useful if this basic structure was adaptable to other database
targets.
For example, the content of the file could be a CSV defining schema/table and
change type:
sales.invoices,alter
sales,alter
This information could then be read and/or imported into a destination database
to track for changes.
For example, with Vertica/InfiniDB we could use and parse this information to
more easily identify when a ddlscan update is required.
Original comment by mc.br...@continuent.com
on 16 May 2014 at 1:42
The schemachange and monitorschemachange filters are adapted as follows to
catch changes from pt-online-schema-change and generate suitable notifications.
Here is a summary of the behavior.
1. The schemachange now generates THL annotations when there is a RENAME TABLE
command. They appear as follows. This allows us to pick up the "schema
change" that occurs when a pt-online-schema-change completes successfully.
SEQ# = 20 / FRAG# = 0 (last frag)
...
- OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 0,
foreign_key_checks = 1, unique_checks = 1, sql_mode =
'NO_ENGINE_SUBSTITUTION,NO_AUTO_VALUE_ON_ZERO', character_set_client = 8,
collation_connection = 8, collation_server = 33, ##schema = test, ##table =
foo, ##operation = RENAME TABLE]
- SCHEMA = test
- SQL(0) = RENAME TABLE `test`.`foo` TO `test`.`_foo_old`, `test`.`_foo_new` TO
`test`.`foo`
2. The monitorschemachange filter has a notify property that enable generating
of notification files. To enable this, add the following property to tpm
commands:
--property=replicator.filter.monitorschemachange.notify=true
When set to true, the replicator generates notification files of the form
'schema.table' for table changes and 'schema.-none-' for schema changes (e.g.
drop/create). Here is an example of the file generated for a rename:
# Schema change notification
sourceId=logos1
seqno=112896
commitTimestamp=2014-05-31 17:42:52.0
schema=test
table=schema_change
operation=RENAME TABLE
sql=RENAME TABLE `test`.`schema_change` TO `test`.`_schema_change_old`, `test`.`
_schema_change_new` TO `test`.`schema_change`
3. Schema change files are generated by default in directory
tungsten-replicator/log/schemachanges/<service>. You can change this using the
notifyDir property, as in the following example:
--property=replicator.filter.monitorschemachange.notifyDir=/tmp/myschemachanges
To test these features just enable both schemachange and monitorschemafilters.
You can see the pt-online-schema-change behavior using a script like the
following:
mysql -uroot <<SQL
drop table if exists test.schema_change;
create table test.schema_change (id int primary key, data1 varchar(20));
insert into test.schema_change values(1, 'one');
SQL
pt-online-schema-change --user tungsten --password secret --alter 'add column
ts1 timestamp' --execute D=test,t=schema_change
Original comment by robert.h...@continuent.com
on 31 May 2014 at 6:04
Original comment by robert.h...@continuent.com
on 31 May 2014 at 6:11
This issue is just the beginning of efforts to track schema changes accurately.
The current implementation solves one specific use case, namely to provide a
reasonably accurate indication when it is necessary to resqoop data to Hadoop
following a schema change in an upstream MySQL server.
We can add additional improvements for other DBMS types as well as for Hadoop
loading based on field experience and specific use cases.
Original comment by robert.h...@continuent.com
on 31 May 2014 at 6:14
In the low-hanging-fruit department, would it be a good idea to add the seqNo
to the notify file name?
The reasoning for this addition is that we process CSV files using files made
of table name + seqNo. If the notify files did include a seqNo, both these
files could be used for automated operations outside of the existing tools.
Would such addition clash with what we are using so far?
Original comment by g.maxia
on 17 Jun 2014 at 12:45
In fact, if we don't add a seqNo to the table name, every DDL operation on such
table will overwrite the previous operation.
CREATE TABLE sentinel ...
ALTER TABLE sentinel ...
TRUNCATE sentinel
The above statements will result in only one notify file, containing the latest
command.
Is this behavior intentional?
Original comment by g.maxia
on 17 Jun 2014 at 1:11
Adding the seqno is a good idea. I can do this as part of final cleanup if
others agree.
Original comment by robert.h...@continuent.com
on 28 Aug 2014 at 4:54
The seqno issue looks useful to field users--I had a call today that discussed
this. I have committed a fix so that each generated file has the following
form:
<schema>.<table>.<seqno>
This makes it easy for scripts to identify changes quickly. At this point I
believe the implementation is complete.
Original comment by robert.h...@continuent.com
on 28 Aug 2014 at 11:58
The implementation is complete.
This feature behaves as described.
There is no explicit test for it yet.
Original comment by g.maxia
on 18 Sep 2014 at 11:34
A note has been added to the 3.0 release notes
Original comment by mc.br...@continuent.com
on 20 Oct 2014 at 1:29
Original issue reported on code.google.com by
robert.h...@continuent.com
on 14 Apr 2014 at 5:36