Create tools around the new datasource classes to support common schema operations while the replicator is not running

GoogleCodeExporter commented 9 years ago

1. To which tool/application/daemon will this feature apply?

bin/datasource

2. Describe the feature in general

bin/datasource [-service <svc>] {help|get|set|reset} [-ds <dsname>] [-log]...

The logs for this utility should go into a separate log file in 
tungsten-replicator/log.

The -service argument specifies the replication service to operate on. If only 
one replication service is configured, it may be omitted.

The -ds argument names a datasource defined in the static properties file. It 
should default to 'global'.

The -log argument tells the tool to record all operations in the replication 
history of the relevant datasource. It has no effect if the datasource type does 
not support replication history.
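A minimal Java sketch of how the proposed options could be resolved, assuming the list of configured replication services is already known to the tool. `DsOptions` and `parse` are hypothetical names, not existing Tungsten classes; arguments the sketch does not recognize (such as the `set` parameters) are passed through to the subcommand unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical option holder for the proposed bin/datasource tool.
class DsOptions {
    String command;                               // help | get | set | reset
    String service;                               // -service <svc>
    String dataSource = "global";                 // -ds <dsname>, defaults to 'global'
    boolean log;                                  // -log: record operations in replication history
    final List<String> extra = new ArrayList<>(); // subcommand arguments, e.g. -seqno for set

    static DsOptions parse(String[] args, List<String> configuredServices) {
        DsOptions o = new DsOptions();
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            switch (a) {
                case "-service": o.service = valueAfter(args, ++i, a); break;
                case "-ds":      o.dataSource = valueAfter(args, ++i, a); break;
                case "-log":     o.log = true; break;
                case "help": case "get": case "set": case "reset":
                    if (o.command == null) { o.command = a; break; }
                    o.extra.add(a); break;
                default:         o.extra.add(a); break;
            }
        }
        if (o.command == null) {
            throw new IllegalArgumentException("Missing command: help, get, set or reset");
        }
        // -service may be omitted only when exactly one replication service exists.
        if (o.service == null) {
            if (configuredServices.size() != 1) {
                throw new IllegalArgumentException("-service is required when "
                    + configuredServices.size() + " services are configured");
            }
            o.service = configuredServices.get(0);
        }
        return o;
    }

    // Returns the value following an option, failing if the option is the last argument.
    private static String valueAfter(String[] args, int i, String option) {
        if (i >= args.length) {
            throw new IllegalArgumentException(option + " requires a value");
        }
        return args[i];
    }
}
```

Failing fast when several services exist keeps the tool from silently touching the wrong service's position.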

# Delete the tracking tables and schema (If possible)
$> bin/datasource reset

# Return the information from the trep_commit_seqno table. 
$> bin/datasource get
[
    {
      "applied_latency": "0",
      "epoch_number": "42343",
      "eventid": "mysql-bin.000003:0000000011546371;-1",
      "extract_timestamp": "2014-08-22 17:23:56.0",
      "fragno": "0",
      "last_frag": "1",
      "seqno": "100363",
      "shard_id": "tungsten_create_load",
      "source_id": "cdb1",
      "task_id": "0",
      "update_timestamp": "2014-08-22 17:23:56.0"
    }
]

# Recreate the tracking tables and populate the trep_commit_seqno table
# This should drop the tables if they already exist
$> bin/datasource set -seqno ### -epoch ### \
-event-id AAAAAAAAA.######:####### -source-id AAAAA.AAAA.AAAAA
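The `get` output above is the content of trep_commit_seqno rendered as a JSON array of flat, string-valued objects; with parallel replication there could be one object per task_id. A hypothetical sketch of that rendering, assuming values never contain quotes or backslashes (a real tool would use a JSON library); `CommitSeqnoRow` is an illustrative name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one trep_commit_seqno row as printed by `get`.
class CommitSeqnoRow {
    // TreeMap keeps columns in alphabetical order, like the sample output.
    private final Map<String, String> columns = new TreeMap<>();

    CommitSeqnoRow put(String column, String value) {
        columns.put(column, value);
        return this;
    }

    // Renders rows as a JSON array; values are assumed to need no escaping.
    static String toJson(List<CommitSeqnoRow> rows) {
        StringBuilder sb = new StringBuilder("[\n");
        for (int r = 0; r < rows.size(); r++) {
            Map<String, String> cols = rows.get(r).columns;
            sb.append("    {\n");
            int c = 0;
            for (Map.Entry<String, String> e : cols.entrySet()) {
                sb.append("      \"").append(e.getKey()).append("\": \"")
                  .append(e.getValue()).append('"');
                sb.append(++c < cols.size() ? ",\n" : "\n");
            }
            sb.append(r + 1 < rows.size() ? "    },\n" : "    }\n");
        }
        return sb.append("]").toString();
    }
}
```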

3. Describe the feature interface

4. Give an idea (if applicable) of a possible implementation

5. Describe pros and cons of this feature.

5a. Why the world will be a better place with this feature.

We will have a single place for interacting with these schemas instead of 
duplicating the logic in Ruby.

5b. What hardship will the human race have to endure if this feature is
implemented.

6. Notes

Original issue reported on code.google.com by jeffm...@gmail.com on 22 Aug 2014 at 5:51

Blocking: #950, #1088

GoogleCodeExporter commented 9 years ago

Changes for Issue 766 have laid the foundation for this request. The key class 
that enables operations on data sources is DataSourceAdministrator. Currently 
it supports only reset, but set/get could easily be added, and it would be easy 
to wrap the whole thing in a utility.

I would prefer a shorter name for the data source utility, such as dsctl.
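A minimal sketch of wrapping such an administrator class in a command-line utility. `PositionAdmin` is a hypothetical interface standing in for DataSourceAdministrator, whose real method signatures are not shown in this thread; the point is that the tool stays a thin dispatcher while each datasource type implements the operations once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical operations a datasource administrator could expose.
interface PositionAdmin {
    List<String> get();            // one entry per stored position (task)
    void set(String position);     // store a new position
    void reset();                  // drop tracking data
}

// In-memory stand-in, used only to exercise the dispatcher.
class InMemoryAdmin implements PositionAdmin {
    private final List<String> positions = new ArrayList<>();
    public List<String> get() { return new ArrayList<>(positions); }
    public void set(String position) { positions.add(position); }
    public void reset() { positions.clear(); }
}

// Thin command-line layer: maps a subcommand onto the admin object and returns an exit code.
class DsctlDispatcher {
    static int run(String command, String arg, PositionAdmin admin) {
        switch (command) {
            case "get":   System.out.println(admin.get()); return 0;
            case "set":   admin.set(arg); return 0;
            case "reset": admin.reset(); return 0;
            default:
                System.err.println("Unknown command: " + command);
                return 1;
        }
    }
}
```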

Original comment by robert.h...@continuent.com on 28 Aug 2014 at 5:00

GoogleCodeExporter commented 9 years ago

How does dsctl relate to bin/query? It seems that the responsibilities of the 
two overlap, and there should be only one tool.

Original comment by linas.vi...@continuent.com on 28 Aug 2014 at 6:45

GoogleCodeExporter commented 9 years ago

bin/dsctl is for managing the replication position of a single service against 
a single dataservice.

bin/query is for running a single SQL statement or a set of SQL statements 
against a JDBC URL.

The work of bin/dsctl could be done using bin/query, except for 
file-system-based datasources, but it would require a large duplication of 
effort. By using the Java datasources, the work only needs to be done once, 
when implementing new datasources.

Original comment by jeffm...@gmail.com on 28 Aug 2014 at 11:15

GoogleCodeExporter commented 9 years ago

Original comment by jeff.m...@continuent.com on 8 Sep 2014 at 7:39

Now blocking: #950

GoogleCodeExporter commented 9 years ago

Decided during the planning meeting that this can wait for the next release.

Original comment by linas.vi...@continuent.com on 10 Sep 2014 at 12:49

Added labels: FixedIn-3.1.0, FoundIn-3.0.0
Removed labels: FixedIn-3.0.0, Foundin

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 4 Dec 2014 at 11:43

Changed state: Started
Added labels: Priority-Critical
Removed labels: Priority-High

GoogleCodeExporter commented 9 years ago

This issue was updated by revision r2694.

CONT-34

Implements most of the dsctl utility, except the 'set' command and the -log 
option.

* Notes of Importance *

n1.) Successfully executed commands are logged into dsctl.log, while errors are 
not. The idea is to have a single audit log of all operations on the position, 
in case they need to be reviewed. Errors are written to stderr instead (see 
below). The -log option is not implemented at this point. dsctl uses the 
log4j-dsctl.properties file, which can be adjusted manually if needed.

n2.) stderr is used extensively: all errors go to stderr. Scripts that call 
dsctl should check for output there. For example, the 'get' command writes 
JSON to stdout on success, but writes a message to stderr on failure.

n3.) Exit codes are used too. A non-zero exit code means there was an error 
while executing the command. Different errors have different codes, but the 
codes are not structured or documented at this point.
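Following n2 and n3, a calling script should treat a non-zero exit code as failure, report stderr, and only parse stdout on success. A hedged Java sketch of such a caller; `DsctlCaller` is a hypothetical class, and since the exit codes are not yet documented it only distinguishes zero from non-zero.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical wrapper a script or tool could use around dsctl.
class DsctlCaller {
    static final class Result {
        final int exitCode;
        final String stdout;
        final String stderr;
        Result(int exitCode, String stdout, String stderr) {
            this.exitCode = exitCode;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    // Runs a command, capturing stderr in a temp file so neither stream can block the other.
    static Result run(List<String> command) throws IOException, InterruptedException {
        Path errFile = Files.createTempFile("dsctl", ".err");
        try {
            Process p = new ProcessBuilder(command).redirectError(errFile.toFile()).start();
            String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int code = p.waitFor();
            String err = new String(Files.readAllBytes(errFile), StandardCharsets.UTF_8);
            return new Result(code, out, err);
        } finally {
            Files.deleteIfExists(errFile);
        }
    }

    // Returns stdout (e.g. the JSON from `get`) only when the exit code is zero.
    static String stdoutOrThrow(Result r) {
        if (r.exitCode != 0) {
            throw new IllegalStateException(
                "dsctl failed (exit " + r.exitCode + "): " + r.stderr.trim());
        }
        return r.stdout;
    }
}
```

Usage might look like `DsctlCaller.stdoutOrThrow(DsctlCaller.run(Arrays.asList("bin/dsctl", "get")))`, with the returned string then handed to a JSON parser.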

* QA Testing Requirements *

At minimum the following QA testing is required at this point:

t1.) There is a slight possibility of a regression: the SqlCommitSeqno position 
retrieval query has been adjusted by adding two fields (task_id and 
update_timestamp). All relational DBMS types should be checked to confirm they 
still work (i.e. Replicator starts up, and transactions are correctly updated 
in and read from the trep_commit_seqno table).

t2.) Tests for 'get' and 'reset' commands for all file-based DBMS types. Both 
master and slaves should be checked. I have tested MySQL and Redshift.

t3.) Test behavior of 'get' and 'reset' on parallel replication with more than 
one channel.

Original comment by linas.vi...@continuent.com on 5 Dec 2014 at 4:27

GoogleCodeExporter commented 9 years ago

The dsctl.log is a really good idea. The -log option was meant to toggle whether 
the 'set' or 'reset' operation goes into the MySQL binary log. By default, 
changes to the position should not be written to the binary log, but in some 
cases we do want that information to go into the binary log. The option may not 
do anything on platforms like Hadoop, but having it is still useful.
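In MySQL, the usual way to keep one session's changes out of the binary log is the session variable `sql_log_bin`, which typically requires elevated privileges to change. A sketch of how a -log flag could map onto it, assuming the position change is expressed as a list of SQL statements over a JDBC connection; `BinlogControl` is a hypothetical name, and this is not how dsctl is known to work.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical helper: keep position changes out of the MySQL binary log unless -log is given.
final class BinlogControl {
    private BinlogControl() {}

    static void apply(Connection conn, boolean logToBinlog, List<String> statements)
            throws SQLException {
        try (Statement st = conn.createStatement()) {
            if (!logToBinlog) {
                st.execute("SET SESSION sql_log_bin = 0"); // affects this session only
            }
            try {
                for (String sql : statements) {
                    st.execute(sql);
                }
            } finally {
                // Restore binary logging even if a statement failed.
                if (!logToBinlog) {
                    st.execute("SET SESSION sql_log_bin = 1");
                }
            }
        }
    }
}
```

Because `sql_log_bin` is scoped to the session, the replicator's own connections are unaffected; on non-MySQL platforms the flag would simply be ignored, as suggested above.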

Original comment by jeff.m...@continuent.com on 5 Dec 2014 at 4:42

GoogleCodeExporter commented 9 years ago

This issue was updated by revision r2695.

CONT-34

Merged two SQL queries with identical content into a single one, to fix a bug 
that had already appeared and to avoid similar ones.
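A hypothetical sketch of keeping the position query in a single shared definition, using the column names visible in the `get` output earlier (including task_id and update_timestamp from t1). The class name, column order and schema-qualified table name are assumptions, not the actual SqlCommitSeqno code.

```java
// Hypothetical single source of truth for the position query, so that
// every code path that reads the position runs identical SQL.
final class CommitSeqnoQuery {
    private CommitSeqnoQuery() {}

    static final String[] COLUMNS = {
        "task_id", "seqno", "fragno", "last_frag", "source_id", "epoch_number",
        "eventid", "applied_latency", "update_timestamp", "shard_id", "extract_timestamp"
    };

    // Builds the SELECT for the tracking table in the given schema.
    static String selectAll(String schema) {
        return "SELECT " + String.join(", ", COLUMNS)
            + " FROM " + schema + ".trep_commit_seqno ORDER BY task_id";
    }
}
```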

Original comment by linas.vi...@continuent.com on 5 Dec 2014 at 6:31

GoogleCodeExporter commented 9 years ago

This issue was updated by revision r2696.

CONT-34
Status: QA

`dsctl set` implementation for SQL and file datasources.

* QA Testing Requirements *

t4.) Special care for Hadoop cases of get/reset/set combinations.

t5.) Check that `trepctl status` works after `dsctl set` under file datasources.

t6.) Check that parallel replication works, i.e. that running `reset`, then 
`set`, and then launching Replicator re-creates all the channels for both SQL 
and file datasources.

Original comment by linas.vi...@continuent.com on 9 Dec 2014 at 4:19

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 9 Dec 2014 at 4:25

Changed state: QA

GoogleCodeExporter commented 9 years ago

With a MySQL datasource the reset function happily removes all the tables from 
the tungsten schema while the replicator is online.

Original comment by eric.har...@continuent.com on 14 Jan 2015 at 3:24

GoogleCodeExporter commented 9 years ago

Set also works when the replicator is online.

The error message when a 'set' is attempted while a position exists could be 
clearer. The message is 'Cannot set position, because tasks already exist - 
clear position first'

Perhaps it should mention the 'reset' command, and 'tasks already exist' could 
say something like 'a replication position already exists'?

Original comment by eric.har...@continuent.com on 14 Jan 2015 at 3:37

GoogleCodeExporter commented 9 years ago

Hi Eric,

dsctl has no way of knowing the true state of Replicator, because it doesn't 
connect to Replicator via JMX. I want it to work without requiring a running 
Replicator. I suggest we leave it like this for now.

The reasoning behind "tasks" is that with parallel replication there might be 
more than one "position" entry, each identified by a task_id in the 
trep_commit_seqno table. But I agree, we could be less geeky here. Please 
provide a specific suggestion of the message you would like to see.

Original comment by linas.vi...@continuent.com on 14 Jan 2015 at 3:48

GoogleCodeExporter commented 9 years ago

OK, thanks Linas.
How about:
Cannot set position unless existing position data is removed - use reset first
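A hypothetical sketch of the check behind this message: `set` refuses to run while any position row (for any task_id) exists. The wording below is the suggestion above, which may differ from the text finally committed; `PositionGuard` and `checkCanSet` are illustrative names.

```java
import java.util.List;

// Hypothetical guard run before `set` writes a new position.
final class PositionGuard {
    private PositionGuard() {}

    static final String MESSAGE =
        "Cannot set position unless existing position data is removed - use reset first";

    // existingTaskIds: task_id values currently present in trep_commit_seqno.
    static void checkCanSet(List<Integer> existingTaskIds) {
        if (!existingTaskIds.isEmpty()) {
            throw new IllegalStateException(
                MESSAGE + " (found " + existingTaskIds.size() + " task(s))");
        }
    }
}
```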

Original comment by eric.har...@continuent.com on 14 Jan 2015 at 4:02

GoogleCodeExporter commented 9 years ago

This issue was updated by revision r2757.

Updated the error message shown when someone tries to set the position without 
removing the existing one first. I couldn't reference `dsctl` explicitly in 
this context, because the message originates in datasource classes, which do 
not (and shouldn't) know about dsctl. This message is a compromise that should 
make the dsctl error more helpful while still making sense in the datasource 
classes alone.
We do not have error code handling functionality inside Replicator that would 
help "translate" messages like these; that is something to overhaul in the 
future.

Original comment by linas.vi...@continuent.com on 14 Jan 2015 at 5:41

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 15 Jan 2015 at 5:16

Now blocking: #1088

GoogleCodeExporter commented 9 years ago

This issue was updated by revision r2761.

CONT-194

ATTENTION: Behavioral change!

Setting last_frag to 1 (true) in `dsctl set`. Previously it was left at the 
default (false), which meant that after `dsctl set` Replicator started from the 
specified seqno instead of from the next one.
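The behavioral change can be summarized with a small, simplified model: the seqno at which processing resumes depends on whether the stored row is marked as the last fragment. This is a hypothetical illustration of the rule described above, not Replicator's actual restart code.

```java
// Simplified, hypothetical model of where Replicator resumes after `dsctl set`.
final class RestartPoint {
    private RestartPoint() {}

    static long firstSeqnoToProcess(long storedSeqno, boolean lastFrag) {
        // last_frag = 1: the transaction at storedSeqno is complete, so resume with the next seqno.
        // last_frag = 0: the transaction is treated as unfinished, so resume at storedSeqno itself.
        return lastFrag ? storedSeqno + 1 : storedSeqno;
    }
}
```

For example, after `dsctl set -seqno 28`, the old default (last_frag = 0) resumed at 28, while the new behavior (last_frag = 1) resumes at 29.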

Original comment by linas.vi...@continuent.com on 15 Jan 2015 at 7:58

GoogleCodeExporter commented 9 years ago

Receiving NPE when attempting to run a 'set' on a file applier

/opt/replicator/tungsten/tungsten-replicator/bin/dsctl set -seqno 28 -epoch 1 
-event-id 'mysql-bin.000045:0000000000005051;-1' -source-id db1
null
java.lang.NullPointerException
    at com.continuent.tungsten.replicator.datasource.FileCommitSeqno.store(FileCommitSeqno.java:462)
    at com.continuent.tungsten.replicator.datasource.FileCommitSeqno.initPosition(FileCommitSeqno.java:211)
    at com.continuent.tungsten.replicator.datasource.DataSourceAdministrator.set(DataSourceAdministrator.java:159)
    at com.continuent.tungsten.replicator.datasource.DsctlCtrl.doSet(DsctlCtrl.java:285)
    at com.continuent.tungsten.replicator.datasource.DsctlCtrl.main(DsctlCtrl.java:171)

Original comment by eric.har...@continuent.com on 15 Jan 2015 at 4:49

GoogleCodeExporter commented 9 years ago

I should add that 'get' and 'reset' worked correctly

Original comment by eric.har...@continuent.com on 15 Jan 2015 at 4:52

Changed state: Started

GoogleCodeExporter commented 9 years ago

This issue was updated by revision r2766.

When initializing the position, extract_timestamp is now set to the current 
time, because code further down the path expects it to be non-null.

Original comment by linas.vi...@continuent.com on 16 Jan 2015 at 7:14

Changed state: QA

GoogleCodeExporter commented 9 years ago

This issue was updated by revision r2768.

Resolved an NPE that came up while testing `set` on parallel replication: 
extract_timestamp had to be set. update_timestamp is now set as well, to avoid 
other possible NPEs.
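Taken together, r2761, r2766 and r2768 mean the row written by `set` has no null fields that later code might dereference. A hypothetical value-object sketch of that initial position; the class and factory names are illustrative, and fields other than those named in this thread are omitted.

```java
import java.time.Instant;
import java.util.Objects;

// Hypothetical value object for the row `dsctl set` writes; names follow trep_commit_seqno columns.
final class InitialPosition {
    final long seqno;
    final long epochNumber;
    final String eventId;
    final String sourceId;
    final int taskId;
    final int fragno = 0;
    final boolean lastFrag = true;        // r2761: resume after the given seqno
    final Instant extractTimestamp;       // r2766: must not be null
    final Instant updateTimestamp;        // r2768: set as well to avoid NPEs

    private InitialPosition(long seqno, long epochNumber, String eventId, String sourceId,
                            int taskId, Instant now) {
        this.seqno = seqno;
        this.epochNumber = epochNumber;
        this.eventId = Objects.requireNonNull(eventId, "eventId");
        this.sourceId = Objects.requireNonNull(sourceId, "sourceId");
        this.taskId = taskId;
        this.extractTimestamp = now;
        this.updateTimestamp = now;
    }

    // Both timestamps are taken from the same clock reading.
    static InitialPosition of(long seqno, long epochNumber, String eventId,
                              String sourceId, int taskId) {
        return new InitialPosition(seqno, epochNumber, eventId, sourceId, taskId, Instant.now());
    }
}
```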

Original comment by linas.vi...@continuent.com on 16 Jan 2015 at 3:16

GoogleCodeExporter commented 9 years ago

According to the design document there should be a -log option, but it is not 
recognised. Has this been dropped?

Original comment by eric.har...@continuent.com on 16 Jan 2015 at 5:09

GoogleCodeExporter commented 9 years ago

Apart from the -log option this works as described now.

Original comment by eric.har...@continuent.com on 16 Jan 2015 at 5:48

Changed state: Documenting

GoogleCodeExporter commented 9 years ago

Yes, -log has been postponed. However, though totally unrelated, there are 
dsctl user logs in the logs/ folder :)

Original comment by 777...@gmail.com on 16 Jan 2015 at 6:08

GoogleCodeExporter commented 9 years ago

There won't be a 3.1.0 version number.

Original comment by linas.vi...@continuent.com on 19 Jan 2015 at 2:17

Added labels: FixedIn-4.0.0
Removed labels: FixedIn-3.1.0

epermana / tungsten-replicator

Create tools around the new datasource classes to support common schema operations while the replicator is not running #992