ghost commented 9 years ago

This issue is here to throw the idea in the air.

This could possibly include other RDBMS systems (generic database output plugin), but being selfish, I'm focusing on postgres first.

purbon commented 9 years ago

This could be named an JDBC/ODBC output so you don't have to focus on a given database :-)

bgiromini commented 9 years ago

Yes. Since rivers are being deprecated we need to see about using Logstash to perform the same function. I attempted a few weeks ago to start a Sql Server version, but gave up. Basically I want the functionality of the JDBC river plugin which allows for complex nested queries. I guess since it is Logstash is written in JRuby you could integrate both. I want to provide one sql statement and then have it run on some sort of schedule.

I few months ago Elasticsearch came to my area and I spoke with them about the complete lack of a easy ETL tool for moving data from a relational db to ES. It didn't seems like anything was in the works but they did say watch for the release of 1.5.

purbon commented 9 years ago

This issue relates to:

https://logstash.jira.com/browse/LOGSTASH-2138 (MySQL query input)
https://logstash.jira.com/browse/LOGSTASH-1764 (Generic JDBC input, that can subscribe to changes).

all about different ways to have a generic input from a relational database.

talevy commented 9 years ago

Hey All,

I am starting work on the JDBC input and thinking of modeling it after the existing JDBC Elasticsearch River.

here is the prototype: https://github.com/talevy/logstash-input-jdbc-prototype

Please let me know which features you would like to see. For example, would you want to supply a SQL query to fetch from the database, or do you want to be able to migrate a whole database out of the box via dumping all tables into Elasticsearch.

dougmcclure commented 9 years ago

We've just built one as well we plan to contribute once our "paperwork" clears. I'll let my peers know about this and they might have some things to share feature wise.

On Tue, Feb 10, 2015 at 7:04 PM, Tal Levy notifications@github.com wrote:

Hey All,

I am starting work on the JDBC input and thinking of modeling it after the existing JDBC Elasticsearch River https://github.com/jprante/elasticsearch-river-jdbc.

here is the prototype: https://github.com/talevy/logstash-input-jdbc-prototype

Please let me know which features you would like to see. For example, would you want to supply a SQL query to fetch from the database, or do you want to be able to migrate a whole database out of the box via dumping all tables into Elasticsearch.

— Reply to this email directly or view it on GitHub https://github.com/elasticsearch/logstash/issues/2476#issuecomment-73811278 .

talevy commented 9 years ago

@dougmcclure awesome! I am just starting work on this today, so If you have any use-cases that you can share, that would be great!

Also, when do you think the "paperwork" will clear? I figure we can consolidate once you are ready

thanks!

wiibaa commented 9 years ago

@talevy I'm also interested in following up the discussion / development. I'm convinced that we should be able to define a jdbc-mixin to work on general capabilities (connection pool, transaction...) I already tried in my work-in-progress filter (see below). Ping me if I can help

Additional sources for jdbc related plugins (just a github search on "logstash jdbc")

outputs:

https://github.com/theangryangel/logstash-output-jdbc

inputs:

filter

https://github.com/wiibaa/logstash-filter-jdbc (that's me ;) )

purbon commented 9 years ago

Nice, very happy to see lots of people working on the same direction.

@dougmcclure Looking forward to see what you have build, as @talevy said It would be awesome to have as many options as possible or might be to collaborate in one very powerful one ... looking forward to see your use cases.

@wiibaa not sure if we want to go all the path to create our own mixin, we've things like Sequel (https://github.com/jeremyevans/sequel) that came out like a perfect, and for now wild use, Ruby database access layer ... I specially like it because it makes this input "driver agnostic", and in on the long run we can also make it "jruby agnostic" ...

In my case I started yesterday morning (https://github.com/purbon/logstash-input-jdbc) as an exercise, and experiment on how to build something alike to a nice JDBC generic driver. I would for example like to add a jdbc driver to Sequel to enable databases like MonetDB (https://github.com/MonetDB) or things like Exasol (http://www.exasol.com/en/) with a lot of usage in Germany.

@talevy Feature wise, I understand we should at less provide something like:

Snapshot and Delta alike imports from a given table. (as now the sqlite input does)
To be able to provide a more or less complex SQL query. This might let the user to wrap up some joins in order to clean up, or pre process some event data internally. (I guess this is a low requirement as it might be very tricky).
Would be nice to provide the user with an option to select what he understand by PK, that have to have some kind of order. Like this we don't force the user to have an autoincrement ID, but we can have timestamps for example.

here there are some of my ideas :smile:, happy to discuss and follow that discussion.

Looking forward to it.

dadoonet commented 9 years ago

Let's thrown some ideas I really in JDBC river:

You can inject in your SQL statement some special values. This is really handy for incremental updates:
- $now - the current timestamp
- $last.sql.start - a timestamp value for the time when the last SQL statement started
- $last.sql.end - a timestamp value for the time when the last SQL statement ended
max_rows: limit the number of rows fetches by a statement, the rest of the rows is ignored
Structured Objects
Column naming

Having a JDBC LS input is really straight forward I think for flat data, such as logs. It could become tricky to use such an input for complex objects. I mean that if in your application you need to run more than one single select operation to create your object, you will need more than one single SQL call to create this object again in logstash. The Structured Objects approach starts to help with that but might not be enough.

I wonder if at some point we could have also a JDBC filter to enhance our data with the result of another SQL request. Could be cacheable BTW if you imagine that you could load at LS startup a reference table from the database and then only use the cache.

bgiromini commented 9 years ago

I really like the structured objects approach. What I do currently is provide a complex nested query with all the objects I need. Using a ORM tool (Dapper.net) I transform the query into code and then push the data to ES using Elasticsearch.Nest. Obviously this requires writing code every time I want to move data or add new indexes.

wiibaa commented 9 years ago

@purbon no strong feeling about a mixin, I tried to separate the code just in case I would experiment different approaches (Sequel, ActiveRecords, commons-dbcp.jar...) but Sequel seems really nice indeed...I'm still a newbie to ruby jruby ecosystem and libraries

About MonetDB, did you already identify a need to write a sequel adapter for it ?? Because it seems to work out of the box for simple cases as JDBC is already a big abstraction layer. The only point is that you need to load the driver class yourself as mentionned in Sequel JDBC documentation

For all other databases, the Java class should be loaded manually before calling Sequel.connect.

So in my case it would look like

begin 
      Sequel::JDBC.load_driver(@jdbc_driver_class)
    rescue Sequel::AdapterNotFound => e
      #Maybe exotic driver with no specific sequel::adapter (MonetDB)
      Java::JavaClass.for_name(@jdbc_driver_class)
    end
    @database = Sequel.connect(@jdbc_connection_string, :user=> @jdbc_user, :password=>  @jdbc_password.nil? ? nil : @jdbc_password.value)

Here is a sample filter config I quickly tested (after creating a dummy table in the demo database)

filter {
  jdbc {
    jdbc_driver_library => "D:/dev/tools/monetdb-jdbc-2.14.jar"
    jdbc_driver_class => "nl.cwi.monetdb.jdbc.MonetDriver"
    jdbc_connection_string => "jdbc:monetdb://localhost/demo"
    jdbc_user => "monetdb"
    jdbc_password => "monetdb"
    statement => "SELECT * from my_table"
    target => "new_field"
  }
}

dadoonet commented 9 years ago

@wiibaa I totally missed your jdbc filter. That's awesome!

talevy commented 9 years ago

@wiibaa got a chance to work with the mixin today, works great! I will work with it a bit more and get back to you with possible suggestions or questions. We should definitely consider splitting it out into its own gem so it can be used by multiple plugins (input, filter, output). What are your thoughts on making specific plugins for each adapter so that people can get up and running with popular databases without managing jar files.

@dadoonet the Structured Object is totally doable. Although, I have some questions about it.

It seems to take a result set and group rows together that match some specific identifiers. This forms an array of items in the object that will then be indexed. If rows are being retrieved iteratively, and there is data that share the same _id in future runs, aren't we indexing incomplete objects?

Just trying to see a real-world use-case, we may be able to have a general logstash filter that accomplishes this Structured Object behavior.

wiibaa commented 9 years ago

@talevy I have an initial idea about specific plugin per database, need to think out some details first, but this would only works for jdbc drivers available either as gems or in maven central, so it excludes for sure oracle due to licensing restriction about redistribution (and surely others), maybe the jdbc-mixin or logstash should provide an additonal classpath folder to load any jar present in a path, old discussion in https://github.com/elasticsearch/logstash/pull/1301

EDIT: just linking to https://github.com/elasticsearch/logstash/pull/2445 to remind me that using custom maven repo would be feasible for internal-use gems. To be investigated...

wiibaa commented 9 years ago

@talevy Here is an example of a jdbc-mysql-filter wrapping the "generic" jdbc filter https://github.com/wiibaa/logstash-filter-jdbc_mysql/blob/master/lib/logstash/filters/jdbc_mysql.rb Do you think it is worth to be done ?

talevy commented 9 years ago

@wiibaa That was what I had in mind, yeah. nice! I think it is pretty good

robmckeown commented 9 years ago

We've created a working JDBC input plugin - currently very much focused upon our immediate needs of simple flat table extraction ( code here https://github.com/IBM-ITOAdev/LogstashPlugins/blob/master/logstash/inputs/genjdbc.rb ). I literally just gave it the first good workout this afternoon, pulling a few hundred MB from one of our DBs, so it's hot off the press! Once we've shaken it out a bit, I'm happy to circle back and add some of the capabilities mentioned in the thread above, or otherwise morph this into a community artifact

Typical config :

genjdbc { jdbcHost => 'x.x.x.x' # Host Name of Database Server jdbcPort => 'xxxx' # Port Name of Database Server jdbcTargetDB => 'mssql' # Target DB Type (Required) jdbcDBName => 'xxxx' # Database Name jdbcUser => 'xxxx' # Database User Name jdbcPassword => 'xxxx' # Database User Password jdbcDriverPath => '/path/to/my/driver/sqljdbc4.jar'

jdbcSQLQuery => "select * from AlarmEvent_View where TimeStamp > '%{CURRENTTIME}' order by Timestamp ASC"

jdbcSQLQuery => 'file:./myQuery.sql' # Can put a query in a file instead

jdbcTimeField => 'TimeStamp' # The column in your result table that contains a timestamp (Important) \ jdbcPollInterval => '10' # Retry Query Time in Seconds. jdbcPStoreFile => '/path/to/my/store/file/my.pstore' # to remember where we left off over Logstash restarts

If not set it will start collecting from the localhost current time. \

jdbcCollectionStartTime => '2015-01-01 00:00:00.00' }

nullbuddy1243 commented 9 years ago

I would love to see this as a feature. Currently, in order to get SQL JDBC query output into Logstash, I have to run an Elasticsearch JDBC river, index the query output in ES, then have a logstash elasticsearch input monitor and scoop-up updates of that index, filter it to add valid timestamps, and store in a different index.
Very messy, but it works (sorta).
Consider me a mighty +1 on this feature.

jumpinjan commented 9 years ago

I have a definite need for a jdbc input plugin... if someone would like me to help with testing their plugin - please let me know.... I would be using to collect data from an Oracle database as part work I do in a Performance/Load testing lab

jumpinjan commented 9 years ago

Oh forgot to mention - my ugly workaround for this is to merge the oracle jdbc jar with the jruby complete jar and then write a jruby plugin to handle the sql queries

dougmcclure commented 9 years ago

Give this a go. I know we've used it with Oracle data.

https://github.com/IBM-ITOAdev/logstash-input-genjdbc

On Tue, Apr 14, 2015 at 2:48 PM, Janice Gluck notifications@github.com wrote:

I have a definite need for a jdbc input plugin... if someone would like me to help with testing their plugin - please let me know.... I would be using to collect data from an Oracle database as part work I do in a Performance/Load testing lab

— Reply to this email directly or view it on GitHub https://github.com/elastic/logstash/issues/2476#issuecomment-93016842.

jumpinjan commented 9 years ago

Thanks! I will try it out tomorrow and post feedback.

Janice

Janice Gluck

[cid:image001.png@01CF858C.32055680]http://www1.good.com/[http://media.www1.good.com/images/icon-em-sig-facebook.png]https://www.facebook.com/goodformobile[http://media.www1.good.com/images/icon-em-sig-twitter.png]https://twitter.com/GOOD_technology[http://media.www1.good.com/images/icon-em-sig-linkedin.png]https://www.linkedin.com/company/good-technology[http://media.www1.good.com/images/icon-em-sig-youtube.png]https://www.youtube.com/user/GoodForMobile

From: Doug McClure notifications@github.com<mailto:notifications@github.com> Reply-To: elastic/logstash reply@reply.github.com<mailto:reply@reply.github.com> Date: Tuesday, April 14, 2015 at 3:20 PM To: elastic/logstash logstash@noreply.github.com<mailto:logstash@noreply.github.com> Cc: Janice Gluck jgluck@good.com<mailto:jgluck@good.com> Subject: Re: [logstash] Feature request: JDBC output/input plugins (#2476)

Give this a go. I know we've used it with Oracle data.

https://github.com/IBM-ITOAdev/logstash-input-genjdbc

On Tue, Apr 14, 2015 at 2:48 PM, Janice Gluck notifications@github.com<mailto:notifications@github.com> wrote:

I have a definite need for a jdbc input plugin... if someone would like me to help with testing their plugin - please let me know.... I would be using to collect data from an Oracle database as part work I do in a Performance/Load testing lab

— Reply to this email directly or view it on GitHub https://github.com/elastic/logstash/issues/2476#issuecomment-93016842.

— Reply to this email directly or view it on GitHubhttps://github.com/elastic/logstash/issues/2476#issuecomment-93026015.

jumpinjan commented 9 years ago

I tried the genjdbc plugin with the oracle ojdbd6.jar driver and it worked very nicely for me. I only had to make a slight modification because I'm not using the jdbcTimeField.

Thank you very much Doug!

dougmcclure commented 9 years ago

Great to hear! If you think of any improvements let us know. We hope to get this into an official logstash plugin soon.

On Wed, Apr 15, 2015 at 10:28 AM, Janice Gluck notifications@github.com wrote:

I tried the genjdbc plugin with the oracle ojdbd6.jar driver and it worked very nicely for me. I only had to make a slight modification because I'm not using the jdbcTimeField.

Thank you very much Doug!

— Reply to this email directly or view it on GitHub https://github.com/elastic/logstash/issues/2476#issuecomment-93422597.

jumpinjan commented 9 years ago

I would modify it so that the plugin does not assume that the jdbcTimeField is being used. For my purposes I don't need that feature so I commented out lines 155 thru 161 and just made escapedQuery = originalQuery

jumpinjan commented 9 years ago

My use case is that I'm getting a count of all the rows in a table so no need for a timestamp

jumpinjan commented 9 years ago

Ok... I just took my next step and tried to put the event created from the plugin into GELF. I had to tweak the plugin one more time because the event message was empty. Could you guys tweak the plugin to put out the data to the message field as well - perhaps as a comma delimited list or key value pairs? It was an easy change for me since I'm only getting back one row with one column in it

dougmcclure commented 9 years ago

Hi Janice - thanks for the feedback. Can you open these as requests on the git hub page for this plugin? https://github.com/IBM-ITOAdev/logstash-input-genjdbc/issues That will help us keep this thread alive.

Doug

On Wed, Apr 15, 2015 at 11:16 AM, Janice Gluck notifications@github.com wrote:

Ok... I just took my next step and tried to put the event created from the plugin into GELF. I had to tweak the plugin one more time because the event message was empty. Could you guys tweak the plugin to put out the data to the message field as well - perhaps as a comma delimited list or key value pairs? It was an easy change for me since I'm only getting back one row with one column in it

— Reply to this email directly or view it on GitHub https://github.com/elastic/logstash/issues/2476#issuecomment-93443450.

jumpinjan commented 9 years ago

Will do..

Also forgot to mention…

Thanks for the great documentation made it very easy to get up and running…

Janice

Janice Gluck

[cid:image001.png@01CF858C.32055680]http://www1.good.com/[http://media.www1.good.com/images/icon-em-sig-facebook.png]https://www.facebook.com/goodformobile[http://media.www1.good.com/images/icon-em-sig-twitter.png]https://twitter.com/GOOD_technology[http://media.www1.good.com/images/icon-em-sig-linkedin.png]https://www.linkedin.com/company/good-technology[http://media.www1.good.com/images/icon-em-sig-youtube.png]https://www.youtube.com/user/GoodForMobile

From: Doug McClure notifications@github.com<mailto:notifications@github.com> Reply-To: elastic/logstash reply@reply.github.com<mailto:reply@reply.github.com> Date: Wednesday, April 15, 2015 at 3:57 PM To: elastic/logstash logstash@noreply.github.com<mailto:logstash@noreply.github.com> Cc: Janice Gluck jgluck@good.com<mailto:jgluck@good.com> Subject: Re: [logstash] Feature request: JDBC output/input plugins (#2476)

Hi Janice - thanks for the feedback. Can you open these as requests on the git hub page for this plugin? https://github.com/IBM-ITOAdev/logstash-input-genjdbc/issues That will help us keep this thread alive.

Doug

On Wed, Apr 15, 2015 at 11:16 AM, Janice Gluck notifications@github.com<mailto:notifications@github.com> wrote:

Ok... I just took my next step and tried to put the event created from the plugin into GELF. I had to tweak the plugin one more time because the event message was empty. Could you guys tweak the plugin to put out the data to the message field as well - perhaps as a comma delimited list or key value pairs? It was an easy change for me since I'm only getting back one row with one column in it

— Reply to this email directly or view it on GitHub https://github.com/elastic/logstash/issues/2476#issuecomment-93443450.

— Reply to this email directly or view it on GitHubhttps://github.com/elastic/logstash/issues/2476#issuecomment-93549106.

biswajit86 commented 9 years ago

Hi,

1 .Is this plugin going to be released soon as an official logstash input plugin or is it still a WIP.

Also, is there a way to define your own stream size or batch size to support result sets from really huge tables

talevy commented 9 years ago

Released officially supported release of jdbc input plugin!

bin/plugin install logstash-input-jdbc

usage example:

input {
   jdbc {
     jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
     jdbc_driver_class => "com.mysql.jdbc.Driver"
     jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
     jdbc_user => "mysql"
     parameters => { "favorite_artist" => "Beethoven" }
     schedule => "* * * * *"
     statement => "SELECT * from songs where artist = :favorite_artist"
   }
 }

talevy commented 9 years ago

reopened for JDBC output plugin request

nullbuddy1243 commented 9 years ago

That is such cool news. Looking forward to testing this.

suyograo commented 9 years ago

We have no plans for JDBC output for now. Closing this

elastic / logstash

Feature request: JDBC output/input plugins #2476

jdbcSQLQuery => 'file:./myQuery.sql' # Can put a query in a file instead

If not set it will start collecting from the localhost current time. \