InfluxCommunity / influxdb-ruby

Ruby client for InfluxDB
MIT License

Slow client #167

Closed ahmgeek closed 8 years ago

ahmgeek commented 8 years ago

Hello, why does this script insert only a handful of the 100 values into InfluxDB? Where are the remaining metrics that should have been logged? Also, about real-time insertion: I use this from Rails, so with some actions I try to log some data, and I assumed it would be logged in real time without any loss.

require 'influxdb'

connection = {
  host: 'localhost',
  username: 'test',
  password: 'test123',
  database: 'user_statistics'
}
influxdb = InfluxDB::Client.new(connection)
influxdb.create_database('user_statistics')

# Enumerator that emits a sine wave
Value = (0..360).to_a.map {|i| Math.send(:sin, i / 10.0) * 10 }.each

100.times do |i|
  puts i
  data = {
    values: { value: Value.next, hits: 1 },
    tags:   { wave: 'sine' } # tags are optional
  }

  influxdb.write_point('impressions', data)
end
dmke commented 8 years ago

There are two mechanics at work here:

  1. By default, the Ruby client sends data with "second precision" (as that's what Ruby's Time class would give you).
  2. The InfluxDB server identifies records by the unique tuple of time and tags (in this case, time and "wave=sine").

Let's assume you've got no network delay and calls to InfluxDB::Client#write_point are instantaneous: your 100.times { ... } loop now creates the following data 100 times (this is not the actual data sent over the wire, but a close approximation):
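Roughly something like this (the timestamp is illustrative, reusing the one from the example below; values elided):

time=1475438003,wave=sine hits=1,value=...
time=1475438003,wave=sine hits=1,value=...
time=1475438003,wave=sine hits=1,value=...
...
time=1475438003,wave=sine hits=1,value=...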

Notice how time + tags are identical in each row. In effect, only the last value is stored, as each row would overwrite the previous one.

In a more realistic scenario, where you have to deal with latency, the 100 writes would straddle two seconds, so you'd end up with just two different points:

  1. time=1475438003,wave=sine hits=1,value=... around 50 times
  2. time=1475438004,wave=sine hits=1,value=... around 50 times as well

To get around this, you need to define the time_precision in your client config:

connection = {
  # same as above, plus:
  time_precision: "ns"
}
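For completeness, here's a minimal sketch of the write loop with nanosecond precision and an explicit per-point timestamp (same connection settings and field/tag names as in the question; the timestamp key of write_point's data hash is the one discussed further down). With every point carrying its own nanosecond timestamp, the time+tags tuples no longer collide:

require 'influxdb'

influxdb = InfluxDB::Client.new(
  host:           'localhost',
  username:       'test',
  password:       'test123',
  database:       'user_statistics',
  time_precision: 'ns' # we now send nanosecond timestamps
)

wave = (0..360).map { |i| Math.sin(i / 10.0) * 10 }.each

100.times do
  influxdb.write_point('impressions',
    values:    { value: wave.next, hits: 1 },
    tags:      { wave: 'sine' },
    # explicit nanosecond timestamp: each point gets a unique time+tags tuple
    timestamp: (Time.now.to_r * 10**9).to_i
  )
end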
ahmgeek commented 8 years ago

Thanks a lot. The bug remained in my actual app until I removed the timestamp field (timestamp: timestamp || Time.current.to_i, so it was always Time.current.to_i from Ruby), which always dropped some series from being logged, so I stopped using it. Are there any hints about using this specific field? I know that if I don't provide it, it will be set automatically, but just in case. Thanks again for your hints :+1: :kissing_heart:

dmke commented 8 years ago

You could either connect with time_precision: "ns" and provide a nanosecond time value ((Time.now.to_r * 10**9).to_i), or... just omit it completely and rely on the server to fill it in automatically.

You shouldn't omit the timestamp, though, if you need high-precision time data (i.e. data skew introduced by network latency is a no-go), or if you cache the writes and send them later in bulk (like with the async writer).
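As a sketch of that last case (assuming the async: true option for the gem's buffered writer): capture the timestamp when the event happens, so each point keeps its event time even if the HTTP write is flushed much later.

influxdb = InfluxDB::Client.new(
  database:       'user_statistics',
  async:          true,  # writes are buffered and flushed in bulk later
  time_precision: 'ns'
)

# Record the event time now; the actual write may happen noticeably later.
influxdb.write_point('impressions',
  values:    { hits: 1 },
  tags:      { wave: 'sine' },
  timestamp: (Time.now.to_r * 10**9).to_i
)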