GoogleCloudPlatform / opentsdb-bigtable

Apache License 2.0

Problem creating metrics #4

Open chroth7 opened 5 years ago

chroth7 commented 5 years ago

Disclaimer: I was following the attached tutorial

Potential bug report

Summary

I think the hosted opentsdb-bigtable container has a bug that appears when creating more than 128 metrics, as described here: https://github.com/OpenTSDB/opentsdb/issues/1002

Details

I was using the existing container for opentsdb-write and opentsdb-read (hosted on gcr, linked in the deployments), as described in the tutorial. https://github.com/GoogleCloudPlatform/opentsdb-bigtable/blob/master/deployments/opentsdb-write.yaml#L27

From the start, I had issues when writing large amounts of metrics (high cardinality) and ran debugging sessions on the cluster, Bigtable, the pods, GCP Dataflow (where the data originates), etc. The errors I got did not make sense, and the official mailing list could not help me either.

Now, I think I was able to narrow it down:

Note: the bug occurred at exactly the time this repo was created, so it was probably present in the OpenTSDB source you pulled for the container (the container is built by pulling from GitHub; see also https://github.com/GoogleCloudPlatform/opentsdb-bigtable/issues/1).

Reproduction

  1. Follow the tutorial: create an empty Bigtable instance, then fire up the cluster and the OpenTSDB pods
  2. Log in to a pod
  3. Execute tsdb mkmetric metric{1,2,3} etc. for various metrics

Expected: no problems.
Result: the 129th metric fails.
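The loop in step 3 can be scripted so the first failing metric is found automatically. This is a minimal sketch, assuming it runs inside an OpenTSDB pod where the `tsdb` CLI is on the PATH and that a failed `mkmetric` call shows up as a non-zero exit status (if the CLI only prints an error, inspect its output instead). The helper names are hypothetical.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def metric_names(count: int, prefix: str = "metric") -> List[str]:
    """Return ["metric1", ..., "metric<count>"], like the shell brace expansion metric{1,2,3}."""
    return [f"{prefix}{i}" for i in range(1, count + 1)]


def run_tsdb(args: Sequence[str]) -> int:
    """Invoke the tsdb CLI (assumed to be on PATH) and return its exit code."""
    return subprocess.run(["tsdb", *args]).returncode


def first_failing_metric(
    count: int, runner: Callable[[Sequence[str]], int] = run_tsdb
) -> Optional[str]:
    """Create metrics one at a time; return the first name whose mkmetric call fails, else None."""
    for name in metric_names(count):
        # Assumption: a non-zero exit status means the UID assignment failed.
        if runner(["mkmetric", name]) != 0:
            return name
    return None
```

Calling `first_failing_metric(130)` inside the pod creates up to 130 metrics; based on the behaviour reported above, it would be expected to stop at `metric129`. The `runner` parameter exists only so the loop can be exercised without a live OpenTSDB instance.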

See also here: https://groups.google.com/forum/#!topic/opentsdb/aRbG4tmcwy8

Solution

A new Docker container should fix it (I built my own version of 2.4 in the process and have had no problems since). It is now very fast and performs as expected.

samizuh commented 5 years ago

Thanks for bringing this up! Will be updating the image per your recommendation.

ScottMcCormack commented 5 years ago

Hi @samizuh. Just wanted to know whether the opentsdb-bigtable container has been updated for this tutorial, or whether there is another reference container we should be using from gcr.io?

@chroth7 just wanted to confirm - were you able to resolve this by rebuilding the Dockerfile in https://github.com/GoogleCloudPlatform/opentsdb-bigtable/blob/master/build/Dockerfile ?

It looks as though the Dockerfile builds from the master branch of https://github.com/OpenTSDB/opentsdb, so it should be picking up the latest build of OpenTSDB.

chroth7 commented 5 years ago

Hi @ScottMcCormack

Sorry for the slow answer.

To be honest, I built my own container. But yes, I would probably not point at a moving target such as master.

Did you build with the Dockerfile in the repo?

ScottMcCormack commented 5 years ago

Hey @chroth7, no problem at all.

Was able to build the container but have had some problems with using it to initialize the Bigtable instance using the jobs/opentsdb-init.yaml file. In particular, it creates all the tables with the exception of the tsdb table.

I haven't figured out the reason for this, so have been using the community opentsdb-bigtable container for the time being.

mowczare commented 5 years ago

@ScottMcCormack The reason for this was the backwards-incompatible TTL => 'FOREVER' flag in the create_table.sh file of the latest OpenTSDB version. I spent some time debugging all of this, so I'll be happy if it saves someone time in the future.

Here is the version with OpenTSDB 2.4.0, HBase 2.2.12 and aforementioned quickfix: https://github.com/GoogleCloudPlatform/opentsdb-bigtable/pull/6
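For anyone patching create_table.sh by hand, here is a minimal Python sketch of one way to apply a similar workaround: drop the TTL argument from the HBase shell column-family spec so the table falls back to its default TTL (unlimited in HBase). It assumes the argument is written as `, TTL => '<value>'` following another entry, with single quotes. `strip_ttl` is a hypothetical helper, and this is not necessarily what the linked pull request does.

```python
import re
from typing import Iterable

# Matches ", TTL => '<value>'" inside an HBase shell column-family spec; captures the value.
_TTL_ARG = re.compile(r",\s*TTL\s*=>\s*'([^']*)'")


def strip_ttl(script: str, values: Iterable[str] = ("FOREVER",)) -> str:
    """Remove TTL arguments whose value is in `values`; leave every other TTL untouched."""
    allowed = set(values)
    return _TTL_ARG.sub(
        lambda m: "" if m.group(1) in allowed else m.group(0), script
    )
```

For example, `strip_ttl("create 'tsdb', {NAME => 't', VERSIONS => 1, TTL => 'FOREVER'}")` returns `create 'tsdb', {NAME => 't', VERSIONS => 1}`. If the script passes the TTL through a shell variable rather than the literal `FOREVER`, that placeholder can be listed in `values` instead.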

ScottMcCormack commented 5 years ago

Thanks @mowczare! I actually worked through debugging this myself a couple of days ago and found that the TTL => 'FOREVER' argument in create_table.sh was causing my problem as well.

Any idea why the TTL => 'FOREVER' flag causes problems on Bigtable? Is it an incompatible argument to creating tables on Bigtable?