balena-labs-projects / connector

Auto-configured data connector block based on Telegraf

Race condition on first startup #2

Closed chrisys closed 4 years ago

chrisys commented 4 years ago

If the transport primitive starts before InfluxDB, the database creation times out and is not retried.

[Screenshot 2020-07-24 at 16.56.32]

phil-d-wilson commented 4 years ago

I wonder if there is much we can do here, @chrisys, other than using "depends_on" in your docker compose for the transport service?

phil-d-wilson commented 4 years ago

Tested "depends_on" in docker compose, but it only ensures that the InfluxDB service has been started, not that the InfluxDB server inside it is actually up and accepting connections, so transport doesn't wait for it. We might need to look into other options, such as polling the relevant outputs to find out when they are genuinely up before even starting telegraf.
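For illustration only, here is a rough sketch of the polling idea mentioned above. It is not what the connector block does today. It assumes InfluxDB 1.x, whose HTTP API exposes a `/ping` endpoint, and it assumes the server is reachable at `http://influxdb:8086` (the hostname is a guess based on a typical compose service name). The function names `influxdb_is_up` and `wait_until_ready` are made up for this example.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def influxdb_is_up(base_url: str = "http://influxdb:8086") -> bool:
    """Return True if the InfluxDB /ping endpoint answers with a 2xx status.

    The base URL is an assumption; adjust it to the actual service address.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=2) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status:
        # treat all of these as "not ready yet".
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() repeatedly until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

An entrypoint script could call `wait_until_ready(influxdb_is_up)` and only launch telegraf once it returns `True`. The probe is passed in as a parameter so the same loop could be reused for other outputs.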

phil-d-wilson commented 4 years ago

I can't recreate this so far. My test steps:

  1. Stop InfluxDB
  2. Change the name of the target DB in transport with the INFLUXDB_DB variable
  3. Watch transport failing to write to new DB
  4. Wait <=5 minutes
  5. Start InfluxDB

This resulted in telegraf continuing to try to write to the DB (as sensor data kept coming in) until it eventually succeeded.

My hunch is that @chrisys did not have a valid data source (issue #3 solves this) and therefore telegraf wasn't continuously prompted to try and create the DB. With a valid source, I think this will work as expected.

phil-d-wilson commented 4 years ago

Closing, as I can't recreate this as long as there is a valid data source.