Instrumental / instrumentald

Instrumental System and Service Daemon
MIT License

Allow MongoDB Atlas URIs, including ssl options #69

Closed jason-o-matic closed 7 years ago

jason-o-matic commented 7 years ago

Previously Atlas URIs didn't work because telegraf couldn't parse the ssl options. Here we add parsing that transforms Atlas URIs into configuration blocks that telegraf understands.

This uses the Mongo ruby driver to parse URIs, but unfortunately requires some shimming which you'll see in the code change.

This also updates the tests to include MongoDB configuration so we can verify we process these into the correct telegraf config. To test the telegraf config, we had to stop storing it in a tempfile: the tempfile was being deleted whenever a temporary forked process exited, because the exit ran the tempfile's finalizer.

The general approach here is to break apart a MongoDB URI into its multiple server components and specify a telegraf config block for each, along with the appropriate ssl config for each.
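
As a rough illustration of that approach (not the code in this PR), here is a minimal sketch. It assumes plain string handling in place of the Mongo ruby driver, so it needs no gems and no shimming. The method names `split_mongodb_uri` and `telegraf_mongodb_blocks` are hypothetical. The `servers` and `tagexclude` lines mirror the expected config quoted in the test output later in this thread; the `[inputs.mongodb.ssl]` sub-table and its `enabled` key are an assumption about what telegraf accepts, so check them against your telegraf version.

```ruby
# Sketch only: plain-string parsing stands in for the Mongo ruby driver.
# Splits a multi-host MongoDB URI into one entry per server.
def split_mongodb_uri(uri)
  scheme, rest = uri.split("://", 2)
  unless scheme == "mongodb" && rest && !rest.empty?
    raise ArgumentError, "not a mongodb URI: #{uri}"
  end

  authority, tail = rest.split("/", 2)
  query = tail.to_s.split("?", 2)[1].to_s
  options = query.split("&").map do |pair|
    key, value = pair.split("=", 2)
    [key, value.to_s]
  end.to_h

  # Credentials (if any) are repeated on every per-server URI.
  credentials, _, hosts = authority.rpartition("@")
  prefix = credentials.empty? ? "" : "#{credentials}@"

  hosts.split(",").map do |host|
    { server: "mongodb://#{prefix}#{host}", ssl: options["ssl"] == "true" }
  end
end

# Renders one [[inputs.mongodb]] block per server.
def telegraf_mongodb_blocks(uri)
  split_mongodb_uri(uri).map do |entry|
    lines = [
      "[[inputs.mongodb]]",
      "  servers = [\"#{entry[:server]}\"]",
      "  tagexclude = [\"state\", \"host\"]"
    ]
    # ASSUMPTION: the ssl sub-table and key name are illustrative placeholders.
    lines += ["  [inputs.mongodb.ssl]", "    enabled = true"] if entry[:ssl]
    lines.join("\n") + "\n"
  end.join("\n")
end

# Example with made-up hosts and credentials:
puts telegraf_mongodb_blocks(
  "mongodb://user:pw@a.example.com:27017,b.example.com:27017/admin?ssl=true&replicaSet=rs0"
)
```

Emitting one block per server keeps the ssl settings attached to the server they apply to, at the cost of repeating the shared options in each block.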

This also adds another test to ensure instrumentald is running, because there are cases where should be_running passes incorrectly (notably whenever the instrumentald init.d script is used, because it always exits with a success status whether or not the process is running).
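
For context, one way to check that a daemon is alive without trusting an init script's exit status is to probe the process directly. The sketch below is not the test added in this PR. It assumes the daemon writes its pid to a pidfile, which this thread does not confirm, and the helper names `process_running?` and `daemon_running?` are hypothetical.

```ruby
# Hypothetical helper: true if a process with this pid currently exists.
def process_running?(pid)
  Process.kill(0, pid) # signal 0 only checks for existence, nothing is delivered
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # process exists but belongs to another user
end

# Hypothetical helper: reads a pid from a pidfile and probes that process.
def daemon_running?(pidfile)
  return false unless File.file?(pidfile)

  pid = File.read(pidfile).strip
  return false unless pid.match?(/\A\d+\z/) && pid.to_i > 0

  process_running?(pid.to_i)
end
```

Because the check asks the kernel about the pid itself, it cannot be fooled by a wrapper script that always exits with a success status.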

mediocretes commented 7 years ago

I get a number of failures like this:

expected "ls: cannot access /tmp/instrumentald_telegraf*: No such file or directory\nwaiting\nls: cannot acces...: No such file or directory\nwaiting\ncat: no_tmp_telegraf_files_found: No such file or directory\n" to include "[[inputs.mongodb]]\n#   ## An array of URI to gather stats about. Specify an ip or hostname\n#   ## ...0000, etc.\n  servers = [\"mongodb://localhost:27017\"]\n  tagexclude = [\"state\", \"host\"]\n\n\n"

This happens on a variety of instance types. I'll keep investigating.

jason-o-matic commented 7 years ago

That means the telegraf config somehow isn't being generated. If the test doesn't output a telegraf config in the error, it means the test waited 20 seconds and then gave up.
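
To make the wait-then-give-up behavior concrete, here is a minimal sketch of a polling loop with a deadline. It is not the actual test code, which (judging by the ls and cat output quoted above) is shell-based. The helper name `wait_for_files` and the one-second polling interval are assumptions; the 20-second timeout and the /tmp/instrumentald_telegraf* pattern come from this thread.

```ruby
# Hypothetical helper: poll for files matching a glob until a deadline passes.
# Returns the matching paths, or an empty array if nothing appeared in time.
def wait_for_files(pattern, timeout: 20, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    matches = Dir.glob(pattern)
    return matches unless matches.empty?
    return [] if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
end

# Usage (not run here, since it may block for the full timeout):
#   configs = wait_for_files("/tmp/instrumentald_telegraf*")
#   abort "no telegraf config generated" if configs.empty?
```

An empty result here corresponds to the failure above: no config file appeared before the deadline, so there was nothing for the test to print.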

mediocretes commented 7 years ago

Hmm, I'll tear it all the way down and clear my kitchen cache and see what happens.

mediocretes commented 7 years ago

Hmm, still happening, and not in master.

mediocretes commented 7 years ago

My problems were caused by not packaging before testing. script/test fixes this, but I was using kitchen verify directly to avoid running all platforms every time. Jason and I discussed adding an argument to script/test to handle this, but this totally passes now.