NSLS-II / nslsii

NSLS-II related devices
BSD 3-Clause "New" or "Revised" License

Add loggers aimed at Elasticsearch #48

Open danielballan opened 5 years ago

danielballan commented 5 years ago

We need a Python logging handler that submits data to Elasticsearch. Specifically, it should submit a PUT request like this:

curl -X "PUT" cmb03.cs.nsls2.local:9200/SOME_SENSIBLE_INDEX_NAME/_doc/1 -d '{"hello": "world"}' -H "Content-Type: application/json" 

There may already be a nice library or StackOverflow snippet for making HTTP requests from Python loggers. If not, I would just roll something using requests.

mrakitin commented 5 years ago

+1 for requests. Also, should the authentication be supported?
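If authentication does turn out to be needed, requests supports it directly via the auth parameter. A minimal sketch, assuming HTTP basic auth and entirely made-up credentials (elastic / changeme); it builds and inspects the request without actually sending anything:

```python
import requests
from requests.auth import HTTPBasicAuth

# Hypothetical credentials; real ones would come from a config file or
# environment variable, not be hard-coded like this.
req = requests.Request(
    "PUT",
    "http://cmb03.cs.nsls2.local:9200/test/_doc/1",
    json={"hello": "world"},
    auth=HTTPBasicAuth("elastic", "changeme"),
)
prepared = req.prepare()
# requests encodes basic auth into an Authorization header.
print(prepared.headers["Authorization"])  # Basic ZWxhc3RpYzpjaGFuZ2VtZQ==
```

The same auth= keyword works on requests.put() directly; preparing the request is just a way to see what would go over the wire.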

mrakitin commented 5 years ago

In my case I had to add http:// to the address for the request to go through instead of being refused.

danielballan commented 5 years ago

Moving security discussion to a private channel.

mrakitin commented 5 years ago

This seems to work:

In [1]: import requests

In [2]: topic = 'test'

In [3]: r = requests.put(f'http://cmb03.cs.nsls2.local:9200/{topic}/_doc/1', json={'hello': 'DAMA tester'})

In [4]: r
Out[4]: <Response [200]>

In [5]: r.text
Out[5]: '{"_index":"test","_type":"_doc","_id":"1","_version":3,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":2,"_primary_term":1}'

danielballan commented 5 years ago

Good. As a note to future readers, the URL http://cmb03.cs.nsls2.local:9200/ should now be http://elasticsearch.cs.nsls2.local/. The old one may not work.

Also, refer to this Dropbox Paper for suggested index names: https://paper.dropbox.com/doc/Kafka-Topics--AT8pvSkZTo5zs_yP40M6R5HFAg-Kedt0QGwc0DhH9cZzXkDy

danielballan commented 5 years ago

We should wrap the usage demonstrated in @mrakitin's comment above in a Python logging handler. Something like:

import logging

import requests


class ElasticHandler(logging.Handler):
    def __init__(self, url):
        self.url = url
        super().__init__()

    def emit(self, record):
        # Extract useful info from the record and put it into a dict.
        payload = {
            "name": record.name,
            "levelname": record.levelname,
            "message": self.format(record),
            "created": record.created,
        }
        response = requests.put(self.url, json=payload)
        # Raise an exception if the server we PUT to returns a bad status code.
        # The logging framework will catch the error and print a message.
        response.raise_for_status()

Read up on LogRecord to understand what to expect in record.
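For reference, record is a logging.LogRecord, and its attributes flatten naturally into a JSON-serializable dict. A stdlib-only sketch (the field selection is just an illustration, not a proposed schema):

```python
import logging

# Construct a LogRecord by hand, the way the logging framework would.
record = logging.LogRecord(
    name="bluesky",
    level=logging.INFO,
    pathname="example.py",
    lineno=1,
    msg="scan %d complete",
    args=(42,),
    exc_info=None,
)
doc = {
    "name": record.name,
    "levelname": record.levelname,
    "message": record.getMessage(),  # interpolates msg % args
    "created": record.created,       # POSIX timestamp (float)
}
print(doc["message"])  # scan 42 complete
```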

Test like so:

from bluesky import RunEngine
RE = RunEngine({})
handler = ElasticHandler(URL)
RE.log.addHandler(handler)
RE.log.setLevel("DEBUG")
RE([])  # Run an empty plan for simplicity's sake. Will still generate some log messages.

danielballan commented 5 years ago

@ke-zhang-rd If you are interested in getting involved with the Kibana stuff, this might be a good issue for you to work on. We don't need it for the deployment, but we wish we had it for the deployment and will roll it out to select beamlines as soon as it's ready.

Let us know if you are interested; if not I expect @mrakitin or I will be happy to take it.

danielballan commented 5 years ago

As a separate but related issue that may be tackled in the same PR, we agreed in our meeting today that we would have both an ElasticHandler and a simpler RotatingFileHandler while we experiment. We may remove one or the other depending on how things go.

mrakitin commented 5 years ago

Awesome. I like that RotatingFileHandler is very configurable, so we can fully control how large the logs should be and how many of them to keep. We should experiment with those numbers at each particular beamline and for each particular package we are logging (e.g. caproto may produce 100x the amount of data that bluesky does, even with an idle IPython session).
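For context, the knobs in question are RotatingFileHandler's maxBytes and backupCount parameters. A sketch with illustrative values only; the right sizes would come out of the per-beamline experimentation described above:

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

# Use a temp directory so this sketch is self-contained; a real deployment
# would pick a persistent, beamline-specific log directory.
logdir = tempfile.mkdtemp()
handler = RotatingFileHandler(
    os.path.join(logdir, "bluesky.log"),
    maxBytes=10 * 1024 * 1024,  # start a new file after ~10 MB
    backupCount=5,              # keep 5 rotated files; older ones are deleted
)
logger = logging.getLogger("bluesky-rotation-demo")
logger.addHandler(handler)
logger.warning("hello from the rotating handler")
```

A chattier source like caproto would simply warrant a smaller maxBytes or larger backupCount than a quieter one like bluesky.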