elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Add a live streaming API? #55358

Open jpountz opened 4 years ago

jpountz commented 4 years ago

Elasticsearch is often used to index logs, and live-tailing the logs that match a given filter is a common use case, but I think we could greatly improve the user experience here. The current approach is to periodically run a query that sorts hits by descending @timestamp and to use a couple of tricks to make these requests run efficiently.
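For illustration, the polling approach could be sketched as a small request builder. This is only a sketch under assumptions: the function name `build_tail_request`, its parameters, and the idea of bounding each poll with a `since` lower bound are hypothetical and not taken from this issue; only the query DSL keys (`size`, `sort`, `bool`, `filter`, `range`) are standard.

```python
def build_tail_request(user_filter, since=None, size=100):
    """Build a _search body that returns the newest matching hits first.

    `user_filter` is the query clause the user is tailing on.
    `since` is an optional lower bound on @timestamp (hypothetical parameter).
    """
    filters = [user_filter]
    if since is not None:
        # Restrict the poll to a recent window so the request stays cheap.
        filters.append({"range": {"@timestamp": {"gte": since}}})
    return {
        "size": size,
        # Newest events first, as described above.
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {"bool": {"filter": filters}},
    }
```

A client would send this body to `_search` on every poll and de-duplicate hits it has already displayed, which is where the ordering problems described next come from.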

But this approach generally delivers messages out of order: a request is likely to return, for the first time, an event that is older than the most recent event returned by the previous request. This is mostly due to how we partition data into shards:

Would it be possible to build an API that, assuming that events get pushed to Elasticsearch in order, would be able to live-stream events in order as well?

elasticmachine commented 4 years ago

Pinging @elastic/es-search (:Search/Search)

jasontedor commented 4 years ago

@jpountz When I was thinking about the changes API, one use case that I thought of for our own products was exactly the Logs application and tailing logs there. I'm curious if you've thought about this in that context as well?

jpountz commented 4 years ago

@jasontedor I have thought about it indeed. I don't think that it will be solved entirely by the Changes API because I feel like global ordering by @timestamp is important for the user experience, and I'm not seeing global ordering as a feature of the Changes API. But building on top of the Changes API might be convenient. Please let me know if you had different expectations.

We don't need the entire feature set of the Changes API. For example, I don't think we would need to be informed about deletions, so another option might be to use _search and search_after on the _seq_no and/or @timestamp fields at the shard level (both have different pros and cons).
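A rough sketch of what the shard-level `_search` plus `search_after` option might look like as a request body follows. The helper name `build_shard_page_request` and its parameters are hypothetical, and the sketch assumes that sorting on `_seq_no` is acceptable for the index in question; targeting a single shard would be done separately, for instance with the search `preference` parameter.

```python
def build_shard_page_request(last_seq_no=None, size=500):
    """Build a _search body that pages through one shard in _seq_no order.

    `last_seq_no` is the sort value of the last hit already consumed
    (hypothetical cursor kept by the caller), or None for the first page.
    """
    body = {
        "size": size,
        # Ascending sequence numbers: oldest operation on the shard first.
        "sort": [{"_seq_no": {"order": "asc"}}],
    }
    if last_seq_no is not None:
        # Resume strictly after the last consumed sort value.
        body["search_after"] = [last_seq_no]
    return body
```

Paging on `_seq_no` follows indexing order within a shard, whereas paging on `@timestamp` follows event time, which is one way to read the "different pros and cons" mentioned above.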

Either way, we'd need something on top in order to provide global ordering by @timestamp as much as possible. For example, I believe that we'll want to ignore events that are too recent, because there might be older events that are not visible yet, either because they are still being indexed or because they have not been refreshed yet. The ignored documents would only be returned on a following page.
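The "ignore events that are too recent" idea can be sketched as a buffer that only releases events older than a safety lag. This is a minimal illustration, not a proposed design: the class name `OrderedTail`, the `lag_seconds` parameter, and the use of plain numeric timestamps are all assumptions made for the example.

```python
import heapq


class OrderedTail:
    """Buffer events and release them in timestamp order after a safety lag."""

    def __init__(self, lag_seconds):
        self.lag_seconds = lag_seconds
        self._heap = []      # min-heap of (timestamp, insertion index, event)
        self._counter = 0    # tiebreaker so events themselves are never compared

    def add(self, timestamp, event):
        """Record an event fetched from any shard."""
        heapq.heappush(self._heap, (timestamp, self._counter, event))
        self._counter += 1

    def drain(self, now):
        """Return buffered events with timestamp <= now - lag, oldest first."""
        watermark = now - self.lag_seconds
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            timestamp, _, event = heapq.heappop(self._heap)
            ready.append((timestamp, event))
        return ready
```

Events newer than the watermark stay buffered and come out on a later call to `drain`. An event that becomes visible more than `lag_seconds` late would still be emitted out of order, which matches the "as much as possible" caveat above.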

jpountz commented 4 years ago

We discussed it today as a group. This generally felt useful, and while both _search and the Changes API could be building blocks for this functionality, the Changes API is a more natural fit:

This raises interesting questions that we'll need to think about:

jpountz commented 4 years ago

Depends on #1242

weltenwort commented 4 years ago

Thanks for considering this :tada:

While it makes total sense not to duplicate the effort for both APIs, I would consider one property pretty important: it should be possible to achieve a consistent ordering in both the Changes API and _search. Is that realistic?

The reason is that the latter is probably still going to be used when fetching log entries for past time intervals.

jpountz commented 4 years ago

@weltenwort The idea would be that whatever we end up exposing would take care of fetching log entries for past intervals too. The problem with _search is that it can't guarantee ordering across pages (it only guarantees it within a single page), so either a later page would include events that are older than some events from previous pages, or it would mistakenly ignore some logs if search_after is used.
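To make the search_after failure mode concrete, here is a toy simulation with plain integers standing in for @timestamp values. Nothing here talks to Elasticsearch; `page_after` is a hypothetical stand-in for one search_after page over whatever documents are visible at request time.

```python
def page_after(visible_docs, after_timestamp, size):
    """Mimic one search_after page: ascending order, strictly after the cursor."""
    candidates = sorted(t for t in visible_docs if t > after_timestamp)
    return candidates[:size]


# Documents visible when the first page is requested.
visible = [100, 101, 104]
first_page = page_after(visible, after_timestamp=0, size=10)
cursor = first_page[-1]

# An older event (102) only becomes visible later, e.g. after a refresh,
# together with a genuinely new event (105).
visible.append(102)
visible.append(105)
second_page = page_after(visible, after_timestamp=cursor, size=10)

# Anything visible that no page returned was silently skipped.
returned = first_page + second_page
missed = [t for t in visible if t not in returned]
```

In this run the cursor has already moved past 102 by the time it becomes visible, so the second page never returns it. Dropping the cursor and re-querying a window instead would return 102, but after 104 had already been shown, which is the out-of-order case.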

weltenwort commented 4 years ago

That sounds like it would solve the search_after tiebreaker problem for us :heart_eyes: Let me know if you want to validate any API design choice in regard to the Logs UI use case early in the process.

jpountz commented 4 years ago

We'll certainly reach out when we start tackling this issue!

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)