elastic / call-for-meetups


Using Elasticsearch as the Primary Data Store #9

Closed vy closed 4 years ago

vy commented 5 years ago

Description

The biggest e-commerce company in the Netherlands and Belgium, bol.com, set out on a four-year journey to rethink and rebuild its entire ETL (Extract, Transform, Load) pipeline, which has been cooking up the data used by its search engine since the dawn of time. This more-than-a-decade-old, white-bearded giant, breathing in the dungeons of shady Oracle PL/SQL hacks, was in a state of decay, causing ever-increasing hiccups in production. A rewrite was inevitable. After drafting many blueprints, we went for a Java service backed by Elasticsearch as the primary storage! This idea sent shivers down the spines of even the most senior Elasticsearch consultants we hired, so to ease your mind I’ll walk you through why we took such a radical approach and how we managed to escape our legacy.

A paragraph describing your proposal

Unless Lucene alone suffices for your needs, Elasticsearch is the best F/OSS search engine in the wild, hands down. People obviously use it for search. But not many use it as primary storage, which is the selling point of my talk. bol.com already has a pretty complicated search infrastructure. This time I want to talk about how we leverage Elasticsearch to realize the ETL layer, which is the heart pumping blood to search. (To avoid any confusion: I will not talk about the Elasticsearch used for search. I will talk about the Elasticsearch used for ETL. These are two totally different clusters serving different purposes.)

Speaker Bio

Volkan has been working as a Java plumber in the domain of e-commerce search since 2014. In addition to his daily rescue trips to the land of "Java-based reactive microservices flavored by Spring, Elasticsearch, and RDBMS hazards", he enjoys performing public service for Log4j, Reactor, and OpenJDK Project Loom. His spare time (read as "nights") is mostly occupied by the maintenance of certain Log4j plugins and J2EE record-and-replay suites. Prior to that, you could have found him coding for embedded devices, sending patches to PostgreSQL, implementing data structures in Lisp, and developing distributed software-defined network (SDN) controllers. He holds an internationally accredited Permanent Head Damage, aka, PhD.

Short bio of the speaker

Volkan has been working as a Java plumber in the domain of e-commerce search since 2014.

xeraa commented 5 years ago

@bleskes this one is for you and the Amsterdam meetup :)

bleskes commented 5 years ago

@vy thanks for submitting this. It sounds very interesting. Two things I wanted to clarify:

1) Do you want to host it at bol.com? We did so in the past with great success.
2) You mention in the title that the talk is about using ES as a primary data store, but later that it is about using it for ETL. Can you clarify?

vy commented 5 years ago

Thanks for taking the time to reply, @bleskes.

  1. Yes, we can. But wouldn't it be better to hold it at an Amsterdam-based location, which, to the best of my knowledge, is more accessible to a wider community?
  2. I did not totally get your question, but let me see if I can further clarify my proposal. We are using Elasticsearch like an RDBMS and leverage its consistency guarantees, much like transactions in an RDBMS. We have a write-heavy load, which is again unconventional. The domain we use it in is ETL. Elasticsearch is our primary data store for cooking documents, which are later pushed to other Elasticsearch clusters that serve the actual search queries from customers.
  3. The long story is told in the following blog post: Using Elasticsearch as the Primary Data Store
  4. I already gave a similar presentation at CNCML Vienna, 2019. Its slides are also accessible in the blog post.
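
To make the RDBMS-like usage in point 2 a bit more concrete, here is a minimal sketch of optimistic concurrency control, the mechanism Elasticsearch offers for conditional writes through the `if_seq_no` and `if_primary_term` request parameters. It is an illustration only, not bol.com's code: `InMemoryIndex`, `VersionConflict`, and `update_with_retry` are hypothetical names, and the in-memory class merely imitates the compare-and-write behavior so that the example runs without a cluster. Whether the ETL service relies on exactly this mechanism is an assumption.

```python
from dataclasses import dataclass, field


class VersionConflict(Exception):
    """Stands in for the HTTP 409 that a failed conditional write returns."""


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for a single index; NOT an Elasticsearch client."""
    primary_term: int = 1
    _next_seq_no: int = 0
    _docs: dict = field(default_factory=dict)

    def get(self, doc_id):
        source, seq_no = self._docs[doc_id]
        return {"_source": dict(source), "_seq_no": seq_no,
                "_primary_term": self.primary_term}

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        # A conditional write succeeds only if nobody wrote the document
        # since the caller read it.
        if if_seq_no is not None:
            current = self._docs.get(doc_id)
            if (current is None or current[1] != if_seq_no
                    or if_primary_term != self.primary_term):
                raise VersionConflict(doc_id)
        seq_no = self._next_seq_no
        self._next_seq_no += 1
        self._docs[doc_id] = (dict(source), seq_no)
        return {"_seq_no": seq_no, "_primary_term": self.primary_term}


def update_with_retry(index, doc_id, mutate, max_attempts=5):
    """Read-modify-write loop: re-read and retry whenever the write conflicts."""
    for _ in range(max_attempts):
        hit = index.get(doc_id)
        updated = mutate(hit["_source"])
        try:
            return index.index(doc_id, updated,
                               if_seq_no=hit["_seq_no"],
                               if_primary_term=hit["_primary_term"])
        except VersionConflict:
            continue  # someone else won the race; start over from a fresh read
    raise VersionConflict(f"gave up on {doc_id} after {max_attempts} attempts")


index = InMemoryIndex()
index.index("product-1", {"price": 10})
stale = index.get("product-1")  # remember an old sequence number
update_with_retry(index, "product-1", lambda doc: {**doc, "price": 12})
```

A write that still carries the `stale` sequence number is now rejected instead of silently overwriting the newer document, which is the property that lets a read-modify-write cycle behave somewhat like a small transaction on a single document.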

bleskes commented 5 years ago

@vy thanks. That helps and indeed answers my question. I think this is great and we can talk about having you speak at our next meetup. This can be a month or two out because we already have one scheduled next week. Are you ok with that time frame?

vy commented 5 years ago

Sure. I just need to attend JCrete on 14-22 July. Other than that, I have nothing concrete planned yet, so the rest is fine with me.

bleskes commented 5 years ago

We're flexible, so it'll be fine.

vy commented 5 years ago

@bleskes, is there any progress on the issue?

vy commented 4 years ago

@xeraa Is there any other way to get in touch with @bleskes?

bleskes commented 4 years ago

@vy sorry - I was on vacation during the previous ping and I'm still catching up. Let me pick this up.

bleskes commented 4 years ago

@vy meetup published on https://www.meetup.com/Elastic-NL/events/265086114/ . Thanks and see you there!

oeph commented 4 years ago

Is there any chance that someone could provide a short wrap-up or any notes from the meetup?

bleskes commented 4 years ago

The meetup was yesterday. Great turnout (~90 people) and good presentations. Let me know if you need more.

oeph commented 4 years ago

Are there any slides available, or notes from any of the discussions? I saw something like this for previous meetups.

vy commented 4 years ago

@oeph, the full story and slides of my presentation are available in a blog post.