BrouthenKamel commented 11 months ago

Description

Investigate Elasticsearch and see:

How does it work?
What does it provide?
How can it be adapted to our case (depending on our search flow)?

Outcome

A concise resource or guide on Elasticsearch to be referenced throughout the project.

4zz0u4k commented 11 months ago

https://youtu.be/ZP0NmfyfsoM?si=TAno9zc_3N77_5H6

4zz0u4k commented 11 months ago

NOTE: The following explanations are based on what I've seen on Elasticsearch's YouTube channel. The link below is a playlist that describes in detail how Elasticsearch operates. In the meantime, you can read this brief explanation to get a feel for how Elasticsearch works.

https://youtube.com/playlist?list=PL_mJOmq4zsHZYAyK606y7wjQtC0aoE6Es&si=UP_KzCgNFeBD5KTU

General Overview of Elasticsearch

Elasticsearch is a free and open search and analytics engine for all types of data. It can act as a database, but it is not usually considered one: its primary use is to speed up searches and to query documents ranked by a score that depends on the query. It is considered the heart of the ELK stack (E: Elasticsearch, L: Logstash, K: Kibana).

Logstash is a data processing pipeline. Data that Logstash receives (e.g. e-commerce orders and customer messages) is handled as events. These events are parsed, filtered, and transformed, then sent off to Elasticsearch, where the data is stored.

Kibana provides a web interface to the data stored in Elasticsearch. It allows users to send queries to Elasticsearch using the same REST API that any other client uses. These queries can answer questions such as "How many users visit our site daily?" or "What was the revenue for last month?"
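As a rough illustration of the first question, here is a sketch of the kind of request body that could be sent to the `_search` endpoint. The field names `@timestamp` and `user_id` are assumptions made for this example, not fields from our data.

```python
import json


def daily_unique_users_query(timestamp_field="@timestamp", user_field="user_id"):
    """Build a search body that counts distinct users per calendar day.

    Both field names are hypothetical placeholders for this sketch.
    """
    return {
        "size": 0,  # only the aggregation is wanted, not the matching documents
        "aggs": {
            "visits_per_day": {
                # one bucket per calendar day
                "date_histogram": {
                    "field": timestamp_field,
                    "calendar_interval": "day",
                },
                "aggs": {
                    # approximate distinct count of users inside each bucket
                    "unique_users": {"cardinality": {"field": user_field}},
                },
            }
        },
    }


body = daily_unique_users_query()
print(json.dumps(body, indent=2))
```

The same JSON can be pasted into the Kibana console or sent by any HTTP client.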

Basic Architecture:

A node is a running instance of Elasticsearch that stores data. It has a unique id and a name.

Each node belongs to a cluster, which is a collection of nodes that are connected together. When a node starts up, a cluster is formed automatically (the pink box in the diagram).

You can add one or many nodes to a cluster. These nodes can be distributed across separate machines. By default, a node is assigned all of the following roles: master-eligible, data, ingest, and machine learning (if available). You can configure these roles and give specific roles to certain nodes.

Each node in the cluster can handle HTTP requests from clients as well as communication between nodes. All nodes are aware of the other nodes in the same cluster and can forward an HTTP request to the node designated to handle it.

A shard contains a bunch of docs (the JSON documents we store). Shards belong to what is called an index: an index groups documents that have something in common, and its documents are split across one or more shards.

Data is distributed over shards to spread storage across nodes and keep response time low: a query is run on the shards in parallel across the nodes, so the more evenly the documents are distributed, the faster the query tends to be.
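A minimal sketch of how documents could be spread over shards, assuming the usual "hash of the routing key modulo the number of primary shards" idea. `zlib.crc32` is only a stand-in hash here, and the helper names `pick_shard` and `distribute` are made up for this example.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document id) to a shard number.

    crc32 is a stand-in for this sketch; Elasticsearch uses its own
    hash function internally.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document ids by the shard they would land on."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


doc_ids = [f"doc-{i}" for i in range(12)]
for shard, ids in distribute(doc_ids, 3).items():
    print(shard, ids)
```

Because the shard is derived from the key alone, any node can work out where a document lives without asking the others.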

In many cases backup shards (replicas) are created to take over from their primary shard in case its node goes down (remember, a node is a running machine!). Duplicates also help to handle a high number of requests.
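The backup idea can be sketched as follows. The round-robin placement and the node names are assumptions made for this example; the only rule taken from Elasticsearch is that a replica is never placed on the same node as its primary.

```python
def place_shards(num_shards, nodes):
    """Place each primary on one node and its replica on the next node.

    Round-robin placement is an assumption of this sketch, not the real
    allocation algorithm.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to separate primary and replica")
    placement = {}
    for shard in range(num_shards):
        primary = nodes[shard % len(nodes)]
        replica = nodes[(shard + 1) % len(nodes)]
        placement[shard] = {"primary": primary, "replica": replica}
    return placement


def surviving_copies(placement, failed_node):
    """Return, per shard, the nodes that still hold a copy after a failure."""
    return {
        shard: [node for node in copies.values() if node != failed_node]
        for shard, copies in placement.items()
    }


placement = place_shards(3, ["node-1", "node-2", "node-3"])
print(surviving_copies(placement, "node-2"))
```

With one replica per shard, losing any single node still leaves at least one copy of every shard.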

How to interact?

Elasticsearch provides a fully functional built-in REST API, which means a CRUD operation requires only a GET/POST/PUT/DELETE request, with no need for complex DB queries.

A complete guide to the basics of these commands:

https://dev.to/lisahjung/beginner-s-guide-to-performing-crud-operations-with-elasticsearch-kibana-1h0n

This blog covers all the basics, from creating an index to adding and deleting a document. Bear in mind that I didn't talk about the setup or anything like that; I will mention such things and resources in the standup.
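To make the CRUD point concrete, here is a small sketch that only builds the HTTP requests without sending them. The base URL `http://localhost:9200`, the index name `articles`, and the document contents are assumptions for illustration.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumed local node
INDEX = "articles"                  # hypothetical index name


def build(method, path, body=None):
    """Build (but do not send) an HTTP request for the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(f"{BASE_URL}{path}", data=data, headers=headers, method=method)


requests_by_step = {
    "create index": build("PUT", f"/{INDEX}"),
    "create doc": build("PUT", f"/{INDEX}/_doc/1", {"title": "Intro to search"}),
    "read doc": build("GET", f"/{INDEX}/_doc/1"),
    "update doc": build("POST", f"/{INDEX}/_update/1", {"doc": {"title": "Search basics"}}),
    "delete doc": build("DELETE", f"/{INDEX}/_doc/1"),
}

for step, req in requests_by_step.items():
    print(f"{step:13} {req.get_method():6} {req.full_url}")
```

Each request could be passed to `urllib.request.urlopen` against a running node; a secured cluster would additionally need credentials.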

An overview of how relevant the searches are:

In the two images below, the points inside the white circle are the docs returned by some query, which means ....... Queries can be non-deterministic, in the sense that the fetched results depend on criteria such as the number of times a word from the search query appears in the document .... This can lead to some serious mistakes; the two metrics below describe the types of mistakes.

When you search for something, you type a search query into the search box. Elasticsearch looks at the query and pulls up relevant documents, or hits. Then it calculates a score for each document and ranks them by relevance.
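The "score, then rank" step can be sketched with a toy scorer that only counts how often the query words occur in each document. This is a deliberate simplification: Elasticsearch's default scoring (BM25) also accounts for how rare a term is and how long the document is. The sample documents are made up.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do much more."""
    return text.lower().split()


def score(query, document):
    """Toy relevance score: how often the query terms occur in the document."""
    counts = Counter(tokenize(document))
    return sum(counts[term] for term in tokenize(query))


def search(query, documents):
    """Return (score, doc_id) pairs for matching docs, best score first."""
    hits = [(score(query, text), doc_id) for doc_id, text in documents.items()]
    hits = [hit for hit in hits if hit[0] > 0]  # drop docs with no matching term
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


documents = {
    "a": "elasticsearch is a search engine",
    "b": "search search and more search",
    "c": "kibana draws dashboards",
}
print(search("search engine", documents))  # [(3, 'b'), (2, 'a')]
```

Document "b" ranks first only because it repeats one query word, which shows how a naive score can promote a less useful hit.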

In tomorrow's standup I will try to explain in a nutshell how one can modify queries in order to tune recall and precision.
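For reference, precision and recall for a single query can be computed as below, using the standard definitions (precision: share of returned docs that are relevant; recall: share of relevant docs that were returned). The document ids are made up for the example.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one query from sets of doc ids."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


returned = {"doc1", "doc2", "doc3", "doc4"}  # what the query fetched
relevant = {"doc1", "doc2", "doc5"}          # what we actually wanted
precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four returned docs are relevant (precision 0.5) and two of the three relevant docs were found (recall about 0.67).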

BrouthenKamel commented 11 months ago

@CS-ISE-Project/back-end-team to fully study and understand how ElasticSearch works

4zz0u4k commented 11 months ago

GitHub repo

https://github.com/LisaHJung/Beginners-Crash-Course-to-Elastic-Stack-Series-Table-of-Contents

CS-ISE-Project / back-end

[Research] Elastic Search #1
