Azure-Samples / azure-openai-rag-workshop

Create your own ChatGPT with Retrieval-Augmented-Generation workshop
https://aka.ms/ws/openai-rag
MIT License
93 stars, 271 forks

Shall we rename the Indexer component? #21

Closed by agoncal 7 months ago

agoncal commented 7 months ago

At first I didn't understand what "Indexer" meant in the document processor. And the documentation mentions "Data ingestion". So what about renaming the component "Indexer" to something else:

sinedied commented 7 months ago

Renaming it to ingestion has been on the to-do list, but I left it as is last time because I didn't get the chance to refactor everything.

I would prefer to keep it 1-word for simplicity. What about ingestion, then?

SandraAhlgrimm commented 7 months ago

Well, I believe that ingestion itself is not very self-explanatory. But I also can't come up with a 1-word solution. Why does it have to be just one?

sinedied commented 7 months ago

It will be used to name the service in Bicep and YAML too, and I think using more than 1 word might be confusing, e.g. data-ingestion-java-quarkus.

What bothers me is that we called the process "data ingestion" here, but what we're doing technically is indexing data into the DB (that's what it's called in the Azure docs too). Maybe the right solution is to keep the indexer name and change what we call the process in the docs?

agoncal commented 7 months ago

Cultural difference between JS and Java developers. We love our long class names ;o) ObjectFactoryCreatingFactoryBean

agoncal commented 7 months ago

I quite like the 1-word version: ingestion is ok for me

SandraAhlgrimm commented 7 months ago

> It will be used to name the service in Bicep and YAML too, and I think using more than 1 word might be confusing, e.g. data-ingestion-java-quarkus.
>
> What bothers me is that we called the process "data ingestion" here, but what we're doing technically is indexing data into the DB (that's what it's called in the Azure docs too). Maybe the right solution is to keep the indexer name and change what we call the process in the docs?

I agree. The way I understand the TypeScript code, you're doing indexing here.

Copilot answer: Data ingestion and indexing are both crucial steps in managing and utilizing data, but they serve different purposes:

  1. Data Ingestion:

    • Definition: Data ingestion is the process of moving and replicating data from various sources to a target landing or raw zone. This destination could be a cloud data lake, a cloud data warehouse, or another storage medium where the data can be accessed, used, and analyzed by an organization.
    • Purpose: Data ingestion ensures that data from diverse sources (such as databases, APIs, logs, files, sensors, etc.) is collected and made available for further processing. It's the first step in the data pipeline.
    • Key Activities:
      • Extraction: Retrieving data from source systems.
      • Transformation: Converting data into a suitable format.
      • Loading: Storing data in the target location.
    • Example: Collecting customer orders from an e-commerce website and storing them in a data lake for analysis.
  2. Indexing:

    • Definition: Indexing is the process of creating a searchable structure (an index) that allows efficient retrieval of data from a large dataset.
    • Purpose: Indexing enhances data query performance by organizing data in a way that accelerates search operations.
    • Key Activities:
      • Creating Indexes: Identifying relevant fields and creating index structures.
      • Updating Indexes: Keeping indexes up-to-date as data changes.
      • Query Optimization: Utilizing indexes to speed up search queries.
    • Example: Creating an index on a database table's primary key column to quickly locate specific records.

In summary, data ingestion focuses on getting data into the system, while indexing optimizes data access and retrieval. Both processes are essential for effective data management and analysis.
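
To make the distinction concrete, here is a minimal sketch, not taken from the workshop code. All names (`ingest`, `buildIndex`, `search`, the sample file names) are hypothetical, and a simple keyword index stands in for the vector database the workshop actually uses. Ingestion only lands the raw documents; indexing builds the structure that makes them searchable.

```typescript
// Hypothetical sketch: names and data are illustrative, not from the workshop.

interface SourceFile {
  name: string;
  content: string;
}

interface RawDocument {
  id: string;
  text: string;
}

// Ingestion: collect data from a source and land it in a common raw format.
function ingest(files: SourceFile[]): RawDocument[] {
  return files.map((file) => ({ id: file.name, text: file.content.trim() }));
}

// Indexing: build a searchable structure, here an inverted index
// mapping each term to the ids of the documents that contain it.
function buildIndex(docs: RawDocument[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const doc of docs) {
    const terms = doc.text.toLowerCase().split(/\W+/).filter(Boolean);
    for (const term of terms) {
      const ids = index.get(term) ?? [];
      if (ids.indexOf(doc.id) === -1) ids.push(doc.id);
      index.set(term, ids);
    }
  }
  return index;
}

// Retrieval: answer a query through the index instead of scanning every document.
function search(index: Map<string, string[]>, term: string): string[] {
  return (index.get(term.toLowerCase()) ?? []).slice().sort();
}

const docs = ingest([
  { name: "terms.md", content: "  Cancellation policy for bookings " },
  { name: "faq.md", content: "How do bookings work?" },
]);
const index = buildIndex(docs);
console.log(search(index, "bookings"));
```

In this sketch a component that only ran `ingest` would fit the name "ingestion", while one that also ran `buildIndex` would be doing what the thread calls indexing.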

agoncal commented 7 months ago

@sinedied @SandraAhlgrimm so, shall we rename indexer to ingestion then?

+1 for me

agoncal commented 7 months ago

I've renamed indexer to ingestion in the Java branch only:

https://github.com/Azure-Samples/azure-openai-rag-workshop/commit/9b1bef910dcaeb8d835820132db2e94e7b099244