Norconex / committer-azuresearch

Implementation of Norconex Committer for Microsoft Azure Search.
https://opensource.norconex.com/committers/azuresearch/
Apache License 2.0

Is there a way to chunk the acquired content? #6

Open ki-suzuki opened 1 year ago

ki-suzuki commented 1 year ago

Is there a method to chunk the acquired content when committing it to Azure Search? If so, I would like to learn about it.

ohtwadi commented 1 year ago

If you are talking about batching multiple documents, then yes. In fact, this is done by default. Please take a look at the documentation.

<queue
      class="com.norconex.committer.core3.batch.queue.impl.FSQueue">
    <batchSize>
      (Optional number of documents queued after which we process a batch.
       Default is 20.)
    </batchSize>
...

ki-suzuki commented 1 year ago

@ohtwadi Thank you for your prompt response.

I don't think this is what I am looking for.

To be more specific: after retrieving the body of an HTML page, if that body is very large, I want to divide it into several smaller chunks and submit each chunk to the Azure Cognitive Search index as a separate record.

Is there a way to achieve this?

Thank you.

ohtwadi commented 1 year ago

Does the DOMSplitter fit the bill for you?
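
As a rough sketch of what that could look like, the fragment below registers the DOMSplitter as a pre-parse handler in the Importer section of the crawler configuration. The `div.section` selector is purely hypothetical and stands in for whatever element wraps each chunk in your pages. The element and attribute names are written from memory of the Importer 3 documentation and should be checked against the version you run. Note that the DOMSplitter splits along DOM structure (each element matching the selector becomes its own child document), not by a fixed character or token count.

```xml
<importer>
  <preParseHandlers>
    <!-- Assumption: each logical chunk is wrapped in <div class="section">.
         Replace the selector with one that matches your own markup. -->
    <handler class="com.norconex.importer.handler.splitter.impl.DOMSplitter"
        selector="div.section"
        parser="html" />
  </preParseHandlers>
</importer>
```

Each child document produced this way should then pass through the committer queue like any other document and end up as a separate record in the index.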