Parametrize tokenization method

Iodine98 / dora-back

A Python backend for Document Retrieval and Analysis (DoRA).

MIT License

0 stars 1 forks source link

Parametrize tokenization method #59

Open Iodine98 opened 5 months ago

Iodine98 commented 5 months ago

Currently, the tokenization method for processing text is by default the RecursiveTextSplitter, this should be given as a parameter depending on the type of document uploaded.

Iodine98 commented 5 months ago

It concerns this method in Python:

https://github.com/Iodine98/dora-back/blob/ef44bc69930edc6f91497e55163afde73ecd0590/chatdoc/doc_loader/document_loader.py#L73-L100