Open Iodine98 opened 5 months ago
Currently, the tokenization method for processing text defaults to the RecursiveTextSplitter; instead, the splitter should be passed in as a parameter chosen according to the type of document uploaded.
It concerns this method in Python:
https://github.com/Iodine98/dora-back/blob/ef44bc69930edc6f91497e55163afde73ecd0590/chatdoc/doc_loader/document_loader.py#L73-L100
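A minimal sketch of what the change could look like: the splitter becomes a parameter with a per-document-type lookup as the default. The names (`load_document`, `SPLITTERS`, the splitter functions) are hypothetical stand-ins, not the actual API of `document_loader.py`:

```python
from typing import Callable, Dict, List, Optional

# Hypothetical splitter interface: a callable turning raw text into chunks.
Splitter = Callable[[str], List[str]]

def recursive_split(text: str, chunk_size: int = 20) -> List[str]:
    # Stand-in for the current default (RecursiveTextSplitter behaviour):
    # fixed-size chunks for illustration only.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def line_split(text: str) -> List[str]:
    # Example alternative splitter, e.g. for line-oriented documents.
    return [line for line in text.splitlines() if line.strip()]

# Map document types (e.g. file extensions) to splitters (assumed mapping).
SPLITTERS: Dict[str, Splitter] = {
    "pdf": recursive_split,
    "txt": line_split,
}

def load_document(text: str, doc_type: str,
                  splitter: Optional[Splitter] = None) -> List[str]:
    # The splitter is now a parameter; if omitted, one is picked by
    # document type, falling back to the recursive default.
    if splitter is None:
        splitter = SPLITTERS.get(doc_type, recursive_split)
    return splitter(text)
```

Callers could then override the splitter explicitly (`load_document(text, "pdf", splitter=line_split)`) while existing call sites keep the current default behaviour.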