deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
17.72k stars, 1.92k forks

feat: enhance documentation for DocumentCleaner #8163

Closed. Amnah199 closed this issue 3 months ago.

Amnah199 commented 3 months ago

Is your feature request related to a problem? Please describe.
PR #8103 adds two new parameters to DocumentCleaner, so the documentation needs to be updated.

Describe the solution you'd like
The docs should explain the new parameters and the order in which the cleaning steps are executed.

Amnah199 commented 3 months ago

@dfokina it would be nice if we could collaborate on this. What do you think?

dfokina commented 3 months ago

Sure @Amnah199, would you be able to add the suggestions to the documentation? From what I can see, we only need to add the two additional parameters? Julian also mentioned something about the order of execution, but I don't know any specifics.

Amnah199 commented 3 months ago

@dfokina as I understand it, the unicode_normalization and ascii_only steps should run before the other cleaning steps, so I made the suggestions based on this. Let me know what you think and we can close this issue.
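
To make the ordering concrete, here is a minimal standard-library sketch of it. This is not the DocumentCleaner implementation: `clean_text` is a hypothetical helper, the whitespace step stands in for "other cleaning steps", and the way ASCII conversion is done here (decompose, then drop non-ASCII characters) is an assumption for illustration only.

```python
import re
import unicodedata
from typing import Optional


def clean_text(
    text: str,
    unicode_normalization: Optional[str] = None,  # assumed values: "NFC", "NFKC", "NFD", "NFKD"
    ascii_only: bool = False,
    remove_extra_whitespaces: bool = True,  # stand-in for the other cleaning steps
) -> str:
    """Hypothetical helper illustrating the order of the cleaning steps."""
    # Step 1: Unicode normalization runs first, so later steps see a canonical form.
    if unicode_normalization is not None:
        text = unicodedata.normalize(unicode_normalization, text)

    # Step 2: ASCII conversion runs next. Assumption: decompose characters
    # (NFKD) and drop anything that still has no ASCII representation.
    if ascii_only:
        text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

    # Step 3: the remaining cleaning steps operate on the normalized text.
    if remove_extra_whitespaces:
        text = re.sub(r"\s+", " ", text).strip()

    return text


cleaned = clean_text("Café\u00a0  déjà   vu", unicode_normalization="NFKC", ascii_only=True)
print(cleaned)  # Cafe deja vu
```

With both options enabled, the accented characters are reduced to plain ASCII before the whitespace is collapsed, so the later step only ever sees the normalized text.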

dfokina commented 3 months ago

Thanks @Amnah199!