Strumenta / SmartReader

SmartReader is a library that extracts the main content of a web page, based on a port of Mozilla's Readability library.
https://smartreader.inre.me
Apache License 2.0
160 stars, 36 forks

Automatic language detection feature #63

Closed: 6MicheleStingo9 closed this pull request 3 months ago

6MicheleStingo9 commented 3 months ago

Added the LanguageDetection library and a DetectLanguage method, and updated the tests.


lofcz commented 3 months ago

To me, this doesn't seem like something that belongs in the base library. It introduces a dependency on a random third-party SaaS for little value, given that all it does is call their API with the article's metadata.Title ?? Content. Have you considered creating your own Contrib module, or using open-weight models to achieve the same result without depending on a SaaS? For example, https://huggingface.co/Mike0307/multilingual-e5-language-detection supports 45 languages with precision sufficient for many use cases.

PS: when awaiting calls in library code, it's good practice to use .ConfigureAwait(false) so the continuation isn't queued back onto the original synchronization context; the existing library code already follows this pattern.
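A minimal sketch of both review suggestions, written in C# because SmartReader is a .NET library. It assumes a hypothetical ILanguageDetector interface, a hypothetical DetectArticleLanguageAsync extension method and a hypothetical RecordingDetector stub. None of these are part of SmartReader's actual API. The idea is that the core library would depend only on the interface, while a separate Contrib package could implement it with a local open-weight model instead of a hosted service. Every await in the library-side code is followed by .ConfigureAwait(false).

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical abstraction (not part of SmartReader): a Contrib package could
// implement it with a local open-weight model instead of a hosted API.
public interface ILanguageDetector
{
    Task<string?> DetectAsync(string text, CancellationToken cancellationToken = default);
}

public static class LanguageDetectionExtensions
{
    // Hypothetical helper: picks the same input as the PR (title if present, else content).
    public static async Task<string?> DetectArticleLanguageAsync(
        this ILanguageDetector detector,
        string? title,
        string content,
        CancellationToken cancellationToken = default)
    {
        string input = title ?? content;

        // ConfigureAwait(false): the continuation does not need the caller's
        // SynchronizationContext (e.g. a UI thread), so it is not posted back to it.
        return await detector.DetectAsync(input, cancellationToken).ConfigureAwait(false);
    }
}

// Hypothetical stub that records its input and returns a fixed language code.
public sealed class RecordingDetector : ILanguageDetector
{
    private readonly string _result;

    public RecordingDetector(string result) => _result = result;

    public string? LastInput { get; private set; }

    public async Task<string?> DetectAsync(string text, CancellationToken cancellationToken = default)
    {
        LastInput = text;
        // Stand-in for real asynchronous work such as model inference.
        await Task.Delay(1, cancellationToken).ConfigureAwait(false);
        return _result;
    }
}
```

Keeping the detector behind an interface means the base library takes on no new dependency, and callers can choose or skip a detection backend entirely.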