aditeyabaral / newsnow

Automated document merging and extractive summarization of news articles

Enable the application to work for Hindi News #7

Open AindriyaBarua opened 2 years ago

AindriyaBarua commented 2 years ago

Hello @aditeyabaral, I would like to add a feature enhancement to this application: making it able to search for queries in Hindi (Devanagari script) and output summaries of Hindi news articles. If you are interested in that feature, I will start working on it. Do let me know :)

aditeyabaral commented 2 years ago

Yup, sounds interesting, go for it :) Is it possible to extend the functionality to any given script or language?

AindriyaBarua commented 2 years ago

It is, as long as GoogleNews supports that language, but we would need a dataset or a pre-trained word embedding model for that language.

AindriyaBarua commented 2 years ago

I cannot seem to find the languages supported by the GoogleNews Python library. Could you please try looking it up? However, I did find some pre-trained word embedding models we could use.

aditeyabaral commented 2 years ago

After checking the library's code, it looks like it supports all Google-supported languages, since it essentially uses requests to fetch the data. All the languages can be found in this link. As for the pretrained models: if a model for a language exists, load that model; otherwise, default to TF-IDF-based vectorization techniques.
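
A minimal sketch of that fallback, not the project's actual code. The registry `PRETRAINED_MODELS`, the file path in it, and the helper names `choose_vectorizer` and `tfidf_vectors` are all hypothetical, and the TF-IDF here is a plain-Python stand-in (whitespace tokens, smoothed IDF) rather than any particular library's implementation:

```python
import math
import os
from collections import Counter

# Hypothetical registry: language code -> path of a pretrained embedding file.
PRETRAINED_MODELS = {"hi": "models/cc.hi.300.bin"}


def choose_vectorizer(lang, registry=PRETRAINED_MODELS):
    """Return "pretrained" if a model file for `lang` exists on disk, else "tfidf"."""
    path = registry.get(lang)
    if path is not None and os.path.exists(path):
        return "pretrained"
    return "tfidf"


def tfidf_vectors(documents):
    """Whitespace-tokenised TF-IDF; returns one {term: weight} dict per document."""
    tokenised = [doc.split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    docs = ["भारत समाचार", "भारत खेल"]
    if choose_vectorizer("hi") == "tfidf":
        print(tfidf_vectors(docs))
```

Because the check only looks for a file on disk, adding another language would just mean adding an entry to the registry; anything without an entry falls through to TF-IDF.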

AindriyaBarua commented 2 years ago

As of now, I am working on enabling Hindi. I will keep it extensible for any language and extend it later. I am using a pretrained Hindi fastText model, as it is better for Indian languages, as established here.

aditeyabaral commented 2 years ago

Alright, sure, go ahead!