dennissergeev / atmosscibot

Twitter bot that generates word clouds of new open access publications in atmospheric sciences.
https://twitter.com/AtmosSciBot
MIT License
10 stars, 0 forks

Can you guys please add #NSREP? #11

Open agmunozs opened 4 years ago

agmunozs commented 4 years ago

Thanks for the bot! Can you add an RSS entry for Nature Scientific Reports (NSREP)? Or should I do that myself and send it to you? Example article of interest: https://www.nature.com/articles/s41598-020-69625-4

dennissergeev commented 4 years ago

Thanks for the suggestion, @agmunozs!

I see NSREP is an open-access journal, which is good. However, it is not exclusively an atmospheric science journal. Since this is AtmosSciBot, I would like to stay away from general-purpose journals. If you can find a way to pick out the atmospheric science publications (by filtering the RSS feed, for example), we can add NSREP.
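To make the filtering idea concrete, here is a minimal sketch of keyword-based filtering of a feed. Everything in it is an assumption for illustration: the keyword list, the function name `filter_atmos_items`, and the sample feed are hypothetical and not taken from the bot's code. It also assumes a plain RSS 2.0 feed without XML namespaces; a real journal feed may use namespaced elements and would need the lookups adjusted.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword stems; a real filter would need a more careful list.
ATMOS_KEYWORDS = ("atmospher", "climate", "precipitation", "rainfall", "monsoon", "aerosol")


def filter_atmos_items(rss_xml, keywords=ATMOS_KEYWORDS):
    """Return (title, link) pairs for items whose title or description mentions a keyword."""
    root = ET.fromstring(rss_xml)
    matches = []
    # Assumes un-namespaced <item> elements, as in plain RSS 2.0.
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        link = item.findtext("link", default="")
        haystack = f"{title} {description}".lower()
        if any(keyword in haystack for keyword in keywords):
            matches.append((title, link))
    return matches


# Made-up feed used only to exercise the function.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example general-purpose journal</title>
<item>
  <title>Seasonal rainfall variability over South America</title>
  <link>https://example.org/a1</link>
  <description>We analyse precipitation records.</description>
</item>
<item>
  <title>Gene expression in zebrafish embryos</title>
  <link>https://example.org/a2</link>
  <description>A study of developmental biology.</description>
</item>
</channel></rss>"""

if __name__ == "__main__":
    for title, link in filter_atmos_items(SAMPLE_FEED):
        print(title, link)
```

Substring matching like this is crude: it will miss relevant papers that avoid the listed words and may let through unrelated ones that happen to mention them, so it is only a starting point.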

If you are still keen to add NSREP, you're welcome to submit a pull request. The bot doesn't inherently know how to parse HTML pages (every journal uses different HTML tags to store text), so a rule for NSREP needs to be added to the parse_article.py module, specifying which HTML elements contain the text of the article. For example, for Wiley journals, the bot finds elements defined by this dictionary. It would be great if you could inspect the web page with the full text of your paper and add the rule accordingly.
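As a rough illustration of what such a rule could look like, the sketch below pulls text out of elements matched by a small dictionary, using only the standard library. It is not the code in parse_article.py: the rule format, the class name `article-body`, and the names `RuleTextExtractor` and `extract_text` are all hypothetical placeholders, and the real NSREP markup has to be checked by inspecting the page.

```python
from html.parser import HTMLParser

# Hypothetical rule: the tag and class are placeholders, not the real NSREP markup.
EXAMPLE_RULE = {"tag": "div", "class": "article-body"}


class RuleTextExtractor(HTMLParser):
    """Collect text found inside elements that match a {"tag", "class"} rule."""

    def __init__(self, rule):
        super().__init__()
        self.rule = rule
        self.depth = 0  # nesting depth of rule["tag"] within a matched element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != self.rule["tag"]:
            return
        if self.depth > 0:
            # Nested element with the same tag: track it so its end tag
            # does not close the matched element too early.
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.rule["class"] in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == self.rule["tag"] and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, rule):
    """Return the text inside elements matching the rule, joined by spaces."""
    parser = RuleTextExtractor(rule)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up page used only to exercise the extractor.
SAMPLE_HTML = """
<html><body>
<div class="nav">Menu</div>
<div class="article-body main">
  <p>Rainfall over the <em>Andes</em> is modulated by ENSO.</p>
  <div class="figure">Figure 1</div>
  <p>Second paragraph.</p>
</div>
<div class="footer">Contact</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_text(SAMPLE_HTML, EXAMPLE_RULE))
```

The point of keeping the rule as data is that supporting another journal would only mean adding another dictionary, provided the article text sits in elements that a tag and class can identify.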

On the other hand, if you want to make a word cloud of your article as a one-off thing, you can just use the word_cloud Python package yourself.