Broken Code in Section 5.3.1

dgrtwo / tidy-text-mining

Manuscript of the book "Tidy Text Mining with R" by Julia Silge and David Robinson

http://tidytextmining.com

Broken Code in Section 5.3.1 #62

Open kaybenleroll opened 5 years ago

kaybenleroll commented 5 years ago

The scraping code in Section 5.3.1 no longer works, as most of the code in the tm.plugin.webmining package is out of date.

I tried switching the GoogleFinanceSource to YahooFinanceSource but that did not work either.

I am sure there are alternatives, but I figured it is best reported here first.

juliasilge commented 5 years ago

Thank you very much for this report! 🙌 I want to acknowledge it and let you know we are aware and looking for a replacement data source to use in the book.

Just to record it here, ideally we would want to find something that:

- allows us to demonstrate how to tidy() a document-term matrix
- is an appropriate use case for the Loughran and McDonald sentiment lexicon

This may be too high an ask, though, and we need to break these apart and integrate these two bits of information separately. @dgrtwo

kaybenleroll commented 5 years ago

Not at all Julia, happy to help! Let me know if you need any help with this - happy to help out any way I can. That book is really useful and has helped me a lot, so happy to contribute back. :)

nattalides commented 4 years ago

Same issue - after a bit of searching, it looks like the Yahoo and Google services have been deprecated, so it is probably best to remove that bit.

@dgrtwo @juliasilge Do you think it would be better/easier to have a stored Corpus/VCorpus/WebCorpus financial article dataset as part of {tidytext}, removing the dependencies on other packages? This would make it possible to demonstrate both of the bullet points you raised.
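
As a rough illustration of that idea, here is a minimal sketch that avoids any web scraping. It is not the book's code: the `articles` vector is made-up text standing in for a stored financial-article corpus, and `toy_lexicon` is a hand-made stand-in so the example runs offline. In real use, `get_sentiments("loughran")` from tidytext would take its place (it asks to download the lexicon via the textdata package on first use).

```r
library(tm)        # VCorpus(), VectorSource(), DocumentTermMatrix()
library(tidytext)  # tidy() method for DocumentTermMatrix objects
library(dplyr)     # inner_join(), count()

# Hypothetical stand-in for a stored financial-article corpus (made-up text).
articles <- c(
  "quarterly loss widened amid litigation risk and uncertainty",
  "record profit and strong growth despite litigation"
)

# Build a document-term matrix the same way a scraped corpus would be handled.
corpus <- VCorpus(VectorSource(articles))
dtm <- DocumentTermMatrix(corpus)

# tidy() turns the sparse matrix into one row per document-term pair,
# with the columns document, term, and count.
tidy_dtm <- tidy(dtm)

# Hand-made stand-in lexicon (an assumption for illustration only, NOT the
# real Loughran and McDonald lexicon); swap in get_sentiments("loughran").
toy_lexicon <- data.frame(
  word = c("loss", "litigation", "uncertainty", "profit", "strong"),
  sentiment = c("negative", "litigious", "uncertainty", "positive", "positive"),
  stringsAsFactors = FALSE
)

# Join terms to the lexicon and total the word counts per document and sentiment.
sentiment_counts <- tidy_dtm %>%
  inner_join(toy_lexicon, by = c(term = "word")) %>%
  count(document, sentiment, wt = count)

print(sentiment_counts)
```

A dataset shipped with a package would replace only the `articles` step; the `tidy()` call and the lexicon join would stay the same.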

DesmondChoy commented 4 years ago

> Thank you very much for this report! 🙌 I want to acknowledge it and let you know we are aware and looking for a replacement data source to use in the book.
>
> Just to record it here, ideally we would want to find something that:
>
> - allows us to demonstrate how to tidy() a document-term matrix
> - is an appropriate use case for the Loughran and McDonald sentiment lexicon
>
> This may be too high an ask, though, and we need to break these apart and integrate these two bits of information separately. @dgrtwo

How about companies' earnings call transcripts? I stumbled upon a site that seems to provide these for free: https://news.alphastreet.com/ (Note: I'm not affiliated with them in any way)

smmathews commented 3 weeks ago

I've created a PR that explains the issue the reader is about to encounter. While this PR is still open and unresolved, it would probably be a good idea to acknowledge the issue in the text.