bryanwhiting closed this issue 3 years ago.
Hi Bryan, thanks for the issue. That's an unusual RSS feed: it just has a series of links instead of the content and structure you'd normally associate with RSS (e.g. here). I'd recommend scraping it directly. Even this little snippet of code will get you the titles and URLs pretty easily, and you can then parse them further:
library(rvest)
#> Loading required package: xml2
library(stringr)
rss <- "http://www.datatau.com/rss"
read_html(rss) %>%
  html_text() %>%
  str_split("]]")
#> [[1]]
#> [1] "DataTauhttp://www.datatau.com/Hacker News for Data Science5 tips for aspiring and junior data engineershttps://medium.com/analytics-and-data/5-tips-for-aspiring-and-junior-data-engineers-8b47ef154367http://www.datatau.com/item?id=30486Comments"
#> [2] ">What To Do When You Can't AB Testhttps://towardsdatascience.com/what-to-do-when-you-cant-ab-test-4e1dff692bf7http://www.datatau.com/item?id=30467Comments"
#> [3] ">PyTorch vs. TensorFlow – a detailed comparisonhttps://www.tooploox.com/blog/pytorch-vs-tensorflow-a-detailed-comparisonhttp://www.datatau.com/item?id=30464Comments"
#> [4] ">The Personal Python Data Science Toolkithttps://www.alexfranz.com/posts/personal-python-data-science-toolkit-part-1/http://www.datatau.com/item?id=30459Comments"
#> [5] ">Why NYC is a Great Place to Break into AIhttps://blog.insightdatascience.com/why-nyc-is-a-great-place-to-break-into-ai-4acc97133391http://www.datatau.com/item?id=29230Comments"
#> [6] ">Better Preference Predictions: Tunable and Explainable Recommender Systemshttps://blog.insightdatascience.com/tunable-and-explainable-recommender-systems-cd52b6287badhttp://www.datatau.com/item?id=29318Comments"
#> [7] ">A Simple Guide to Semantic Segmentationhttps://medium.com/beyondminds/a-simple-guide-to-semantic-segmentation-effcf83e7e54?source=friends_link&sk=3d1a5a32a19d611fbd81028cfd4f23fdhttp://www.datatau.com/item?id=29312Comments"
#> [8] ">All AI and Data Science News in one Placehttps://allainews.com/http://www.datatau.com/item?id=29480Comments"
#> [9] ">Intro to forecasting with FB's Prophet (python)https://www.interviewqs.com/ddi_code_snippets/prophet_intro_http://www.datatau.com/item?id=29272Comments"
#> [10] ">Complete Machine Learning Using Azure Machine Learning https://www.udemy.com/machine-learning-using-azureml/?couponCode=DATA090http://www.datatau.com/item?id=29896Comments"
#> [11] ">AutoML for Data Augmentationhttps://blog.insightdatascience.com/automl-for-data-augmentation-e87cf692c366http://www.datatau.com/item?id=29644Comments"
#> [12] ">How to Do A/B Testing: A Checklist You’ll Want to Bookmarkhttps://medium.com/@webdavidpage/how-to-run-a-b-testing-a-checklist-youll-want-to-bookmark-99c75aa9860bhttp://www.datatau.com/item?id=29543Comments"
#> [13] ">16 Text Preprocessing Techniques in Python for Twitter Sentiment Analysishttps://github.com/Deffro/text-preprocessing-techniqueshttp://www.datatau.com/item?id=29410Comments"
#> [14] ">Using Transfer Learning for NLP with Small Datahttps://blog.insightdatascience.com/using-transfer-learning-for-nlp-with-small-data-71e10baf99a6http://www.datatau.com/item?id=30313Comments"
#> [15] ">Overview of the different approaches to putting ML models in productionhttps://medium.com/analytics-and-data/overview-of-the-different-approaches-to-putting-machinelearning-ml-models-in-production-c699b34abf86http://www.datatau.com/item?id=30198Comments"
#> [16] ">Train models and run notebooks on AWS cheaper and simpler than with SageMakerhttps://medium.com/apls/how-to-train-deep-learning-models-on-aws-spot-instances-using-spotty-8d9e0543d365http://www.datatau.com/item?id=30012Comments"
#> [17] ">Using Reinforcement Learning to Design a Better Rocket Enginehttps://blog.insightdatascience.com/using-reinforcement-learning-to-design-a-better-rocket-engine-4dfd1770497ahttp://www.datatau.com/item?id=29857Comments"
#> [18] ">Amex Data Science Interview Questionshttps://medium.com/acing-ai/amex-data-science-interview-questions-a8d2634c647http://www.datatau.com/item?id=30211Comments"
#> [19] ">Square Data Science Interview Questionshttps://medium.com/acing-ai/square-data-science-interview-questions-daa67cfe96c9http://www.datatau.com/item?id=30100Comments"
#> [20] ">The Job Board for Data Scientists and Machine Learners Onlyhttps://ai-jobs.net/#s=1http://www.datatau.com/item?id=29971Comments"
#> [21] ">Is analytics a luxury only to Giants in the Financial Services?https://medium.com/@apurva_39772/zepto-ai-powered-data-analytics-tool-for-financial-services-a01dabe610c7http://www.datatau.com/item?id=29758Comments"
#> [22] ">A visual exploration of Gaussian processeshttps://distill.pub/2019/visual-exploration-gaussian-processeshttp://www.datatau.com/item?id=29746Comments"
#> [23] ">5 domains of ecommerce Data Strategyhttps://medium.com/analytics-and-data/5-domains-of-ecommerce-data-strategy-82b61356042chttp://www.datatau.com/item?id=29603Comments"
#> [24] ">Become a Pro at Pandas, Python’s data manipulation Libraryhttps://medium.com/analytics-and-data/become-a-pro-at-pandas-pythons-data-manipulation-library-264351b586b1?source=friends_link&sk=cfcd8713cbdae2e48277acf8084c5e13http://www.datatau.com/item?id=30427Comments"
#> [25] ">Three essential skills you'll need as a data scientisthttps://peterscobas.com/2019/04/29/three-essential-skills-youll-need-as-a-data-scientist/http://www.datatau.com/item?id=30426Comments"
#> [26] ">ERUPT: Expected Response Under Proposed Treatmentshttps://medium.com/building-ibotta/erupt-expected-response-under-proposed-treatments-ff7dd45c84b4http://www.datatau.com/item?id=30397Comments"
#> [27] ">Python - Hadoop interaction tutorial (PySpark, PyArrow, impyla, etc.)https://thegurus.tech/posts/2019/05/hadoop-python/http://www.datatau.com/item?id=30355Comments"
#> [28] ">Automate your Flask Deployments on AWShttps://blog.insightdatascience.com/automate-your-flask-deployments-on-aws-db4d8e2345ahttp://www.datatau.com/item?id=30337Comments"
#> [29] ">Citibank Data Science Interview Questionshttps://medium.com/acing-ai/citibank-data-science-interview-questions-1ac5c71ff29http://www.datatau.com/item?id=30310Comments"
#> [30] ">How I became a data scientisthttps://www.peterscobas.com/2019/04/26/how-i-became-a-data-scientist/http://www.datatau.com/item?id=30297Comments"
#> [31] ">"
Created on 2020-11-14 by the reprex package (v0.3.0)
Hope that works for you, as this feed is too far removed from 'normal' RSS for me to fit it into the package.
Rob
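A possible alternative, assuming the feed is well-formed RSS 2.0 XML (not verified against the live feed, whose structure may have changed): `read_html()` runs the document through an HTML parser, which likely explains the run-together text. HTML treats `<link>` as an empty element, so the URL ends up as loose text, and `<![CDATA[...]]>` wrappers are not unwrapped the way an XML parser would unwrap them. Parsing with `xml2::read_xml()` and selecting `<item>` nodes via XPath might avoid the string splitting entirely. In this minimal sketch, `sample_rss` and `feed_items` are hypothetical names, and the feed is a hand-written two-item document reusing titles and links from the output above, not the real DataTau feed.

```r
library(xml2)

# Hypothetical, hand-written RSS 2.0 document for illustration only;
# titles and links are copied from the reprex output above.
sample_rss <- '<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>DataTau</title>
    <link>http://www.datatau.com/</link>
    <item>
      <title><![CDATA[The Personal Python Data Science Toolkit]]></title>
      <link>https://www.alexfranz.com/posts/personal-python-data-science-toolkit-part-1/</link>
    </item>
    <item>
      <title><![CDATA[Why NYC is a Great Place to Break into AI]]></title>
      <link>https://blog.insightdatascience.com/why-nyc-is-a-great-place-to-break-into-ai-4acc97133391</link>
    </item>
  </channel>
</rss>'

# Parse as XML rather than HTML, so CDATA sections are unwrapped
# and <link> keeps its text content.
doc <- read_xml(sample_rss)

# One node per <item>; the channel-level <title>/<link> are skipped.
items <- xml_find_all(doc, "//item")

# xml_find_first() returns one result per item, so the columns line up.
feed_items <- data.frame(
  item_title = xml_text(xml_find_first(items, "./title")),
  item_link  = xml_text(xml_find_first(items, "./link")),
  stringsAsFactors = FALSE
)
print(feed_items)
```

Against a live feed you would pass the URL to `read_xml()` instead of the string. If the feed does not use `<item>`, `<title>` and `<link>`, the XPath expressions would need adjusting.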
Thanks for the reply! I appreciate the pointer, and I was able to finish the rest:
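library(rvest)
library(stringr)
library(dplyr)  # select(), rename(), as_tibble()
library(tidyr)  # drop_na()

datatau <- read_html(rss) %>%
  html_text() %>%
  str_split("]]") %>%
  .[[1]] %>%
  # capture groups: title, article link, DataTau comments link
  str_match(., ">(.*)(https://.*)(http[s]?://.*)") %>%
  as.data.frame() %>%
  select(V2, V3) %>%
  rename(item_title = V2, item_link = V3) %>%
  drop_na() %>%  # chunks with no match (e.g. the trailing ">") are all NA
  as_tibble()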
which returns:
# A tibble: 29 x 2
item_title item_link
<chr> <chr>
1 "5 tips for aspiring and junior data engineer… https://medium.com/analytics-and-data/5-tips-for-aspiring-an…
2 "What To Do When You Can't AB Test" https://towardsdatascience.com/what-to-do-when-you-cant-ab-t…
3 "PyTorch vs. TensorFlow – a detailed comparis… https://www.tooploox.com/blog/pytorch-vs-tensorflow-a-detail…
4 "The Personal Python Data Science Toolkit" https://www.alexfranz.com/posts/personal-python-data-science…
5 "Why NYC is a Great Place to Break into AI" https://blog.insightdatascience.com/why-nyc-is-a-great-place…
6 "Better Preference Predictions: Tunable and E… https://blog.insightdatascience.com/tunable-and-explainable-…
7 "A Simple Guide to Semantic Segmentation" https://medium.com/beyondminds/a-simple-guide-to-semantic-se…
8 "All AI and Data Science News in one Place" https://allainews.com/
9 "Intro to forecasting with FB's Prophet (pyth… https://www.interviewqs.com/ddi_code_snippets/prophet_intro_
10 "Complete Machine Learning Using Azure Machin… https://www.udemy.com/machine-learning-using-azureml/?coupon…
# … with 19 more rows
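To see what each part of the `str_match()` pattern captures, here is a small offline check on two chunks copied from the `str_split()` output earlier in the thread (`chunks`, `m` and `items` are illustrative names). Group 1 is the title, group 2 the article link, and group 3 the DataTau comments link with `Comments` still attached, which is why the pipeline keeps only `V2` and `V3`. A chunk with no match, like the trailing `">"`, becomes an all-`NA` row, which is what `drop_na()` removes.

```r
library(stringr)

# Two elements copied from the str_split() output above: a normal item
# and the trailing ">" left after the last "]]"
chunks <- c(
  ">What To Do When You Can't AB Testhttps://towardsdatascience.com/what-to-do-when-you-cant-ab-test-4e1dff692bf7http://www.datatau.com/item?id=30467Comments",
  ">"
)

# Column 1 is the full match; columns 2-4 are the three capture groups
m <- str_match(chunks, ">(.*)(https://.*)(http[s]?://.*)")

m[1, 2]  # title
m[1, 3]  # article link
m[1, 4]  # DataTau comments link, with "Comments" glued on

# The ">" chunk has no match, so its row is all NA; drop it as drop_na() would
items <- data.frame(item_title = m[, 2], item_link = m[, 3],
                    stringsAsFactors = FALSE)
items <- items[complete.cases(items), ]
print(items)
```

Because `https://` is hard-coded in group 2, an item whose article link starts with plain `http://` would also produce an all-`NA` row and be silently dropped.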
Great :-)
Thanks for the awesome package, Robert!! I'm loving it.
I tried this feed:
http://www.datatau.com/rss
Got this error:
Here's my session info: