RobertMyles / tidyRSS

An R package for extracting 'tidy' data frames from RSS, Atom and JSON feeds
https://robertmyles.github.io/tidyRSS/

Tidyfeed() RSS Import Error #60

Closed mjl21 closed 2 years ago

mjl21 commented 2 years ago

Hi - I'm trying to extract data from an RSS page using tidyfeed(), but I'm running into what appears to be an HTTP status error. The examples in the R documentation worked as expected.

Below are the simple commands I'm trying to run and the error message I keep receiving:

[image: screenshot of the commands and the error message]

I saw a video posted on YouTube prior to the 2.0.5 release that worked, so I'm hoping you might be able to help figure out why this keeps failing.

Thanks for the help!

RobertMyles commented 2 years ago

Hi @mjl21 ,

This works for me:

tidyRSS::tidyfeed("https://sec.report/Form/13F-HR.rss")
#> GET request successful. Parsing...
#> # A tibble: 300 × 16
#>    feed_title      feed_link     feed_description feed_language feed_managing_e…
#>    <chr>           <chr>         <chr>            <chr>         <chr>           
#>  1 SEC Form 13F-HR https://sec.… SEC Filings und… en-us         admin@sec.repor…
#>  2 SEC Form 13F-HR https://sec.… SEC Filings und… en-us         admin@sec.repor…
#>  3 SEC Form 13F-HR https://sec.… SEC Filings und… en-us         admin@sec.repor…
#>  4 SEC Form 13F-HR https://sec.… SEC Filings und… en-us         admin@sec.repor…
#>  5 SEC Form 13F-HR https://sec.… SEC Filings und… en-us         admin@sec.repor…
#>  6 SEC Form 13F-HR https://sec.… SEC Filings und… en-us         admin@sec.repor…
#>  7 SEC Form 13F-HR https://sec.… SEC Filings und… en-us         admin@sec.repor…
#>  8 SEC Form 13F-HR https://sec.… SEC Filings und… en-us         admin@sec.repor…
#>  9 SEC Form 13F-HR https://sec.… SEC Filings und… en-us         admin@sec.repor…
#> 10 SEC Form 13F-HR https://sec.… SEC Filings und… en-us         admin@sec.repor…
#> # … with 290 more rows, and 11 more variables: feed_web_master <chr>,
#> #   feed_pub_date <dttm>, feed_last_build_date <dttm>, feed_generator <chr>,
#> #   feed_ttl <chr>, item_title <chr>, item_link <chr>, item_description <chr>,
#> #   item_pub_date <dttm>, item_guid <chr>, item_category <list>

Created on 2022-06-02 by the reprex package (v2.0.1)

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.0 (2021-05-18)
#>  os       macOS High Sierra 10.13.6
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_IE.UTF-8
#>  ctype    en_IE.UTF-8
#>  tz       Europe/Dublin
#>  date     2022-06-02
#>  pandoc   2.10.1 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.3.0   2022-04-25 [1] CRAN (R 4.1.2)
#>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.1.2)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.1.2)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.1.2)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.1.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.1.2)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.0)
#>  knitr         1.39    2022-04-26 [1] CRAN (R 4.1.2)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.1.2)
#>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0  2021-09-26 [1] CRAN (R 4.1.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.1.2)
#>  rmarkdown     2.14    2022-04-25 [1] CRAN (R 4.1.2)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.0)
#>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.7.0   2022-03-13 [1] CRAN (R 4.1.2)
#>  tibble        3.1.7   2022-05-03 [1] CRAN (R 4.1.2)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.1.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.1.2)
#>  xfun          0.31    2022-05-10 [1] CRAN (R 4.1.2)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.1.2)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────


mjl21 commented 2 years ago

Thanks, @RobertMyles. I was using R 4.2.0 before, so I rolled back to R 4.1.3 and got the same error message:

[image: screenshot of the error message]

Are you using tidyRSS v2.0.5 or 2.0.4?

RobertMyles commented 2 years ago

Hi @mjl21, this is a network error (i.e. tidyRSS can't reach the RSS feed from your computer). It's not related to the R version or the package version: as you can see above, I'm on R 4.1, and I used tidyRSS v2.0.5 for that run. Would you mind using the reprex package (https://www.tidyverse.org/help/) to try this out so we can debug it together? From the info you've given me so far, it looks like a lack of internet access.

mjl21 commented 2 years ago

Hi @RobertMyles, I appreciate you working through this issue with me! Below is the reprex() output for my short script - let me know if I copied / pasted this incorrectly as I haven't worked with the reprex package before.

install.packages("tidyRSS"); install.packages("data.table")
#> Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
#> (as 'lib' is unspecified)
#> Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.1'
#> (as 'lib' is unspecified)
library("data.table"); library("tidyRSS")
sec_report <- "https://sec.report/Form/13F-HR.rss"
ownership_data <- tidyfeed(sec_report)
#> Error in safe_get(feed, config = ua): Attempt to get feed was unsuccessful (non-200 response). Feed may not be available.
ownership_data <- as.data.frame(ownership_data)
#> Error in as.data.frame(ownership_data): object 'ownership_data' not found

Establishing the "sec_report" variable works, but things break down in the next line when I try to run ownership_data <- tidyfeed(sec_report) as you mentioned.

RobertMyles commented 2 years ago

So that code works perfectly for me. Honestly, this looks like an internet issue and nothing to do with tidyRSS. Can you try this? Both feeds should return a 200 status code.

library(tidyRSS)
#> Warning: package 'tidyRSS' was built under R version 4.1.2
sec_report <- "https://sec.report/Form/13F-HR.rss"
httr::GET(sec_report)$status
#> [1] 200
httr::GET("https://xkcd.com/rss.xml")$status
#> [1] 200

Created on 2022-06-07 by the reprex package (v2.0.1)

You should be able to open https://xkcd.com/rss.xml and https://sec.report/Form/13F-HR.rss in your browser too, so you could check that as well.

mjl21 commented 2 years ago

Thanks, @RobertMyles. I can access both RSS feeds via the links you sent. However, I'm getting a 503 status for the SEC.Report site but not for the xkcd site. See below:

library(tidyRSS)
sec_report <- "https://sec.report/Form/13F-HR.rss"
httr::GET(sec_report)$status
#> [1] 503
httr::GET("https://xkcd.com/rss.xml")$status
#> [1] 200

I'm researching why I'm experiencing this issue and not you or other people, but would welcome any advice on how to get past this.

Thanks for all your help!

RobertMyles commented 2 years ago

OK, interesting! A 503 (Service Unavailable) shows up when the request doesn't get through to the feed, so something may be restricting your access, e.g. being on a work laptop or something like that.

Anyway, happy to see it's not package related, best of luck finding the cause.
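
For anyone landing here with the same symptom (200 in the browser, non-200 from R), the status check used above can be wrapped in a small helper. This is only a rough sketch: `describe_status()` and `check_feed()` are hypothetical names, not part of tidyRSS or httr, and the user agent string is an arbitrary placeholder. Whether a different user agent or network changes the response is an assumption to test, not something confirmed in this thread.

```r
# Sketch only: describe_status() and check_feed() are hypothetical helpers,
# not functions from tidyRSS or httr.

# Map an HTTP status code to a rough, non-authoritative diagnosis.
describe_status <- function(status) {
  if (status >= 200 && status < 300) {
    "ok: the feed was served"
  } else if (status %in% c(401, 403, 429)) {
    "blocked or rate-limited: the server refused this client"
  } else if (status >= 500) {
    "server-side: the request reached a server but was not served"
  } else {
    "other: check the URL and any redirects"
  }
}

# Request a feed with an explicit user agent and report the status.
# The default agent string is a placeholder chosen for illustration.
check_feed <- function(url, agent = "Mozilla/5.0 (feed check)") {
  resp <- httr::GET(url, httr::user_agent(agent), httr::timeout(10))
  status <- httr::status_code(resp)
  message(url, " -> ", status, " (", describe_status(status), ")")
  invisible(status)
}

# Example calls (need network access):
# check_feed("https://sec.report/Form/13F-HR.rss")
# check_feed("https://xkcd.com/rss.xml")
```

If the same URL returns 200 from one machine or network and 503 from another, that points at the network or the site's filtering rather than at the package, which is consistent with what the thread found.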