RobertMyles / tidyRSS

An R package for extracting 'tidy' data frames from RSS, Atom and JSON feeds
https://robertmyles.github.io/tidyRSS/

RSS feed error for Vox.com #38

Closed: polymathematic closed this issue 4 years ago

polymathematic commented 4 years ago

Querying "https://www.vox.com/rss/index.xml" returns the following error:

Error in UseMethod("xml_find_first") : 
  no applicable method for 'xml_find_first' applied to an object of class "xml_missing"

Wonderful package, though. Works brilliantly otherwise.

RobertMyles commented 4 years ago

Thank you for the issue, @polymathematic 😊. I could never find all these edge cases otherwise.

Interesting. I have a check on the content type that comes back from the GET() call; for an Atom feed it should contain "atom". This one contains only "xml", so tidyRSS parses it as an RSS feed and fails. I'm adding an xmlns check to tidyRSS, which will help with cases like these. For now, the fix is in a branch for version 2.0.1, which you can install with:

remotes::install_github("robertmyles/tidyRSS@v2.0.1")

And it should work fine:

tidyRSS::tidyfeed("https://www.vox.com/rss/index.xml")
#> GET request successful. Parsing...
#> # A tibble: 10 x 13
#>    feed_title feed_url feed_last_updated   feed_author feed_link feed_icon
#>    <chr>      <chr>    <dttm>              <chr>       <chr>     <chr>    
#>  1 Vox -  All https:/… 2020-02-29 20:55:55 Andrew Pro… https://… https://…
#>  2 Vox -  All https:/… 2020-02-29 20:55:55 Andrew Pro… https://… https://…
#>  3 Vox -  All https:/… 2020-02-29 20:55:55 Andrew Pro… https://… https://…
#>  4 Vox -  All https:/… 2020-02-29 20:55:55 Andrew Pro… https://… https://…
#>  5 Vox -  All https:/… 2020-02-29 20:55:55 Andrew Pro… https://… https://…
#>  6 Vox -  All https:/… 2020-02-29 20:55:55 Andrew Pro… https://… https://…
#>  7 Vox -  All https:/… 2020-02-29 20:55:55 Andrew Pro… https://… https://…
#>  8 Vox -  All https:/… 2020-02-29 20:55:55 Andrew Pro… https://… https://…
#>  9 Vox -  All https:/… 2020-02-29 20:55:55 Andrew Pro… https://… https://…
#> 10 Vox -  All https:/… 2020-02-29 20:55:55 Andrew Pro… https://… https://…
#> # … with 7 more variables: entry_title <chr>, entry_url <chr>,
#> #   entry_last_updated <chr>, entry_author <chr>, entry_content <chr>,
#> #   entry_link <chr>, entry_published <dttm>

Created on 2020-03-01 by the reprex package (v0.3.0)
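
For anyone curious what an xmlns check might look like, here is a minimal sketch using xml2. It is an illustration under assumptions, not the code in the tidyRSS branch: `guess_feed_type()` and `atom_ns` are hypothetical names, and the two inline documents stand in for real feed responses so the example runs without a network call.

```r
library(xml2)

# Namespace URI used by Atom feeds.
atom_ns <- "http://www.w3.org/2005/Atom"

# Hypothetical helper for illustration only; not a tidyRSS function.
# It looks at the parsed document rather than the HTTP content type.
guess_feed_type <- function(doc) {
  root_name <- xml_name(xml_root(doc))
  namespaces <- as.character(xml_ns(doc))
  if (root_name == "feed" && atom_ns %in% namespaces) {
    "atom"
  } else if (root_name == "rss") {
    "rss"
  } else {
    "unknown"
  }
}

# Tiny stand-in documents (assumed shapes, not the real Vox feed).
atom_doc <- read_xml(
  '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>'
)
rss_doc <- read_xml(
  '<rss version="2.0"><channel><title>Example</title></channel></rss>'
)

guess_feed_type(atom_doc)
guess_feed_type(rss_doc)
```

The sketch checks the root element name as well as the namespace because some RSS feeds also declare the Atom namespace, so the namespace alone would not be a reliable signal.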