RobertMyles / tidyRSS

An R package for extracting 'tidy' data frames from RSS, Atom and JSON feeds
https://robertmyles.github.io/tidyRSS/

Retrieving duration or other custom-named tags #65

Closed: stees closed this issue 1 year ago

stees commented 1 year ago

I'm using tidyRSS to parse podcast RSS and would like to get the duration of each episode together with the title, ID, etc.

Is there a way to do that?

RobertMyles commented 1 year ago

Hi stees, there should be, if that field exists. Do you know if podcast RSS feeds regularly include this info? Do you have an example feed or a schema?

stees commented 1 year ago

This one has that information in two forms: as <itunes:duration>, and as the length attribute of the enclosure tag, like this: <enclosure url="https://nerdcast.jovemnerd.com.br/vai_te_catar_26.mp3" length="2363" type="audio/mpeg"/>.

However, most feeds that I checked had only <itunes:duration> like this one and this one, which already suffices for me.
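
For anyone who wants to look at these tags directly, here is a minimal sketch using the xml2 package on a small inline feed. The feed text, the example.com URLs, and the `episodes` data frame are made up for illustration, and this is not how tidyRSS itself parses feeds. Note that the RSS 2.0 specification defines the enclosure `length` attribute as the file size in bytes, so <itunes:duration> is probably the more dependable source for episode duration.

```r
library(xml2)

# Hypothetical, trimmed-down podcast feed used only for illustration.
feed_xml <- '<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example podcast</title>
    <item>
      <title>Episode 1</title>
      <itunes:duration>00:49:25</itunes:duration>
      <enclosure url="https://example.com/ep1.mp3" length="2363" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2</title>
      <itunes:duration>01:49:59</itunes:duration>
      <enclosure url="https://example.com/ep2.mp3" length="4120" type="audio/mpeg"/>
    </item>
  </channel>
</rss>'

doc <- read_xml(feed_xml)

# The itunes: prefix is namespaced, so pass the document's namespaces to XPath.
ns <- xml_ns(doc)
items <- xml_find_all(doc, "//item")

# One row per <item>; xml_find_first() keeps the rows aligned with the items.
episodes <- data.frame(
  item_title       = xml_text(xml_find_first(items, "./title")),
  item_duration    = xml_text(xml_find_first(items, "./itunes:duration", ns)),
  enclosure_length = xml_attr(xml_find_first(items, "./enclosure"), "length"),
  stringsAsFactors = FALSE
)

print(episodes)
```

To try this against a real feed, pass the feed URL to `read_xml()` instead of the inline string.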

RobertMyles commented 1 year ago

Hi @stees, I've put this in the dev version of the package:

# remotes::install_github("RobertMyles/tidyRSS")
dplyr::glimpse(tidyRSS::tidyfeed("https://jovemnerd.com.br/feed-nerdcast/"))
#> GET request successful. Parsing...
#> Rows: 1,390
#> Columns: 14
#> $ feed_title           <chr> "NerdCast", "NerdCast", "NerdCast", "NerdCast", "…
#> $ feed_link            <chr> "https://jovemnerd.com.br", "https://jovemnerd.co…
#> $ feed_description     <chr> "O mundo vira piada no Jovem Nerd", "O mundo vira…
#> $ feed_language        <chr> "pt-br", "pt-br", "pt-br", "pt-br", "pt-br", "pt-…
#> $ feed_managing_editor <chr> "feedback@jovemnerd.com.br (Jovem Nerd)", "feedba…
#> $ feed_pub_date        <dttm> 2022-12-26 05:00:08, 2022-12-26 05:00:08, 2022-1…
#> $ feed_last_build_date <dttm> 2022-12-27 21:43:25, 2022-12-27 21:43:25, 2022-1…
#> $ item_title           <chr> "Lá do Bunker 76 - Destaques e desdestaques", "Ca…
#> $ item_link            <chr> "https://jovemnerd.com.br/nerdcast/la-do-bunker/d…
#> $ item_description     <chr> "Então é Natal, e o que você fez? Nós do Lá do Bu…
#> $ item_pub_date        <dttm> 2022-12-26 05:00:08, 2022-12-24 11:26:37, 2022-1…
#> $ item_guid            <chr> "https://jovemnerd.com.br/nerdcast/la-do-bunker/d…
#> $ item_category        <list> [], [], [], [], [], [], [], [], [], [], [], [], …
#> $ item_duration        <chr> "00:49:25", "01:49:59", "01:54:34", "00:00:10", "…

Created on 2022-12-28 with reprex v2.0.2
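
The item_duration column comes back as character ("HH:MM:SS"), so it needs converting before you can sort or sum episode lengths. Here is a rough base R sketch of one way to do that. `duration_to_seconds()` is a hypothetical helper, not part of tidyRSS, and it assumes every value is colon-separated numbers ("HH:MM:SS", "MM:SS", or plain seconds).

```r
# Hypothetical helper (not part of tidyRSS): convert "HH:MM:SS", "MM:SS",
# or plain-seconds strings to numeric seconds.
duration_to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(parts) {
    parts <- as.numeric(parts)
    # Weight the fields right to left: seconds * 1, minutes * 60, hours * 3600.
    sum(parts * 60^rev(seq_along(parts) - 1))
  }, numeric(1))
}

# The first four item_duration values from the output above.
durations <- c("00:49:25", "01:49:59", "01:54:34", "00:00:10")
seconds <- duration_to_seconds(durations)
print(seconds)
#> [1] 2965 6599 6874   10
```

On the real data frame this would be something like `feed$item_duration_sec <- duration_to_seconds(feed$item_duration)`, where `feed` is whatever you named the result of `tidyfeed()`.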

Let me know if that works ok for you.

stees commented 1 year ago

Works great, many thanks!!