RobertMyles / tidyRSS

An R package for extracting 'tidy' data frames from RSS, Atom and JSON feeds
https://robertmyles.github.io/tidyRSS/

tibble error with tidyfeed() function after updating R version to 4.3.2 #77

Closed pragativprasad closed 7 months ago

pragativprasad commented 8 months ago

Hello,

The following code used to work under my former R version:

tidyfeed(
    feed = "http://rss.cnn.com/rss/cnn_health.rss",
    config = list(),
    clean_tags = TRUE,
    list = FALSE,
    parse_dates = TRUE
  )

But after updating to 4.3.2, I'm getting the following error:

Error in `tibble()`:
! Tibble columns must have compatible sizes.
• Size 16: Existing data.
• Size 29: Column `item_link`.
ℹ Only values of size one are recycled.
Backtrace:
 1. tidyRSS::tidyfeed(...)
 2. tidyRSS:::rss_parse(response, list, clean_tags, parse_dates)
 3. tibble::tibble(...)

RobertMyles commented 7 months ago

Hi Pragati, I'm seeing that too. I'll try to work on this asap.

RobertMyles commented 7 months ago

Pragati, that looks like a messy feed to me (please correct me if I'm wrong).

Running this just now, when we get to building a tibble out of the items, there are 17 titles:

# A tibble: 17 × 1
   item_title                                                                                              
   <chr>                                                                                                   
 1 "RSV hospitalization rate for seniors is 10 times higher than usual for this point in the season"       
 2 "Covid-19 boosters could keep thousands of kids out of hospitals, but uptake remains low"               
 3 "Experimental therapy gantenerumab fails to slow or improve Alzheimer's memory loss in clinical trials" 
 4 "US gets D+ grade for rising preterm birth rates, new report finds "                                    
 5 "Desperate for heart surgery for their baby, a family feels the effects of pediatric hospital shortages"
 6 "Fuzzy first photo of a black hole gets a sharp makeover"                                               
 7 "Paper airplane breaks a world distance record "                                                        
 8 "This bat fossil could fill in a piece of the evolutionary puzzle"                                      
 9 "How long you can use vintage Tupperware"                                                               
10 "Why we have nightmares and how to stop them"                                                           
11 "Your guide to finding the right Theragun massage gun for you "                                         
12 "8 activewear brands you should add to your workout wardrobe"                                           
13 "The best coupons at CVS Pharmacy"                                                                      
14 "'Groundhog Day' movie: The Buddhist lifehacker film"                                                   
15 "Alzheimer's Disease Fast Facts"                                                                        
16 "Simp: The slang teenagers use to insult boys"                                                          
17 "Bugs, rodent hair and poop: How much is legally allowed in the food you eat every day?" 

However, every other column (`item_link`, for example) has length 29, so the columns can't be combined into one tibble. When I look at the feed itself (http://rss.cnn.com/rss/cnn_health.rss), it looks like a mess of text. It should look something like this: https://www.rssboard.org/files/sample-rss-2.xml

I'm going to close this, as malformed feeds should be handled with lower-level tools such as xml2.
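
For anyone landing here with a similar feed, here is a minimal sketch of the xml2 approach. It is not what tidyRSS does internally. It assumes the irregularity is simply that some `<item>` nodes lack a field, and the inline feed below is a made-up stand-in, not the CNN feed. Querying each item separately with `xml_find_first()` returns one result per item, with `NA` where a field is missing, so the columns stay the same length.

```r
library(xml2)

# Hypothetical miniature feed: the second <item> has no <title>,
# mimicking an irregular feed whose fields have unequal counts.
feed_xml <- '<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example</title>
    <item><title>First</title><link>https://example.com/1</link></item>
    <item><link>https://example.com/2</link></item>
    <item><title>Third</title><link>https://example.com/3</link></item>
  </channel>
</rss>'

doc <- read_xml(feed_xml)
items <- xml_find_all(doc, "//item")

# Document-wide queries return only the nodes that exist,
# so the two counts differ (the source of the size mismatch).
n_titles <- length(xml_find_all(doc, "//item/title"))
n_links <- length(xml_find_all(doc, "//item/link"))

# Per-item queries return exactly one result per item;
# a missing child becomes NA once converted with xml_text().
feed_df <- data.frame(
  item_title = xml_text(xml_find_first(items, "./title")),
  item_link = xml_text(xml_find_first(items, "./link")),
  stringsAsFactors = FALSE
)

print(feed_df)
```

Comparing `n_titles` and `n_links` is also a quick way to confirm which fields are missing before deciding how to clean the feed.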