Closed tombroekel closed 7 months ago
Hi Tom, thank you but this does seem too specific. I'm open to improving type_check()
but checking the content-type seems pretty standard. I'll close for now but if you have a general solution, I'm definitely open to exploring that.
Hi,
currently, the type_check function bases its assessment entirely on response$headers$`content-type`. However, for some feeds this is misleading, e.g. https://www.tagesschau.de/infoservices/alle-meldungen-100~atom.xml . This feed is wrongly classified as RSS. A workaround is to also consider the information contained in the URL (see below), but it is a rather specific solution. Therefore, I didn't add it directly; maybe you have a better idea.

    content_type <- response$headers$`content-type`
    url_type <- response$url
    typ <- case_when(
      grepl(x = url_type, pattern = "atom") ~ "atom",
      grepl(x = content_type, pattern = "xml") ~ "rss",
      grepl(x = content_type, pattern = "html") ~ "rss",
      grepl(x = content_type, pattern = "atom") ~ "atom",
      grepl(x = content_type, pattern = "rss") ~ "rss",
      grepl(x = content_type, pattern = "json") ~ "json",
      TRUE ~ "unknown"
    )
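
One possibly more general direction, offered only as a sketch: look at the root element of the response body instead of the URL, since Atom documents have a `feed` root while RSS 2.0 uses `rss` and RSS 1.0 uses `rdf:RDF`. The helper name `sniff_feed_type` is hypothetical and not part of the package. The sketch assumes the xml2 package is available and that the body has already been read as text (for an httr response, e.g. via `httr::content(response, as = "text")`). The content-type header is kept as a fallback.

```r
library(xml2)

# Hypothetical helper (not part of the package): guess the feed type from the
# body itself and use the content-type header only as a fallback.
sniff_feed_type <- function(body, content_type = "") {
  # JSON Feed documents are JSON objects, so the body starts with "{"
  if (grepl("^[[:space:]]*\\{", body)) {
    return("json")
  }

  # Name of the XML root element, or NULL if the body cannot be parsed as XML
  root <- tryCatch(xml_name(read_xml(body)), error = function(e) NULL)
  if (!is.null(root)) {
    if (root == "feed") return("atom")             # Atom root element
    if (root %in% c("rss", "RDF")) return("rss")   # RSS 2.0 / RSS 1.0 roots
  }

  # Fallback: header-based rules, in the same spirit as the current check
  if (grepl("atom", content_type)) return("atom")
  if (grepl("rss|xml|html", content_type)) return("rss")
  if (grepl("json", content_type)) return("json")
  "unknown"
}

# A made-up Atom body served with a generic XML content type
atom_body <- paste0(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>'
)
sniff_feed_type(atom_body, "application/xml")
```

With this approach the URL is not consulted at all, so a feed that is served as generic XML would still be recognised as Atom as long as its root element is `feed`. Whether that holds for the tagesschau.de feed above would need to be checked against the real response.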