datawookie / feedeR

Handle RSS and Atom feeds from R

Tag mismatches from Glassdoor feed #12

Closed joelmlevin closed 6 years ago

joelmlevin commented 6 years ago

Using code:

`devtools::install_github("DataWookie/feedeR")
library(feedeR)

philip_morris <- feed.extract("https://www.glassdoor.com/rss/reviews.rss?id=7745")`

I get output:

`> philip_morris <- feed.extract("https://www.glassdoor.com/rss/reviews.rss?id=7745")

Opening and ending tag mismatch: img line 11 and div
Opening and ending tag mismatch: img line 21 and p
Opening and ending tag mismatch: p line 17 and div
Opening and ending tag mismatch: img line 34 and p
Opening and ending tag mismatch: p line 30 and div
Opening and ending tag mismatch: img line 48 and p
Opening and ending tag mismatch: p line 43 and div
Opening and ending tag mismatch: img line 61 and p
Opening and ending tag mismatch: p line 57 and div
Specification mandates value for attribute async
attributes construct error
Couldn't find end of Start Tag script line 71
Opening and ending tag mismatch: form line 70 and script
Opening and ending tag mismatch: p line 69 and form
Opening and ending tag mismatch: div line 68 and p
Opening and ending tag mismatch: div line 28 and body
Opening and ending tag mismatch: div line 15 and html
Premature end of data in tag div line 14
Premature end of data in tag div line 9
Premature end of data in tag body line 8
Premature end of data in tag html line 2
Error: 1: Opening and ending tag mismatch: img line 11 and div
2: Opening and ending tag mismatch: img line 21 and p
3: Opening and ending tag mismatch: p line 17 and div
4: Opening and ending tag mismatch: img line 34 and p
5: Opening and ending tag mismatch: p line 30 and div
6: Opening and ending tag mismatch: img line 48 and p
7: Opening and ending tag mismatch: p line 43 and div
8: Opening and ending tag mismatch: img line 61 and p
9: Opening and ending tag mismatch: p line 57 and div
10: Specification mandates value for attribute async
11: attributes construct error
12: Couldn't find end of Start Tag script line 71
13: Opening and ending tag mismatch: form line 70 and script
14: Opening and ending tag mismatch: p line 69 and form
15: Opening and ending tag mismatch: div line 68 and p
16: Opening and ending tag mismatch: div line 28 and body
17: Opening and ending tag mismatch: div line 15 and html
18: Premature end of data in tag div line 14
19: Premature end of data in tag div`

The feed loads correctly in NetNewsWire. I'm not that savvy, but I'm happy to provide any other information to help debug. Thanks!

datawookie commented 6 years ago

That site does not return XML for the default User-Agent string being used by RCurl. Added a specific User-Agent and it works for me.
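
For anyone hitting the same errors with another feed, here is a rough sketch of the idea: request the URL with an explicit User-Agent and parse the response as XML. This is not the change made inside feedeR. `fetch_feed_xml` and the User-Agent string are hypothetical, chosen only for illustration, and the sketch assumes the RCurl and xml2 packages are installed.

```r
# Hypothetical helper (not part of feedeR): download a feed with an explicit
# User-Agent and parse the body as XML.
fetch_feed_xml <- function(url,
                           user_agent = "Mozilla/5.0 (compatible; feed-reader-example)") {
  # `useragent` and `followlocation` are passed through to libcurl as options.
  body <- RCurl::getURL(url, useragent = user_agent, followlocation = TRUE)

  # read_xml() accepts literal XML text and raises an error if the response
  # is not well-formed XML (for example, when the server sends an HTML page).
  xml2::read_xml(body)
}
```

A call such as `fetch_feed_xml("https://www.glassdoor.com/rss/reviews.rss?id=7745")` would then return a parsed document if the server responds with XML for that User-Agent. If the tag-mismatch errors persist, the server is most likely still returning HTML, and a different User-Agent string may be needed.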