Closed: chuanqisun closed this issue 3 years ago
That feed is not valid: https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Falistapart.com%2Fmain%2Ffeed%2F This is a sad but common problem when parsing feeds. Feedparser doesn't have an opinion about how you should handle invalid feeds; everyone needs to figure that out for themselves, given the goals of the project they're working on.
> I wonder if there is an easy way to just get the plaintext within the Author field `by Preston So`
For this specific workaround, the `#` property contains the plain-text parts of the original feed item. So, you would need to recursively parse the `rss:author` property to pull out the `#` properties, then join them together with a space.
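A minimal sketch of that workaround follows. It assumes the parsed `rss:author` value is a nested object where `@` holds attributes, `#` holds text, and child elements appear under their tag names. The helper names (`collectText`, `authorPlainText`) and the sample item are hypothetical; check the real shape by logging `item['rss:author']` for your feed.

```javascript
// Recursively collect every '#' (text) value found under a parsed node.
// Assumption: '@' holds attributes, '#' holds text, and any other key
// is a child element (an object, or an array of objects).
function collectText(node) {
  if (node === null || typeof node !== 'object') return [];
  if (Array.isArray(node)) return node.flatMap(collectText);

  const parts = [];
  for (const [key, value] of Object.entries(node)) {
    if (key === '@') continue; // skip attributes such as href
    if (key === '#') {
      if (typeof value === 'string' && value.trim() !== '') {
        parts.push(value.trim());
      }
    } else {
      parts.push(...collectText(value));
    }
  }
  return parts;
}

// Join the collected text parts with a space, as suggested above.
function authorPlainText(item) {
  return collectText(item['rss:author']).join(' ');
}

// Hypothetical item, shaped like the author field described in this issue.
const sampleItem = {
  author: null,
  'rss:author': {
    '@': {},
    '#': 'by',
    a: { '@': { href: 'https://example.com/author' }, '#': 'Preston So' },
  },
};

console.log(authorPlainText(sampleItem));
```

The parts are joined in object key order, which may not match the order of the text in the original markup when text and child elements are interleaved.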
Before submitting your issue, please make sure these boxes are checked. Thank you!
[ ] Review the compressed example. I tried but the URL is broken.
FeedParser@2.2.10
Node@14.16.1
Problem feed: https://alistapart.com/main/feed/
Problem feed meta:
In the feed item, the author field contains HTML:
The parser strips the entire `<a>` tag from the `author` property in the output.

The `rss:author` property has some additional information, but I think it's difficult to write generalized extraction logic, as the structure can differ from feed to feed.

I wonder if there is an easy way to just get the plaintext within the Author field: `by Preston So`.

Thanks!