dhvcc / rss-parser

typed python RSS parsing module built using xmltodict and pydantic
https://dhvcc.github.io/rss-parser/
GNU General Public License v3.0
40 stars, 4 forks

Parsing error with CDATA in category tag. #50

cornelinux closed this issue 2 months ago

cornelinux commented 2 months ago

Implementing the rss-parser was a breeze and quite simple. Thank you. I am using version 2.0.0 from PyPI with Python 3.10.12.

However, when I parse WordPress blogs, I get an error:

channel -> content -> item -> 1 -> content -> category -> content
  str type expected (type=type_error.str)

It looks like the parser does not like CDATA in the category tag. When parsing a feed where the category tag is a plain string, there is no problem.

dhvcc commented 2 months ago

Can you provide a sample of such a feed, so that when I answer I can be sure the fix works for you? Thanks for using the library, hope you like it.

ranma42 commented 2 months ago

I am hitting the same problem when parsing https://www.ilfattoquotidiano.it/feed/ (older snapshots with the same kind of content are available on the web archive: https://web.archive.org/web/20240720105317/https://www.ilfattoquotidiano.it/feed/ ).

According to https://validator.w3.org/feed/ it is a valid RSS feed, but parsing it fails with the same error that @cornelinux reported.

dhvcc commented 2 months ago

Thanks, I'll take a look and give you a code sample.

dhvcc commented 2 months ago

Hey, in fact, the spec says that an item can have multiple category elements. I've updated the code in version 2.0.1. I'll go ahead and double-check every other element for that sort of mistake. Thanks for pointing this out!
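
To illustrate the point about multiple categories, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree`, not rss-parser itself. The sample item and the helper name `category_texts` are made up for this example. It shows that a CDATA section is read as ordinary text, and that an item with repeated `category` elements yields a list rather than a single string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample item, modelled on the kind of feed described above:
# two <category> elements, each wrapped in CDATA.
SAMPLE_ITEM = """\
<item>
  <title>Example post</title>
  <category><![CDATA[News]]></category>
  <category><![CDATA[Politics]]></category>
</item>
"""


def category_texts(item_xml: str) -> list[str]:
    """Return the text of every <category> child of an RSS <item>."""
    item = ET.fromstring(item_xml)
    # findall() returns all matching children, so repeated elements
    # naturally come back as a list; CDATA arrives as plain text.
    return [(category.text or "").strip() for category in item.findall("category")]


print(category_texts(SAMPLE_ITEM))
```

Any model for `category` therefore has to accept a list of values as well as a single one, which matches the change described for 2.0.1.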

dhvcc commented 2 months ago

I'll rename the version to 2.1.0, since it does change the logic of the library slightly. It should be deployed in 5 minutes.