CDATA are skipped

Arzaroth / python_rapidxml

python bindings for RapidXml, a C++ XML parsing library

MIT License

CDATA are skipped #1

Closed: Amedeo91 closed this issue 7 years ago

Amedeo91 commented 7 years ago

Hi all,

CDATA sections are always skipped when parsing XML.

This is because document_object.cpp contains this line of code:

(self->base.base.document->parse<rapidxml::parse_no_utf8 | rapidxml::parse_no_data_nodes>)(self->base.base.document->allocate_string(text));

Shouldn't it be possible to call parse or parse

Regards, Amedeo

Arzaroth commented 7 years ago

Could you provide an example with expected results?

Amedeo91 commented 7 years ago

I want to be able to read the content of the CDATA section below:

<![CDATA[{"Cart":{"expirationTime":"2017-04-22T09:40","id":"b469df3b-f626-4fe3-898c-825373e546a2","products":["1223"],"creationTime":"2017-04-21T09:40","totalPrice":{"currencyCode":"EUR","amount":"138.000"}}}]]>
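To make the expected result concrete, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree instead of python_rapidxml, whose API is not shown in this thread. The `<response>` wrapper element and the `read_cdata_json` helper are hypothetical; the report shows only the CDATA section itself. The point is only that a parser which keeps CDATA exposes its content as the text of the enclosing element.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical wrapper element around the CDATA section from the report.
XML_TEXT = (
    '<response><![CDATA[{"Cart":{"expirationTime":"2017-04-22T09:40",'
    '"id":"b469df3b-f626-4fe3-898c-825373e546a2","products":["1223"],'
    '"creationTime":"2017-04-21T09:40",'
    '"totalPrice":{"currencyCode":"EUR","amount":"138.000"}}}]]></response>'
)


def read_cdata_json(xml_text):
    """Parse the XML, then decode the JSON carried inside the CDATA section."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes CDATA content as ordinary element text.
    return json.loads(root.text)["Cart"]


if __name__ == "__main__":
    cart = read_cdata_json(XML_TEXT)
    print(cart["products"])              # ['1223']
    print(cart["totalPrice"]["amount"])  # 138.000
```

With the parse flags quoted earlier in the thread, this text would instead be missing, which is the behaviour being reported.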

Arzaroth commented 7 years ago

Unless you have very specific needs (i.e. parsing almost-valid XML with braces in attributes), I would use BeautifulSoup4. While it would be nice to fix this with a **kw argument on the parse method, I do not intend to do it anytime soon. If your PR #2 passes CI (provided you add tests for the new functionality), I'll be happy to merge it.

Amedeo91 commented 7 years ago

@Arzaroth: done

Arzaroth commented 7 years ago

Resolved with PR #4