falcong / pugixml

Automatically exported from code.google.com/p/pugixml

DOCTYPE entity expansion #103

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I really need support for DOCTYPE entity expansion. We use XML for our 
application configuration files. We have found it very useful to put common 
library / resource configuration into shared configuration files that we then 
entity include into our main application configuration files. This eliminates a 
lot of clutter from the primary application configuration file. It also has the 
great benefit that you can modify common resource configuration in just one 
place and not have to update numerous application configuration files. Entity 
variable substitution has also proven useful for striped application 
configuration files that are essentially identical except for a few variable 
attributes that are easily managed via some entities at the top of the 
configuration file. Simply put, it makes our application configuration files 
much easier to read and maintain.

We currently use libxml2 and this is the only feature that's preventing me from 
migrating to pugixml. I'd be more than happy to help test this feature.

Thanks, Ryan

Original issue reported on code.google.com by Ryan.Lee...@gmail.com on 20 Apr 2011 at 2:23

GoogleCodeExporter commented 9 years ago
Note: I think this request would also satisfy the requirements of Issue 85.

Original comment by Ryan.Lee...@gmail.com on 20 Apr 2011 at 2:25

GoogleCodeExporter commented 9 years ago
Unfortunately, this is impossible. The parser can only do entity replacements 
if the replacement text is not longer than the replaced one. All character 
references in all encodings satisfy this requirement, but entities don't.

The only way is to handle it the way MSXML/System.Xml do: recognize entities 
as separate nodes. This changes the document structure, and access is no 
longer transparent, i.e. instead of using node.value() or node.child_value(), 
you'll have to concatenate the values of the child nodes yourself. This is what 
things like node.InnerText do in System.Xml.

Also this would require complete DOCTYPE parsing, which is slightly tedious 
(although doable).

Original comment by arseny.k...@gmail.com on 20 Apr 2011 at 3:20
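
To illustrate the length constraint described in the comment above, here is a minimal sketch of in-place decoding, assuming a mutable NUL-terminated buffer. The function name `decode_in_place` and its entity table are hypothetical and are not taken from pugixml's source. Every predefined entity is longer than the character it stands for, so the write cursor never overtakes the read cursor. A user-defined entity whose replacement text is longer than its `&name;` reference would break that invariant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: decodes the five predefined XML entities in place
// and returns the new string length. Unknown references are copied verbatim.
std::size_t decode_in_place(char* s) {
    struct Entity { const char* name; char value; };
    static const Entity table[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&quot;", '"'}, {"&apos;", '\''}
    };

    char* read = s;   // next character to examine
    char* write = s;  // next position to store decoded output
    while (*read) {
        // Invariant that makes in-place decoding safe: output never
        // runs ahead of input, because replacements are never longer.
        assert(write <= read);

        bool matched = false;
        if (*read == '&') {
            for (const Entity& e : table) {
                std::size_t n = std::strlen(e.name);
                if (std::strncmp(read, e.name, n) == 0) {
                    *write++ = e.value;  // 1 byte replaces n >= 4 bytes
                    read += n;
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            *write++ = *read++;
        }
    }
    *write = '\0';
    return static_cast<std::size_t>(write - s);
}
```

With a declared entity such as `&h;` expanding to a long host name, the output would need more room than the input provides, which is why this approach cannot cover DOCTYPE entities.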

GoogleCodeExporter commented 9 years ago
I wonder if it's possible to handle DOCTYPE expansion as a pre-processing step? 
In other words, only do it if requested via some additional parse option. When 
specified, do some pre-processing step to expand everything and then pass the 
inflated memory buffer to the current parsing logic? I realize it's probably a 
somewhat expensive operation, but you wouldn't have to pay a performance 
penalty unless you want the feature. I'm not an XML expert, so maybe it's not 
possible to handle DOCTYPE entities this way, but it seems like it might work.

Original comment by Ryan.Lee...@gmail.com on 20 Apr 2011 at 9:01
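
The pre-processing idea in the comment above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not a compliant DTD processor: it only handles internal `<!ENTITY name "value">` declarations with double-quoted values, does not expand entities nested inside other entity values, and ignores parameter and external entities. The function names `collect_entities` and `expand_entities` are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Collect <!ENTITY name "value"> declarations found anywhere in the text.
// Assumption: values are double-quoted and contain no nested quotes.
std::map<std::string, std::string> collect_entities(const std::string& xml) {
    std::map<std::string, std::string> entities;
    const std::string tag = "<!ENTITY";
    const std::string ws = " \t\r\n";
    std::size_t pos = 0;
    while ((pos = xml.find(tag, pos)) != std::string::npos) {
        pos += tag.size();
        std::size_t name_begin = xml.find_first_not_of(ws, pos);
        if (name_begin == std::string::npos) break;
        std::size_t name_end = xml.find_first_of(ws, name_begin);
        if (name_end == std::string::npos) break;
        std::size_t value_begin = xml.find('"', name_end);
        if (value_begin == std::string::npos) break;
        std::size_t value_end = xml.find('"', value_begin + 1);
        if (value_end == std::string::npos) break;
        entities[xml.substr(name_begin, name_end - name_begin)] =
            xml.substr(value_begin + 1, value_end - value_begin - 1);
        pos = value_end + 1;
    }
    return entities;
}

// Build a new, possibly longer buffer in which every declared &name;
// reference is replaced. Undeclared references (e.g. &lt;) are left
// untouched so that the XML parser can decode them later.
std::string expand_entities(const std::string& xml) {
    const std::map<std::string, std::string> entities = collect_entities(xml);
    std::string out;
    std::size_t i = 0;
    while (i < xml.size()) {
        if (xml[i] == '&') {
            std::size_t semi = xml.find(';', i + 1);
            if (semi != std::string::npos) {
                auto it = entities.find(xml.substr(i + 1, semi - i - 1));
                if (it != entities.end()) {
                    out += it->second;
                    i = semi + 1;
                    continue;
                }
            }
        }
        out += xml[i++];
    }
    return out;
}
```

The expanded string still contains the DOCTYPE declaration and could then be handed to one of pugixml's load functions, such as `xml_document::load_string` or `xml_document::load_buffer`. Because the expansion writes into a fresh buffer, the in-place length restriction no longer applies, and the cost is only paid when the pre-processing step is requested.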

GoogleCodeExporter commented 9 years ago
Incidentally, I was able to create a separate preprocessor that expands the DTD 
and then passes the results to the load methods of pugixml. The preprocessing 
step also expands environment variables and include processing instructions as 
described in your examples. I also extended the include processing 
instruction with optional support for XPath queries so that you can include 
document subsets, and added a similar inheritance processing 
instruction, which provides a powerful configuration inheritance mechanism. At 
any rate, I just wanted to say that I was able to get all the features I wanted 
by creating my own XML preprocessor and then using pugixml to handle the rest. 
Thanks for the great library.

Ryan

Original comment by catherin...@gmail.com on 11 Sep 2011 at 10:26

GoogleCodeExporter commented 9 years ago
Yes, the possibility of an external preprocessor did not occur to me at the 
time; I'm glad you were able to solve the problem using this method.

I don't have plans for proper DTD support though (a compliant implementation is 
too complex, considering the rather low expected frequency of use), so I'm 
closing the issue.

Original comment by arseny.k...@gmail.com on 9 Dec 2011 at 6:10