Open MartinP7r opened 4 years ago
Some information about entity tags: https://www.logicbig.com/tutorials/misc/xml/xml-entity.html#:~:text=Internal%20Entities%3A%20An%20internal%20entity,defined%20in%20an%20separate%20file.
Possibly relevant Apple documentation:
https://developer.apple.com/documentation/foundation/nsxmlparserdelegate/1412907-parser
parser:foundUnparsedEntityDeclarationWithName:publicID:systemID:notationName:
https://developer.apple.com/documentation/foundation/nsxmlparserdelegate/1414803-parser
parser:foundInternalEntityDeclarationWithName:value:
This seems like it would be necessary to decode the entity shortcuts.
Relevant in case of an external entity declaration (the entity declaration resides in another file): https://developer.apple.com/documentation/foundation/nsxmlparserdelegate/1416221-parser
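For illustration, a delegate that records internal entity declarations via the second callback listed above could look roughly like this. It is a minimal sketch only: `EntityCollector` and `collectEntities(from:)` are hypothetical names, and whether XMLParser actually invokes this callback for a given document or platform is not verified here.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical delegate that only records internal entity declarations.
final class EntityCollector: NSObject, XMLParserDelegate {
    /// Entity name -> replacement text, e.g. ["jd": "John Doe"].
    private(set) var entities: [String: String] = [:]

    // Corresponds to parser:foundInternalEntityDeclarationWithName:value:
    func parser(_ parser: XMLParser,
                foundInternalEntityDeclarationWithName name: String,
                value: String?) {
        entities[name] = value ?? ""
    }
}

/// Hypothetical helper: parses `data` and returns whatever declarations
/// the parser reported to the collector (possibly none).
func collectEntities(from data: Data) -> [String: String] {
    let collector = EntityCollector()
    let parser = XMLParser(data: data)
    parser.delegate = collector
    _ = parser.parse()
    return collector.entities
}
```

Collecting the declarations is only half of the job; the references in element content still have to be replaced somewhere.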
I forked your project and wrote a test case that fails as expected:
final class EntityTests: XCTestCase {
    let xml = """
        <!DOCTYPE note [
        <!ENTITY jd "John Doe">
        ]>
        <note>
        <author>&jd;</author>
        </note>
        """

    struct Note: Decodable {
        let author: String
    }

    func testEntityIsExpanded() throws {
        let decoded = try XMLDecoder().decode(Note.self,
                                              from: xml.data(using: .utf8)!)
        XCTAssertEqual(decoded.author, "John Doe")
    }
}
XCTAssertEqual failed: ("") is not equal to ("John Doe")
Note that the failure reads ("") is not equal to ("John Doe"), not ("&jd;") is not equal to ("John Doe"): the entity reference is dropped entirely rather than passed through unexpanded.
I've only just started looking into your implementation, but if you're interested I'd try to add support for decoding (internal) entity definitions and open a pull request, provided it's actually implementable with XMLParserDelegate.
...
edit: actually doesn't look too good: https://stackoverflow.com/questions/44680734/parsing-xml-with-entities-in-swift-with-xmlparser
Another (10-year-old) Stack Overflow comment and a 5-year-old radar (also https://www.mail-archive.com/cocoa-dev@lists.apple.com/msg67796.html) state that NSXMLParser doesn't pick up on entities other than the standard ones and will just remove them or throw an error if they are not defined.
This seems to be the case.
As for my specific case, I will probably try and see if replacing them one by one before running the parser is somewhat efficient.
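A rough sketch of that pre-processing idea, operating on the raw XML text before it reaches the parser. The function names are hypothetical, and it assumes only simple declarations of the form `<!ENTITY name "value">`: no parameter entities, no single-quoted values, no SYSTEM/PUBLIC declarations, and no entities nested inside other entity values.

```swift
import Foundation

/// Hypothetical helper: collects declarations of the form
/// <!ENTITY name "value"> found anywhere in the raw XML text.
func internalEntities(in xml: String) -> [String: String] {
    // Group 1 is the entity name, group 2 the double-quoted value.
    let pattern = #"<!ENTITY\s+([A-Za-z_][A-Za-z0-9._\-]*)\s+"([^"]*)"\s*>"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [:] }

    var entities: [String: String] = [:]
    let fullRange = NSRange(xml.startIndex..., in: xml)
    for match in regex.matches(in: xml, options: [], range: fullRange) {
        guard let nameRange = Range(match.range(at: 1), in: xml),
              let valueRange = Range(match.range(at: 2), in: xml) else { continue }
        entities[String(xml[nameRange])] = String(xml[valueRange])
    }
    return entities
}

/// Hypothetical helper: replaces every &name; reference with its declared
/// value in a single pass per entity. Undeclared references are left untouched.
func expandingInternalEntities(in xml: String) -> String {
    var result = xml
    for (name, value) in internalEntities(in: xml) {
        result = result.replacingOccurrences(of: "&\(name);", with: value)
    }
    return result
}
```

The expanded string can then be converted to Data and handed to the decoder as before. Each entity costs one pass over the whole document, so for a file with many declarations this may be slow.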
Another possible solution would be to save all ENTITY definitions that get picked up by parser(parser: XMLParser, parseErrorOccurred parseError: NSError) and then check for and replace them within parser(_ parser: XMLParser, foundCharacters string: String)
edit: I just tried a strategy to replace within parser(_ parser: XMLParser, foundCharacters string: String), and sadly the &...; term doesn't even make it there; it seems to get replaced beforehand.
The XML file I'm working with contains a lot of <!ENTITY...> style abbreviations inside the DOCTYPE tag that don't seem to get picked up. Is there any configuration I have to do in order to make it work?
The tags look like this:
and one example:
The entry->sense->pos tag should expand &unc; into unclassified because of <!ENTITY unc "unclassified">, but in the resulting struct it comes up empty.
The underlying parser is picking up on the entities, because it will throw an
Error Domain=NSXMLParserErrorDomain Code=111 "(null)"
if even one of the entities used in the XML files is missing from the definitions in the DOCTYPE header.