FasterXML / aalto-xml

Ultra-high performance non-blocking XML processor (Stax API + extensions)
Apache License 2.0

Can't use ENTITY_REFERENCE event for resolution in an Attribute #64

Open otcdlink-simpleuser opened 6 years ago

otcdlink-simpleuser commented 6 years ago

I need to resolve custom XML entities in my own event handler. Unfortunately, an unknown entity in an attribute value doesn't trigger the XMLEvent.ENTITY_REFERENCE event. If this feature isn't implemented, is there any workaround?

Here is a test case showing what I expect from Aalto. I'm using aalto-xml 1.1.0.

package com.otcdlink.chiron.wire;

import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;
import java.io.StringReader;

import static org.junit.jupiter.api.Assertions.fail;

public class StaxPlayground {

  @Test
  void entityReplacement() throws XMLStreamException {
    final XMLInputFactory xmlInputFactory =
        new com.fasterxml.aalto.stax.InputFactoryImpl() ;
//        javax.xml.stream.XMLInputFactory.newInstance() ;

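    // Keep DTD support enabled, but ask for entity references to be reported as events
    // instead of expanded, skip external entities, and don't merge adjacent text events.
    xmlInputFactory.setProperty( XMLInputFactory.SUPPORT_DTD, true ) ;
    xmlInputFactory.setProperty( XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false ) ;
    xmlInputFactory.setProperty( XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false ) ;
    xmlInputFactory.setProperty( XMLInputFactory.IS_COALESCING, false ) ;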

    final String xml =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
        "<whatever x='&replace-me;' />"
//        "<whatever x='no-entity' />"
    ;

    final XMLStreamReader xmlStreamReader =
        xmlInputFactory.createXMLStreamReader( new StringReader( xml ) ) ;

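    found: {
      // Walk through every event; stop as soon as the parser reports an entity reference.
      while( xmlStreamReader.hasNext() ) {
        final int staxEvent = xmlStreamReader.next() ;
        if( staxEvent == XMLEvent.ENTITY_REFERENCE ) {
          // For an ENTITY_REFERENCE event, getLocalName() returns the entity's name.
          LOGGER.info( "Got entity reference '" + xmlStreamReader.getLocalName() + "'." ) ;
          break found ;
        }
      }
      // Reached only if the loop ended without seeing ENTITY_REFERENCE.
      fail( "Found no entity reference." ) ;
    }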
  }

// =======
// Fixture
// =======

  private static final Logger LOGGER = LoggerFactory.getLogger( StaxPlayground.class ) ;
}

All I get is an exception:

com.fasterxml.aalto.WFCException: Unexpanded ENTITY_REFERENCE (replace-me) in attribute value
 at [row,col {unknown-source}]: [2,26]

    at com.fasterxml.aalto.in.XmlScanner.reportInputProblem(XmlScanner.java:1333)
    at com.fasterxml.aalto.in.XmlScanner.reportUnexpandedEntityInAttr(XmlScanner.java:1343)
    at com.fasterxml.aalto.in.ReaderScanner.collectValue(ReaderScanner.java:901)
    at com.fasterxml.aalto.in.ReaderScanner.handleStartElement(ReaderScanner.java:794)
    at com.fasterxml.aalto.in.ReaderScanner.nextFromProlog(ReaderScanner.java:236)
    at com.fasterxml.aalto.stax.StreamReaderImpl.next(StreamReaderImpl.java:790)
    at com.otcdlink.chiron.wire.StaxPlayground.entityReplacement(StaxPlayground.java:44)

otcdlink-simpleuser commented 6 years ago

I just figured out that ENTITY_REFERENCE does work inside an element's text: when the test case parses "<whatever>&replace-me;</whatever>", the ENTITY_REFERENCE event is reported.
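Since the event does fire in element text, a handler can resolve custom entities there itself. The following is a minimal sketch using only the standard StAX interfaces; the class `EntityAwareText`, the method `readText` and the `entities` table are hypothetical names for illustration, and the sketch assumes the element contains only text (no child elements). Whether `ENTITY_REFERENCE` is reported at all depends on the StAX implementation and on `IS_REPLACING_ENTITY_REFERENCES` being `false`; this thread only establishes that Aalto 1.1.0 reports it for element text.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EntityAwareText {

  /**
   * Reads the text of the element the reader is positioned on (START_ELEMENT),
   * substituting ENTITY_REFERENCE events from a caller-supplied table.
   * Assumption: the element has text content only, no child elements.
   */
  static String readText(XMLStreamReader reader, Map<String, String> entities)
      throws XMLStreamException {
    if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
      throw new IllegalStateException("Reader must be positioned on START_ELEMENT");
    }
    StringBuilder text = new StringBuilder();
    while (true) {
      int event = reader.next();
      switch (event) {
        case XMLStreamConstants.CHARACTERS:
        case XMLStreamConstants.CDATA:
        case XMLStreamConstants.SPACE:
          text.append(reader.getText());
          break;
        case XMLStreamConstants.ENTITY_REFERENCE: {
          // For ENTITY_REFERENCE, getLocalName() returns the entity name.
          String name = reader.getLocalName();
          String value = entities.get(name);
          if (value == null) {
            throw new XMLStreamException("Unknown entity: " + name, reader.getLocation());
          }
          text.append(value);
          break;
        }
        case XMLStreamConstants.END_ELEMENT:
          return text.toString();
        case XMLStreamConstants.START_ELEMENT:
          throw new XMLStreamException("Unexpected child element", reader.getLocation());
        default:
          // Comments and processing instructions are skipped.
          break;
      }
    }
  }

  public static void main(String[] args) throws XMLStreamException {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
    XMLStreamReader reader =
        factory.createXMLStreamReader(new StringReader("<whatever>plain text</whatever>"));
    reader.nextTag(); // move to <whatever>
    System.out.println(readText(reader, Map.of("replace-me", "replaced")));
    reader.close();
  }
}
```

Unknown names raise an `XMLStreamException` here rather than being kept silently; that is a choice made for the sketch, and appending the raw `&name;` text instead would work just as well.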

Is there any way to hook into entity resolution when parsing an attribute?

otcdlink-simpleuser commented 6 years ago

I'm looking at ReaderScanner's code around lines 897 and 1066, and the parser is clearly designed to fail on such an undefined entity. Sounds like bad news for me.

otcdlink-simpleuser commented 6 years ago

OK, I got it. Instead of relying on ENTITY_REFERENCE, which is probably not meant to work with attributes, I should ask the parser to leave entities unresolved and resolve them on my own.

The problem is, disabling entity resolution is not yet possible. I'm opening another issue for that.
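Until the parser can be told to leave attribute entities alone, one possible workaround is to resolve the custom references before the parser sees the document at all. The sketch below assumes the whole document is available as a `String` and that each custom entity maps to plain text; `EntityPreprocessor`, `substitute` and the entity table are hypothetical names, and the regex only approximates the XML `Name` production with ASCII characters. Replacement values are escaped so that characters such as `&` or `'` cannot break the surrounding attribute value.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EntityPreprocessor {

  // '&' + ASCII approximation of an XML name + ';'. Character references (&#...;)
  // never match because '#' is not allowed as the first name character here.
  private static final Pattern ENTITY_REF = Pattern.compile("&([A-Za-z_:][A-Za-z0-9._:-]*);");

  // Predefined XML entities are always left for the parser to handle.
  private static final Set<String> PREDEFINED = Set.of("amp", "lt", "gt", "quot", "apos");

  /** Escapes a value so it is safe both in attribute values and in element text. */
  static String escape(String value) {
    return value.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("\"", "&quot;")
        .replace("'", "&apos;");
  }

  /**
   * Replaces references to known custom entities with their escaped values.
   * Predefined entities and unknown names are copied unchanged, so the parser
   * still reports unknown ones exactly as before.
   */
  static String substitute(String xml, Map<String, String> entities) {
    Matcher m = ENTITY_REF.matcher(xml);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String name = m.group(1);
      String value = PREDEFINED.contains(name) ? null : entities.get(name);
      String replacement = (value == null) ? m.group() : escape(value);
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) throws XMLStreamException {
    String raw = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<whatever x='&replace-me;' />";
    String xml = substitute(raw, Map.of("replace-me", "R&D"));
    XMLStreamReader reader =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    reader.nextTag(); // move to <whatever>
    System.out.println(reader.getAttributeValue(null, "x")); // prints R&D
    reader.close();
  }
}
```

Because the substitution is purely textual, it would also rewrite references inside comments, CDATA sections and processing instructions; a document that relies on those would need a context-aware pre-pass instead.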