Eric-Shi / protobuf-java-format

Automatically exported from code.google.com/p/protobuf-java-format
BSD 3-Clause "New" or "Revised" License

Tokenizer bug in XmlFormat.merge #37

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
When a field like this one (generated using XmlFormat.printToString):

<email>juraj@michalak.com</email>

is parsed, the tokenizer detects these two tokens:
'juraj@michalak' and 'com'
The error output is:
Caused by: com.googlecode.protobuf.format.XmlFormat$ParseException: 1:71: Expected ">".
        at com.googlecode.protobuf.format.XmlFormat$Tokenizer.parseException(XmlFormat.java:668)
        at com.googlecode.protobuf.format.XmlFormat$Tokenizer.consume(XmlFormat.java:467)
        at com.googlecode.protobuf.format.XmlFormat.consumeClosingElement(XmlFormat.java:791)
        at com.googlecode.protobuf.format.XmlFormat.mergeField(XmlFormat.java:875)
        at com.googlecode.protobuf.format.XmlFormat.handleObject(XmlFormat.java:993)
        at com.googlecode.protobuf.format.XmlFormat.handleValue(XmlFormat.java:886)
        at com.googlecode.protobuf.format.XmlFormat.mergeField(XmlFormat.java:866)
        at com.googlecode.protobuf.format.XmlFormat.merge(XmlFormat.java:774)
        at com.googlecode.protobuf.format.XmlFormat.merge(XmlFormat.java:722)
        at com.protobufexample.ListPeople.main(ListPeople.java:64)

The correct result would be a single token:
'juraj@michalak.com'

protobuf-java-format 1.2

Original issue reported on code.google.com by Juraj.Mi...@gmail.com on 23 Nov 2011 at 10:39

GoogleCodeExporter commented 8 years ago
The problem seems to be in the region-matching code in nextToken() of 
XmlFormat.java. The 'TOKEN' regex treats the '.' in the string as a token 
boundary when it shouldn't. I'm trying to work out which parts of the regex 
act as explicit boundaries, but I don't see it yet.

Original comment by jkwin...@gmail.com on 7 May 2015 at 11:36
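
To illustrate the mechanism described in the comment above, here is a minimal, self-contained sketch of a region-based tokenizer. It does not use the real TOKEN pattern from XmlFormat.java: the NARROW and WIDE patterns, the tokenize helper and the TokenBoundaryDemo class are all hypothetical, written only to show how a character class that omits '.' ends a token at the dot, while a class that stops only at '<' or '>' keeps the whole address together. The real tokenizer reported two tokens; this sketch also emits the '.' itself, so it shows the cause rather than reproducing the exact output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenBoundaryDemo {
    // Hypothetical patterns for illustration only; NOT the TOKEN regex from XmlFormat.java.
    // NARROW has no '.' in its character class, so a token ends in front of a dot.
    static final Pattern NARROW = Pattern.compile("[A-Za-z0-9_@]+");
    // WIDE accepts everything up to the next tag delimiter.
    static final Pattern WIDE = Pattern.compile("[^<>]+");

    // Walks the text by moving the matcher region forward, one token at a time.
    static List<String> tokenize(Pattern token, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = token.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            matcher.region(pos, text.length());
            if (matcher.lookingAt()) {
                // The pattern matched at the start of the region: take the whole match.
                tokens.add(matcher.group());
                pos = matcher.end();
            } else {
                // No match here: emit the single character as its own token.
                tokens.add(String.valueOf(text.charAt(pos)));
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String value = "juraj@michalak.com";
        System.out.println(tokenize(NARROW, value)); // [juraj@michalak, ., com]
        System.out.println(tokenize(WIDE, value));   // [juraj@michalak.com]
    }
}
```

If the real TOKEN regex has the same shape, one possible fix would be to read element text up to the next '<' instead of tokenizing it; that is an assumption to check against the source, not a confirmed patch.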