bivas / protobuf-java-format

Provide serialization and de-serialization of different formats based on Google’s protobuf Message. Enables overriding the default (byte array) output to text based formats such as XML, JSON and HTML.
BSD 3-Clause "New" or "Revised" License

Exception on Special Characters "//" #38

Open hebergentilin opened 7 years ago

hebergentilin commented 7 years ago

I'm getting errors when passing special characters such as '//' (produced by a base64-encoded file) in a proto bytes field.
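For context on where the '//' comes from: '/' is one of the 64 characters in the standard base64 alphabet (it encodes the 6-bit value 63), so binary data with runs of set bits produces it naturally. The following is a minimal sketch using the JDK's `java.util.Base64`; the class name `Base64SlashSketch` and the sample bytes are illustrative assumptions and are not taken from the reporter's attachment.

```java
import java.util.Base64;

// Hypothetical demo class, not part of protobuf-java-format.
class Base64SlashSketch {

    // Encodes raw bytes with the standard (non URL-safe) base64 alphabet.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // Three bytes with every bit set form four 6-bit groups of value 63,
        // and 63 maps to '/' in the standard alphabet.
        byte[] allOnes = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(encode(allOnes)); // prints "////"
    }
}
```

Any formatter that accepts base64 text for a bytes field therefore has to tolerate '/' (and '+') inside element content.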

protos

message InformaContestacaoCliente {
    required Contestacao contestacao = 1;
}

message Contestacao {
    repeated Anexo anexos = 1;
}

message Anexo {
    optional bytes anexo = 2;
}

formatFactory.java

FormatFactory formatFactory = new FormatFactory();
ProtobufFormatter formatter = formatFactory.createFormatter(FormatFactory.Formatter.XML_JAVAX);
InputStream in = TextUtils.toInputStream(paramString);
formatter.merge(in, this.builder);

This throws a java.lang.RuntimeException with the message "Can't get here." at XmlJavaxFormat.java:566.

After changing the formatter from XML_JAVAX to XML, I get a different exception: com.googlecode.protobuf.format.ProtobufFormatter$ParseException: 4:22: Expected ">".

Request being sent:

<ANEXOS>
    <anexos>
        <tipoAnexo>3</tipoAnexo>
        <descricao>foto frontal</descricao>
        <anexo><![CDATA[//]]></anexo>
    </anexos>
</ANEXOS>

scr commented 7 years ago

Would you mind making a pull request with a test that fails because of this?

bouviervj commented 6 years ago

This is the same issue as #44: the tokenizer is too restrictive and doesn't tolerate special characters in values, so it fails to tokenize the content of the 'anexo' node.

whiver commented 6 years ago

You can try to fix the regex used to match the next token, which is the core of the problem. It can be found here: https://github.com/bivas/protobuf-java-format/blob/091d247393772e94d64c2d8835ef4cedcdfc244e/src/main/java/com/googlecode/protobuf/format/XmlFormat.java#L320

So far I have not managed to do it, because making the regex more flexible tends to produce side effects.

The best solution, IMO, would be to replace the hand-written XML parser with an existing one, which would be more reliable.
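To illustrate the "use an existing parser" idea, here is a minimal sketch, not a patch for XmlFormat, that reads the CDATA payload from the request above with the JDK's built-in StAX API (`javax.xml.stream`). The class name `AnexoStaxSketch` and the helper `readFirstAnexo` are hypothetical names made up for this example; how the extracted text would be wired back into a protobuf builder is left out.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of protobuf-java-format.
class AnexoStaxSketch {

    // Returns the text content of the first <anexo> element, or null if there is none.
    static String readFirstAnexo(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Report adjacent character data and CDATA sections as a single text event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "anexo".equals(reader.getLocalName())) {
                    // getElementText() collects text and CDATA up to the end tag.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<ANEXOS><anexos><anexo><![CDATA[//]]></anexo></anexos></ANEXOS>";
        System.out.println(readFirstAnexo(xml)); // prints "//"
    }
}
```

A standard parser treats the CDATA section as opaque character data, so the '//' needs no special handling; the remaining work would be mapping element names to message fields, which the existing formatter already does in its own way.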