FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0

Usage for parsing binary-mode Ion file #297

Closed: occia closed this issue 3 years ago

occia commented 3 years ago

Hi guys, is there an API for parsing a binary-mode Ion file? I tried the following code, but it only works with text-mode Ion input.

ObjectMapper mapper = new IonObjectMapper();
mapper.readTree("{a : 1}");

If I feed it binary input, it raises the following exception:

# content of input binary ion file:
# hexdump -C ../test-bi.ion
00000000  e0 01 00 ea e8 81 83 de  84 87 b2 81 61 de 83 8a  |............a...|
00000010  21 01                                             |!.|
# the exception I got
Exception in thread "main" com.amazon.ion.impl.IonReaderTextRawTokensX$IonReaderTextTokenException: bad character [65533, "\ufffd"] encountered where a token was supposed to start  at line 1 offset 2
        at com.amazon.ion.impl.IonReaderTextRawTokensX.bad_token_start(IonReaderTextRawTokensX.java:2706)
        at com.amazon.ion.impl.IonReaderTextRawTokensX.nextToken(IonReaderTextRawTokensX.java:716)
        at com.amazon.ion.impl.IonReaderTextRawX.parse_to_next_value(IonReaderTextRawX.java:797)
        at com.amazon.ion.impl.IonReaderTextRawX.has_next_raw_value(IonReaderTextRawX.java:464)
        at com.amazon.ion.impl.IonReaderTextUserX.has_next_user_value(IonReaderTextUserX.java:124)
        at com.amazon.ion.impl.IonReaderTextUserX.hasNext(IonReaderTextUserX.java:110)
        at com.amazon.ion.impl.IonReaderTextRawX.next(IonReaderTextRawX.java:486)
        at com.fasterxml.jackson.dataformat.ion.IonParser.nextToken(IonParser.java:506)
        at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4649)
        at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:3060)
        ...
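A binary Ion 1.0 stream begins with the four-byte version marker `e0 01 00 ea`, and the hexdump shows that this file starts with exactly those bytes; that prefix is what lets a reader tell binary Ion apart from text Ion. The following is a minimal sketch of that check on the bytes from the hexdump. The class and method names (`IonBinaryCheck`, `startsWithIonBinaryMarker`, `sampleBytes`) are hypothetical and not part of Jackson or ion-java; ion-java appears to offer a comparable utility, `IonStreamUtils.isIonBinary(byte[])`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Jackson or ion-java.
final class IonBinaryCheck {
    // Ion 1.0 binary version marker: the first four bytes of a binary Ion stream.
    static final byte[] ION_1_0_MARKER = {(byte) 0xE0, 0x01, 0x00, (byte) 0xEA};

    // True if the data starts with the Ion 1.0 binary version marker.
    static boolean startsWithIonBinaryMarker(byte[] data) {
        if (data.length < ION_1_0_MARKER.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(data, ION_1_0_MARKER.length);
        return Arrays.equals(prefix, ION_1_0_MARKER);
    }

    // The 18 bytes shown in the hexdump of test-bi.ion.
    static byte[] sampleBytes() {
        int[] hex = {0xe0, 0x01, 0x00, 0xea, 0xe8, 0x81, 0x83, 0xde, 0x84,
                     0x87, 0xb2, 0x81, 0x61, 0xde, 0x83, 0x8a, 0x21, 0x01};
        byte[] out = new byte[hex.length];
        for (int i = 0; i < hex.length; i++) {
            out[i] = (byte) hex[i];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(startsWithIonBinaryMarker(sampleBytes())); // true
        byte[] text = "{a : 1}".getBytes(StandardCharsets.UTF_8);
        System.out.println(startsWithIonBinaryMarker(text)); // false
    }
}
```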

Thanks~

mcliedtke commented 3 years ago

What version are you testing against and can you share the exact code sample you are using?

Converting the hex data you provided back to binary, storing it in a file, and reading it with the following snippet works without issues for me on 2.12.3:

IonObjectMapper mapper = IonObjectMapper.builder().build();
final JsonNode node = mapper.readTree(new File("sample.bin"));
System.out.println(node); // {"a":1}
occia commented 3 years ago

Hi, my version is 2.13.0-rc2. I tried your code and it works. The key difference is that I pass a String rather than a File to readTree. Is this the correct behaviour for the overloaded readTree methods? Here is my full code:

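import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.ion.IonObjectMapper;

public class Main {
    // Reads the whole file and decodes it into a String with the given charset.
    private static String readFile(String path, Charset encoding) {
        try {
            byte[] encoded = Files.readAllBytes(Paths.get(path));
            return new String(encoded, encoding);
        } catch (IOException e) {
            System.out.println("read file content error: " + e);
            System.exit(1);
            return "";
        }
    }

    public static void main(String[] args) {
        String content = readFile(args[0], StandardCharsets.UTF_8);

        ObjectMapper mapper = new IonObjectMapper();
        try {
            // Neither of the following works (each fails when called on its own)
            mapper.readTree(content.getBytes());
            mapper.readTree(content);
        } catch (IOException ignored1) {
            // Only IOException is caught; the unchecked Ion parse exception propagates
        }
    }
}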
mcliedtke commented 3 years ago

I guess my question is: why read the data into a String? It's binary data, so why not use the byte[] directly?
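Following that suggestion, the readFile helper in occia's code only needs to stop before the String conversion. This is a minimal sketch, assuming the file fits in memory; the class and method names (`BinaryFileRead`, `readFileBytes`, `roundTripsThroughFile`) are hypothetical. It checks that bytes written to a temp file and read back with `Files.readAllBytes` come back unchanged, because no charset decoding is involved. The resulting byte[] is what would then be handed to `ObjectMapper.readTree(byte[])`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical byte-preserving replacement for the String-returning readFile helper.
final class BinaryFileRead {
    // Reads the file as raw bytes; no charset decoding takes place.
    static byte[] readFileBytes(Path path) throws IOException {
        return Files.readAllBytes(path);
    }

    // Writes the data to a temp file, reads it back, and compares the bytes.
    static boolean roundTripsThroughFile(byte[] data) {
        try {
            Path tmp = Files.createTempFile("sample", ".ion");
            try {
                Files.write(tmp, data);
                return Arrays.equals(data, readFileBytes(tmp));
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Every possible byte value, including sequences that are invalid UTF-8.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        System.out.println(roundTripsThroughFile(all)); // true
        // With jackson-dataformat-ion on the classpath, the bytes would then go
        // straight to the mapper, e.g. mapper.readTree(readFileBytes(path)).
    }
}
```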

Decoding arbitrary binary data into a String with UTF-8 isn't lossless (byte sequences that aren't valid UTF-8 are replaced with U+FFFD), so I don't think this is a Jackson issue.

byte[] ionBinaryData = Files.readAllBytes(new File("output.bin").toPath());
String stringData = new String(ionBinaryData, StandardCharsets.UTF_8);
byte[] dataAgain = stringData.getBytes(StandardCharsets.UTF_8);
System.out.println(Arrays.equals(ionBinaryData, dataAgain)); //false
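The same check can be reproduced without a file by using the 18 bytes from the hexdump in the original report. This minimal sketch (class and method names are hypothetical) shows where the data is lost: in UTF-8 a lead byte `e0` must be followed by a byte in `a0..bf`, so `e0 01` is malformed and decodes to U+FFFD (65533, the character named in the original exception). Re-encoding turns U+FFFD into `ef bf bd`, so the Ion binary version marker at the start of the data is gone, which is consistent with the Ion reader treating the input as text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Reproduces the String round trip on the exact bytes from the issue's hexdump.
final class Utf8RoundTrip {
    static final byte[] SAMPLE = toBytes(0xe0, 0x01, 0x00, 0xea, 0xe8, 0x81, 0x83, 0xde, 0x84,
                                         0x87, 0xb2, 0x81, 0x61, 0xde, 0x83, 0x8a, 0x21, 0x01);

    // Narrows int literals to bytes so the hexdump can be copied as-is.
    static byte[] toBytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    // Decodes as UTF-8 (malformed sequences become U+FFFD), then re-encodes.
    static byte[] utf8RoundTrip(byte[] data) {
        String decoded = new String(data, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = new String(SAMPLE, StandardCharsets.UTF_8);
        // "e0 01" is not a valid UTF-8 sequence, so the first char is replaced.
        System.out.println((int) decoded.charAt(0)); // 65533
        byte[] again = utf8RoundTrip(SAMPLE);
        System.out.println(Arrays.equals(SAMPLE, again)); // false
        // The marker e0 01 00 ea is gone; the data now starts with ef bf bd (U+FFFD).
        System.out.printf("%02x %02x %02x%n", again[0], again[1], again[2]); // ef bf bd
    }
}
```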
occia commented 3 years ago

Got it, thank you for the help~