amazon-ion / ion-java

Java streaming parser/serializer for Ion.
https://amazon-ion.github.io/ion-docs/
Apache License 2.0
864 stars 110 forks source link

Binary Writer Does Not Support Non-Strings in LST Symbols Through Direct Write #44

Open almann opened 8 years ago

almann commented 8 years ago

There is a test that plumbs non-strings into the LST of a binary writer and expects them to go through. The binary writer actually ignores this for now and it isn't super clear what the API contract must be (code plumbing garbage through the LST is most certainly not doing something right).

We should make sure that the code is doing the right thing with respect to the tests or fix the tests.

Imported from ION-606

tgregg commented 7 years ago

According to the spec, for both LSTs and Shared STs, "Any element of the [symbols] list that is not a string must be interpreted as if it were null."

The test at https://github.com/amzn/ion-java/blob/b70dae2f2a3e72bf57b78f0dfdb921ab336cd08d/test/software/amazon/ion/SystemProcessingTestCase.java#L1157 verifies this, but doesn't run because it is ignored. We should determine why it is ignored and fix that.

raganhan commented 7 years ago

The test is passing in master. Since it's testing expected behavior I submitted https://github.com/amzn/ion-java/pull/110 to remove the ignore

ref: http://amzn.github.io/ion-docs/symbols.html

Any element of the list that is not a string must be interpreted as if it were null.

tgregg commented 7 years ago

I think there's still an open question as to how this should be handled on the write side. Are there symbol table-building APIs that should enforce the symbols list schema, or should we allow open content through all APIs?

raganhan commented 7 years ago

Fair point.

The spec doesn't restrict the malformed element behavior to read or write so for consistency non string symbols should be interpreted as null in both, i.e. we should allow open content through all APIs.

However, IMHO, the spec breaks the principle of least astonishment. Converting one of the malformed entries into nulls, specially in writes, is not what I'd expect to happen.

A way to get around it in strongly typed languages is to enforce string only on the API itself, but not ideal as you are not allowing behavior that is defined by the spec

tgregg commented 7 years ago

I don't think it's necessary for the writers to do any conversion. It can leave the content as-is, knowing that a correctly-implemented reader will interpret the non-conforming entries as nulls (in accordance with the spec).

That said, I don't think making the writer more restrictive, should we choose to, conflicts with the spec. After all, using a writer is not the only way to generate Ion data (e.g. it can be written by hand). The spec merely defines non-conforming entries to be valid Ion. But there are plenty of instances of valid Ion that our writers will never emit (e.g. overpadded binary representations, numbers with underscores, etc).