JuliaData / FlatBuffers.jl

A pure Julia implementation of google flatbuffers
https://juliadata.github.io/FlatBuffers.jl/stable
Other
43 stars 14 forks source link

v0.5.1 not handling unset fields correctly in table #37

Closed zhouyan closed 5 years ago

zhouyan commented 5 years ago

I encountered the problem when using Feather.jl after upgraded FlatBuffers.jl. Below is a minimal example that reproduce the issue.

Below is the fbs file,

table Foo
{
  foo:int64;
  bar:string;
}

Below is a simple python script that use generated python module to generate a binary file containing the flat buffer bytes (this can be done in Julia but here I am using python just to rule out possibility that the buffer is corrupted itself. In the place where I encountered the issue first it was generated by C++),

import Foo
import flatbuffers

builder = flatbuffers.Builder(0)
Foo.FooStart(builder)
Foo.FooAddFoo(builder, 1) 
# note that we only set the field foo, but not the string field bar
root = Foo.FooEnd(builder)
builder.Finish(root)
with open("test.fbb", "wb") as out:
    out.write(bytes(builder.Output()))

Below is the Julia test file,

using FlatBuffers

mutable struct Foo
    foo::Int64
    bar::String
end

foo = FlatBuffers.read(Foo, read("test.fbb"))
show(foo)

Blow is the commands

flatc --python -o . foo.fbs
python3 test.py
julia test.jl

When using v0.4.0, it gives the expected result,

Foo(1, "")

But after upgrading, it gives the error,

ERROR: LoadError: MethodError: Cannot `convert` an object of type Nothing to an object of type String
Closest candidates are:
  convert(::Type{T<:AbstractString}, !Matched::T<:AbstractString) where T<:AbstractString at strings/basic.jl:207
  convert(::Type{T<:AbstractString}, !Matched::AbstractString) where T<:AbstractString at strings/basic.jl:208
  convert(::Type{T}, !Matched::T) where T at essentials.jl:154
Stacktrace:
 [1] Foo(::Int64, ::Nothing) at /Users/zhou/Desktop/test.jl:4
 [2] read(::FlatBuffers.Table{Foo}, ::Type{Foo}) at /Users/zhou/.julia/packages/FlatBuffers/YAnlP/src/FlatBuffers.jl:320
 [3] read at /Users/zhou/.julia/packages/FlatBuffers/YAnlP/src/FlatBuffers.jl:300 [inlined]
 [4] read(::Type{Foo}, ::Array{UInt8,1}) at /Users/zhou/.julia/packages/FlatBuffers/YAnlP/src/FlatBuffers.jl:326
 [5] top-level scope at none:0
 [6] include at ./boot.jl:317 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1044
 [8] include(::Module, ::String) at ./sysimg.jl:29
 [9] exec_options(::Base.JLOptions) at ./client.jl:231
 [10] _start() at ./client.jl:425
in expression starting at /Users/zhou/Desktop/test.jl:8

A workaround would be to define a constructor for Foo that accept Nothing. However, for types with many fields, this fast become infeasible. Besides, this change breaks too much existing codebase.

Is this an intentional break? If yes, would you consider to change it back?

rjkat commented 5 years ago

Hi,

No this is not intentional. However, if you have an fbs file, why not generate the Julia from it as well instead of writing it by hand? The whole point of the 0.5 release is to support this.

You can find a fork of Flatbuffers with Julia in flatc here.

Unless there is a compelling reason why you can't use the Julia flatc backend in this case the behaviour probably won't be changed back.

zhouyan commented 5 years ago

First, thanks for the effort of developing flatc support. It is really great.

But there are a few reasons not to use the flatc at least for now,

  1. FlatBuffers are meant to be a cross-language format. We use it across C++, Python, Julia. As it is not merged into the upstream yet, it is not really suitable for use at the moment in production.
  2. We already has a sizable body of FlatBuffers code (~10K lines) and corresponding Julia types defined for using with FlatBuffers.jl, developed over the course of the past year. It will be better to be able to upgrade gradually instead of change everything all at once. It is really not an option in a production environment.
  3. It breaks too much code, not just my own code but packages like Feather.jl. Maybe they will upgrade eventually to be compatible, but there's no telling when.

If it is not too difficult, maintaining backward compatibility would be great. It allows people to migrate gradually instead of risking breaking critical code. For example, at the moment I have no choice but to pin it to v0.4.0, even in testing environment. However, it is much preferable to be able to upgrade, and migrate little by little. Otherwise it leaves people two choices:

  1. Pin to old version forever until it become too old to satisfy requirement by some other package. Basically force people into the classical dependency hell
  2. Upgrade all the code now and risk some break that cannot be easily fixed.

In the end, it is more of a chicken egg problem. If I can upgrade the package first, then upgrading code base using flatc gradually, it makes the upgrading much more desirable.

rjkat commented 5 years ago

Cool, thanks for the explanation, and I can certainly understand your frustration. There is a PR open on the upstream Flatbuffers project, which I'm hoping can be merged in the coming weeks, so hopefully you shouldn't have to wait too long on this.

The motivation for this change is that a lack of a string field is different from an empty string, and the new way is more in line with the existing Flatbuffers code. This helps with getting the PR through :)

I'm not aware of any current breakages in Feather.jl - can you elaborate on this? I'm happy to update Feather.jl for compatibility.

How difficult would it be to run the experiment of upgrading everything and seeing what breaks? This would be valuable since I don't have a codebase of that size to work with. Feel free to post any problems here and I'll do my best to help.

Otherwise, yes unfortunately until it's merged the best thing to do would be to pin the package and wait.

zhouyan commented 5 years ago

For feather the issue is with Timestamp type metadata, which has a time zone file which maybe empty and missing in the feather file written by other languages (tried Python and C++). Though feather written by a Julia itself can be read back properly.

Thanks a lot if it can be updated.

It will of course be great to have the PR for upstream merged soon. But it does not solve all problems yet. By switching to flatc it means a lot types need to be changed. Some may have a different name than it has been defined now. That is the most difficult part to update. A simple example, one of the flatbuffers type we have has a field named begin, and in Julia the corresponding field is named _begin. I guess the flatc generated code will have similar convention to avoid conflict with keywords. But it may or may not be the same as the convention we use now, this means a lot places where the generated type is used will need code to be updated to use new field name. Unlike say C++, one can make the change and just fix all compile errors, in a dynamic language like a Julia, such changes may introduce errors not detected until runtime. Of course code coverage and unit tests help, but it is difficult to provide 100% coverage.

It is small cases like this making upgrading risky and needing careful planing and thus why backward compatibility is invaluable

(Of course in the specific example above, it may seems silly to have a field named begin to begin with. But remember that flatbuffer is a cross language format, the schema has been used with C++ and Python long before we started adding Julia support, and changing schema just for one language is not an option unless I want everyone to start shouting at me (e.g., the guy responsible for the Python codebase). And it is difficult to avoid all keys from all languages)

In the more specific case of string, on one hand I agree a missing string is not the same as an empty string. On the hand, within the context of flatbuffer I think it does default to empty string, or at least this seems to be how it is handled in other languages. Another way to see it is that every field need to be initialized by a default value if not presented in the buffer, so they need a sensible default unless specified otherwise. For string, the sensible one I believe is empty string.

Sent from my iPad

On Dec 31, 2018, at 05:14, Rowan notifications@github.com wrote:

Cool, thanks for the explanation, and I can certainly understand your frustration. There is a PR open on the upstream Flatbuffers project, which I'm hoping can be merged in the coming weeks, so hopefully you shouldn't have to wait too long on this.

The motivation for this change is that a lack of a string field is different from an empty string, and the new way is more in line with the existing Flatbuffers code. This helps with getting the PR through :)

I'm not aware of any current breakages in Feather.jl - can you elaborate on this? I'm happy to update Feather.jl for compatibility.

How difficult would it be to run the experiment of upgrading everything and seeing what breaks? This would be valuable since I don't have a codebase of that size to work with. Feel free to post any problems here and I'll do my best to help.

Otherwise, yes unfortunately until it's merged the best thing to do would be to pin the package and wait.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

rjkat commented 5 years ago

Leave it with me. The use case of combining flatc generated code with handwritten Julia is certainly one for which we don't have any tests, hence the unintentional breakage. Seems like there is a similar lack of cross-language tests in Feather.jl since we are not failing tests there either.

It may be a trivial change to move back to the old behaviour, or there may be other complications. I'm away on holiday at the moment but I'll investigate in the new year and post an update here.

zhouyan commented 5 years ago

Great, thanks a lot

Sent from my iPad

On Dec 31, 2018, at 10:13, Rowan notifications@github.com wrote:

Leave it with me. The use case of combining flatc generated code with handwritten Julia is certainly one for which we don't have any tests, hence the unintentional breakage. Seems like there is a similar lack of cross-language tests in Feather.jl since we are not failing tests there either.

It may be a trivial change to move back to the old behaviour, or there may be other complications. I'm away on holiday at the moment but I'll investigate in the new year and post an update here.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

rjkat commented 5 years ago

Just released 0.5.2, which reverts back to the old behaviour. Hopefully this fixes your issue :)

zhouyan commented 5 years ago

Thanks a lot. I have just come back from holiday and tested the new version, all things to work fine now and I will start testing the flatc support on development branches and keep you updated if any issues happens.