options for avoiding name collisions

ExpandingMan commented 2 years ago

Most name collisions aren't really a problem here thanks to modules, but conflicts with Base can still be a problem (for example names like Vector and Array).

One simple solution to this would be to provide the option to prepend all generated types names with a user-provided prefix. For example with protojl(files, dir, outdir, type_prefix="Pr") a message called Vector in Julia code would be called PrVector.

Drvi commented 2 years ago

Yeah, the list of reserved names used to be very long, I even considered including all exported names from Base. At some point we trimmed it back, don't really remember why, probably for cosmetic reasons:) So, good you bring this up.

As for custom type prefixes, there is AFAIK only one problem -- the Any (https://github.com/JuliaIO/ProtoBuf.jl/issues/194#issuecomment-1221317906) protobuf type contains two fields: the qualified name of the type (a "type_url") and then the binary payload to be decoded into that type. When we decode an Any field, since we only get the type name, we need to parse it and look it up among the ProtoBuf modules that are available in the session. For this reason, I prefer type names that are known statically -- when you get a message with an Any field, you don't need to provide more runtime info to the decode method to make it work.

My preferred option would be to expand the list of reserved names until we catch all of those that cause trouble (it shouldn't be a breaking change, since we're fixing something in that case).

Thoughts?

ExpandingMan commented 2 years ago

Just want to make sure I'm understanding this correctly: the serialization algorithm needs to know the type names (in addition to the field names)?

The current behavior of adding # seems a bit awkward to deal with, unless I'm missing something. In my opinion the ideal solution would be to decouple the Julia type names from the protobuf names which I think could be done by adding a method, though I can see how this might be an onerous solution.

I might fool around with this solution later this evening when I go to update the EnumX import, just to see how complicated it would be, let me know whether you agree that it is desirable. A way of "cheating" which might be too hackish would be to leave the names as is but allow an option to create aliases for them via module-level const statements.

Otherwise, yes of course we should add Base type names to the reserved list, as those really do cause bugs since the generated code itself tries to use some of these names (e.g. Vector). I can take care of this later as well.

Drvi commented 2 years ago

the serialization algorithm needs to know the type names (in addition to the field names)?

For the proto Any type, yes, we need to know the name of the type you are serializing (and if you are sending Any messages between different languages or runtimes, the names must be consistent for this to work).

The current behavior of adding # seems a bit awkward to deal with

Yeah... The nice thing about prefixing with # is that the resulting name cannot collide with any other valid ProtoBuf definition (and maybe a smaller chance it would collide with another Julia name).

In my opinion the ideal solution would be to decouple the Julia type names from the protobuf names

I mean, if it was completely unconditional (i.e. we'd prefix every definition with statically known prefix), then we could just always strip it before sending the Any type and always prepend the parsed name when decoding... (we'll have to do similar thing with the # prefix). But I don't think I understand why we'd want the Julia vs ProtoBuf names to be different (apart from avoiding collisions with Base)?

Anyway, I think I'd prefer us expanding the list of reserved keywords... cc: @quinnj

ExpandingMan commented 2 years ago

I think what you just said about why to use # pretty much says it all about why the names should be decoupled.

No worries, I'll add some keywords later and we'll call it a day.

Also once collisions are fixed, there's nothing stopping users from creating their own aliases like I suggested, so I guess there's not really anything that has to be done in the package itself.

JuliaIO / ProtoBuf.jl

options for avoiding name collisions #220