dmendel / bindata

BinData - Reading and Writing Binary Data in Ruby
BSD 2-Clause "Simplified" License
580 stars 55 forks source link

Multiple registries #157

Closed franzliedke closed 7 months ago

franzliedke commented 8 months ago

Thanks for this great library, which is fundamental to our business! πŸ’š

As described in #69, having a flat registry namespace was a design decision. That's okay to me. What is problematic to us, though: the registry is a singleton / global.

We already see several console warnings because structs / records live in multiple Ruby namespaces and share the same class name, therefore overwrite each other in the registry. I assume that we currently avoid observable problems by explicitly loading only the records we need.

Of course, this is error-prone and bound to cause obscure runtime problems, also in relation to things like eager loading code. Maybe it hasn't happened yet because the code is only loaded in Rake tasks (i.e. short-lived processes).

The current situation works, kinda. A better fix would be to make sure the names of all records are unique. (Or your suggested workaround from #132.) However, even then a problem remains: We use bindata for parsing multiple formats, which are unrelated to each other. Having a shared registry means our parsers can (accidentally?) reference each other's datatypes. This problem is made worse by the flat namespace, but strictly speaking unrelated.

Would you be open to accept a PR? I'd be happy to spend time on supporting multiple / custom registries.

franzliedke commented 8 months ago

P.S.: It's likely I'm imagining this to be easier than it is. I'd be happy about any pointers to things that could cause difficulty.

dmendel commented 8 months ago

BinData uses namespaces and search_prefix to handle collisions. https://github.com/dmendel/bindata/wiki/AdvancedTopics#namespaces

Is there a reason that this doesn't work for you (other than ugliness)?

If you have a better proposal for handling collisions, then yes I would like to know about it. Multiple registries could be a way to prevent collisions, but I'd like to see your proposed interface first.

If your are eager loading, you should prevent a BinData definition being loaded multiple times into the same ruby process. Consider wrapping your BinData declarations inside something like:

unless defined? MyFileLoaded
   MyFileLoaded = true

   class Foo < BinData::Record
     ...
   end
end
grk commented 8 months ago

We'd like to structure our code in the following way:

SomeProtocol::VersionA::Component
SomeProtocol::VersionB::Component
# and that might even collide with
OtherStuff::Component

Ideally we'd like to have a registry at SomeProtocol::VersionA, another one at SomeProtocol::VersionB and another at OtherStuff.

Also I think if we used any gem that uses BinData under the hood, we'd have to be careful not to collide with its namespace?

dmendel commented 8 months ago

How would you instantiate different versions?

e.g.

if stream_is_version_a
   # read version A
   data = Component.read(stream)
else
   # read version B
   data = Component.read(stream)
end
grk commented 8 months ago

I have two ideas in mind. Either register intermediate superclasses:

class SomeProtocol::VersionA::Record < BinData::Record
  registry_root!
end

and then inherit from that:

class SomeProtocol::VersionA::Component < SomeProtocol::VersionA::Record

or a class method on each record:

class SomeProtocol::VersionA::Component < BinData::Record
  use_registry :some_protocol_version_a
end

Either solution would allow us to avoid doing this:

class SomeProtocol::VersionA::SomeProtocolVersionAComponent < BinData::Record
dmendel commented 8 months ago

I don't think multiple registries is a good idea, as they preclude a user using code from multiple registries simultaneously.

With your example of multiple versions, an obvious use case would be a parsing function that can dynamically switch on a version id.

Is there a problem with BinData's namespace feature that make it unsuitable for your needs?

franzliedke commented 8 months ago

Thanks for chiming in with examples, @grk!

I don't think multiple registries is a good idea, as they preclude a user using code from multiple registries simultaneously.

If we build this feature in a way that is opt-in, then it's backward-compatible and doesn't preclude anyone from doing anything that was possible before.


Is there a problem with BinData's namespace feature that make it unsuitable for your needs?

The problem with the global, shared registry is that it's global, shared state. The existing workaround for namespacing (you're referring to search_prefix, right?) makes it harder to separate code that does not belong together. It's possible, as the Ruby namespaces (and therefore conventional directory structure) don't have an impact on the internal naming, but that increases the risk of accidentally creating conflicts.

Again, I think and hope we can make this opt-in, so you can keep the old convention-based behavior for simple cases, and for those who prefer it, separated namespaces can be enabled at the price of a somewhat more verbose, but also more explicit definition.


P.S.: "using code from multiple registries simultaneously" feels a bit made up, but it could even be enabled with a way to "import" names from multiple registries, and giving them a prefix when importing.

franzliedke commented 8 months ago

I spent some time prototyping a possible solution to this. The main goal was to enable opt-in registry namespacing / multiple registries, while being backward-compatible, i.e. not requiring any changes whatsoever to keep using a shared global registry.

Changing the internals of the gem to prepare for a dynamic lookup of each class' registry was relatively straigtforward. But it turns out there is one big roadblock when it comes to actually letting a class define its own registry, and it's due to this snippet in the current auto-self-registration code:

https://github.com/dmendel/bindata/blob/8244a1c7e7b24617adcd7264e02370663bd03621/lib/bindata/base.rb#L53-L56

The inherited hook on BinData::Base is executed as soon as a new child class is defined, but before its body is evaluated. This means that any code like what @grk proposed above runs too late to prevent the class from being registered globally (which is the thing causing conflicts and printing warnings):

class SomeProtocol::VersionA::Component < BinData::Record
  use_registry :some_protocol_version_a
end

The problem is explained in more detail in this article. The author also suggests a possible solution in a follow-up. That would look like this:

class SomeProtocol::VersionA::Component < BinData::Record[:some_protocol_version_a]
  # ...
end

@dmendel What do you think about this API? I am very willing to spend more time on this, but only if there's at least a small chance of the feature being accepted. 😊


One possible alternative I could think of is switching from self-registration (each child class registers itself with the registry once defined) to resolving (and caching?) all existing BinData::Record child classes when looking up a type for the first time.

This should work in a backward-compatible way as well and would allow for more freedom in designing an API for this feature, with two possible downsides:

dmendel commented 8 months ago

Just a quick note. I am investigating how to achieve this, but within a single registry. A multiple registry approach will not be accepted as combining definitions from different modules is a requirement.

If you are looking at a multiple registry solution, I recommend something like the following API. Here ::isolate duplicates the existing registry at the start of the block and restores the registry at the end of the block.

BinData.isolate do
  class A < BinData::Record
    ...
  end
  a = A.new
end
franzliedke commented 8 months ago

Thanks for the feedback.

Just a quick note. I am investigating how to achieve this, but within a single registry.

How about limiting the scope to registering classes with a custom prefix then? With the API I proposed, this could look like the following:

class MyVendor::FooBar::Number < BinData::Record[:my_proto]
  # The class 
end

This seems like a nice extension to the existing search_prefix feature: defining a prefix for registration, not for lookup.


If you'd prefer a class-method DSL, I realized that it's probably possible to implement this, too, without many changes:

class MyVendor::FooBar::Number < BinData::Record
  # A prefix
  prefix :my_proto # => lookup as my_proto_number

  # Or even a random name
  register_as :my_thing # => lookup as my_thing
end

This could be implemented by first unregistering the default name and then registering again with the new name. This should get rid of the warnings, but it feels quite like a hack. πŸ€”

dmendel commented 7 months ago

Here is my summary of your issue. Let me know if I misunderstand.

While developing or integrating another codebase, you are receiving warnings such as "warning: replacing registered class ModA::Foo with ModB::Foo".

You are looking for a solution to prevent this class of errors.


The BinData solution below is unacceptable to you:

module ModA
  class ModAFoo < BinData::Record
    search_prefix :mod_a
  end

  class ModABar < BinData::Record
    search_prefix :mod_a
    foo :my_foo
  end
end

But this is acceptable to you

module ModA
  class Record < BinData::Record
    registry_root!
  end

  class Foo < Record
  end

  class Bar < Record
    foo :my_foo
  end
end

As is this

module ModA
  class Foo < BinData::Record[:mod_a]
  end

  class Bar < BinData::Record[:mod_a]
    foo :my_foo
  end
end

And this

module ModA
  class Foo < BinData::Record
    use_registry :mod_a
  end

  class Bar < BinData::Record
    use_registry :mod_a
    foo :my_foo
  end
end
dmendel commented 7 months ago

Here is an example of how a large project solves this problem using BinData namespaces.

https://github.com/rapid7/metasploit-framework/blob/master/lib/rex/proto/amqp/version_0_9_1/types.rb https://github.com/rapid7/metasploit-framework/blob/master/lib/rex/proto/amqp/version_0_9_1/frames/method_arguments.rb


Your proposed BinData::Record[:mod_a] technique assumes that user-defined BinData types are all subclasses of BinData::Record. This is not the case.

This technique needs a way to automatically create such methods for all user-defined BinData types.


If you can give a reason for why the existing BinData namespacing is insufficient I will revisit this.

Otherwise, large projects are successfully handling the namespacing problem so I am loathe to introduce a new way of doing things without a clear reason as to why the new method is needed.