GobySoft / dccl

Dynamic Compact Control Language
Other
17 stars 13 forks source link

presence-bit codec prototype #58

Closed kguilbert closed 5 years ago

kguilbert commented 5 years ago

@tsaubergine: @chrismurf and myself have created a new "presence bit codec" which instantiates an existing codec and encodes optional empty fields using a single bit. It's particularly intended for numeric types but could also be used for strings. Looking for your recommendation on the best way to contribute this feature upstream. Here are the two primary questions/concerns as I see them:

  1. I decided to create an instance of the "wrapped" codec in PresenceBitCodec to re-use existing codecs. In order to defer to those codecs' methods, e.g. size(), encode(), decode(), I had to change a few private or protected methods to public. This transformation is incomplete, and so the new codec can't yet be used for codecs having those methods private or protected.
  2. Because PresenceBitCodec simply defers to the "wrapped" codec, which is usually DefaultNumericFieldCodec, it inherits the "presence value" used by DefaultNumericFieldCodec, which sometimes results in wasting an extra bit if log2(max-min) is exactly a power of 2. We've been thinking of ways to get around this, none of which are particularly attractive (for example, recreating the code in DefaultNumericFieldCodec or even pushing a fake field descriptor on the stack to trick the size() method).
tsaubergine commented 5 years ago

Definitely agree this is a good addition - I've been meaning to add this for a while, so thanks!

I like the implementation, as it allows this to be quickly extended to other codecs when it makes sense (this PresenseCodec and the default codec are then the two extremes of the ArithmeticCodec: very high probability of the field being unset versus equal probability of the field being unset as any other symbol, respectively).

The string codec probably won't be one of them as this needs to be fixed anyway: https://github.com/GobySoft/dccl/issues/31 and my proposed fix would treat the string basically t same way that a repeated field of int with min: 0, max: 255 (char) is encoded.

The "use_required()" method already exists in FieldCodecBase that contains logic to determine if the required encoding or optional encoding should be used.

Perhaps something like this would work:

diff --git a/src/field_codec.cpp b/src/field_codec.cpp
index 697d8967..99e52826 100644
--- a/src/field_codec.cpp
+++ b/src/field_codec.cpp
@@ -39,7 +39,7 @@ using namespace dccl::logger;
 //
 // FieldCodecBase public
 //
-dccl::FieldCodecBase::FieldCodecBase() { }
+dccl::FieldCodecBase::FieldCodecBase() : force_required_(false) { }

 void dccl::FieldCodecBase::base_encode(Bitset* bits,
                                        const google::protobuf::Message& field_value,
diff --git a/src/field_codec.h b/src/field_codec.h
index cb7ca93b..a53e007a 100644
--- a/src/field_codec.h
+++ b/src/field_codec.h
@@ -344,9 +344,18 @@ namespace dccl

         }

+        /// \brief Force the codec to always use the "required" field encoding, regardless of the FieldDescriptor setting. Useful when wrapping this codec in another that handles optional and repeated fields
+        void set_force_use_required(bool force_required = true)
+        {
+            bool force_required_ = force_required;
+        }
+        
         /// \brief Whether to use the required or optional encoding
         bool use_required()
         {
+            if(force_required_)
+                return true;
+
             const google::protobuf::FieldDescriptor* field = this_field();
             if(!field)
                 return true;
@@ -495,6 +504,7 @@ namespace dccl
         google::protobuf::FieldDescriptor::Type field_type_;
         google::protobuf::FieldDescriptor::CppType wire_type_;

+        bool force_required_;
     };

     inline std::ostream& operator<<(std::ostream& os, const FieldCodecBase& field_codec )

and then in the constructor for PresenceBitCodec you can set _inner_codec.set_force_use_required(true);

Finally, I have the Contributor License Agreement modification from APS last November, so could you also please add your name and email to the AUTHORS file as a commit on this PR?

Thanks again.

tsaubergine commented 5 years ago

Thanks, looks good to me once the unit test passes.

kguilbert commented 5 years ago

Looks like we're all good now. Thanks for the help @tsaubergine.

tsaubergine commented 5 years ago

Great, thanks for this useful contribution.