apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0
5.5k stars 1.02k forks source link

regen.sh is failing in Ubuntu when working with recommended version of protoc #10979

Closed jcsherin closed 2 weeks ago

jcsherin commented 2 weeks ago

Describe the bug

Running regen.sh in datafusion-proto fails with the error:

$ ./regen.sh
Error: "protobuf compilation failed: protoc failed: datafusion/proto/proto/datafusion.proto: This file contains proto3 optional fields, but --experimental_allow_proto3_optional was not set.\n"

The error happens with libprotoc 3.12.4, which is the version recommended in the DataFusion docs. This is also the default version installed by apt in Ubuntu 22.04.

$ protoc --version
libprotoc 3.12.4

A similar bug was reported in #8602.

I needed to run regen.sh while trying to migrate string_agg, one of the built-in aggregate functions pending migration, listed in #8708.

Could you recommend an approach to fix this? I have listed 3 possible ways here.

1. Add flag

Satisfy the experimental check by passing --experimental_allow_proto3_optional to the protoc compiler invocation. I've verified locally that this will work.

// In datafusion/proto/gen/src/main.rs L31
prost_build::Config::new()
    .protoc_arg("--experimental_allow_proto3_optional")

2. Convert to oneof

Rewrite optional field as a oneof with a single field.

Currently, in datafusion.proto there are two usages of optional:

message ScalarUDFExprNode {  
  string fun_name = 1;  
  repeated LogicalExprNode args = 2;  
  optional bytes fun_definition = 3;  
}

message PhysicalScalarUdfNode {  
  string name = 1;  
  repeated PhysicalExprNode args = 2;  
  optional bytes fun_definition = 3;  
  datafusion_common.ArrowType return_type = 4;  
}

We can rewrite the optional field as a oneof with a single field like this:

message ScalarUDFExprNode {  
  string fun_name = 1;  
  repeated LogicalExprNode args = 2;  
  oneof fun_definition_opt {  
    bytes fun_definition = 3;  
  }  
}

This technique is already used in other message types in datafusion.proto like - WindowFrame, CsvScanExecNode, ProjectionNode etc.

I've tried this change in my fork and verified that it works. Please see the diff here

3. Remove optional

We could remove the optional keyword usage here, which would fix the issue. It might however have other cascading effects. To evaluate it, it might be useful to see #8706, where the message type in question was designed.

To Reproduce

You need protoc version 3.12.4.

Expected behavior

Running regen.sh should exit without errors when using protoc version 3.12.4.

Additional context

No response

jayzhan211 commented 2 weeks ago

I'm not sure why there is the error, but my version is libprotoc 27.0. 3.12.4 might be an outdated version 🤔

the version recommended in the DataFusion docs

3.12.4 is not the recommended version, but the minimum version then (probably not anymore given your error)

Could you try update it to the latest to see if there is still any issue?

jcsherin commented 2 weeks ago

Could you try update it to the latest to see if there is still any issue?

There is no compilation error when I tried with a different version: libprotoc 26.0.

From v3.15 there shouldn't be any compilation errors even when optional is used in the .proto source. I found this section in the protobuf field presence docs:

Presence tracking for proto3 messages is enabled by default since v3.15.0 release, formerly up until v3.12.0 the --experimental_allow_proto3_optional flag was required when using presence tracking with protoc.

How about changing the minimum required version in DataFusion docs source here from v3.12 to v3.15?

If acceptable, I can submit a doc patch for your review. We don't need to make any of the above suggested changes to code.