Unidata / netcdf-java

The Unidata netcdf-java library
https://docs.unidata.ucar.edu/netcdf-java/current/userguide/index.html
BSD 3-Clause "New" or "Revised" License
145 stars 70 forks source link

Enum naming in protobuf files; conflicts when auto generating proto class files. #627

Closed lesserwhirls closed 3 years ago

lesserwhirls commented 3 years ago

The protobuf-gradle-plugin allows for the generation of source code using the .proto files shipped with an artifact. The following build.gradle demonstrates how this is done using the cdmr artifact:

plugins {
  id 'java'
  id 'com.google.protobuf' version '0.8.15'
}

repositories {
  mavenCentral()
  maven {
    url "https://artifacts.unidata.ucar.edu/repository/unidata-all/"
  }
}

dependencies {
  protobuf 'edu.ucar:cdmr:6.0.0-SNAPSHOT'
  implementation "io.grpc:grpc-protobuf:1.31.1"
  implementation "io.grpc:grpc-stub:1.31.1"
}

protobuf {
  protoc {
    artifact = 'com.google.protobuf:protoc:3.12.4'
  }
  plugins {
    grpc {
      artifact = 'io.grpc:protoc-gen-grpc-java:1.31.1'
    }
  }
  generateProtoTasks {
    all().each { task ->
      task.plugins {
        grpc {}
      }
    }
  }
}

This will not only extract the .proto files from edu.ucar:cdmr artifact (stored in the top level of the jar file):

cdmr.jar/
    META-INF/
    ucar/
    cdmrGrid.proto
    cdmrNetcdf.proto
    cdmrServer.proto
    logging.properties

but it will also extract the .proto files from the dependencies of the artifact. Currently, this process results in the extraction and attempted compilation of the following .proto files:

bufrCdmIndex.proto
cdmrfeature.proto
cdmrGrid.proto
cdmrNetcdf.proto
cdmrServer.proto
grib1Index.proto
grib2Index.proto
gribCollection4.proto
ncStream.proto
pointStream.proto

Using build.gradle from above, running gradle clean build results in the following error:

Failure:

> Task :generateProto FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':generateProto'.
> protoc: stdout: . stderr: cdmrfeature.proto:82:3: "none" is already defined in file "bufrCdmIndex.proto".
  cdmrfeature.proto:82:3: Note that enum values use C++ scoping rules, meaning that enum values are siblings of their type, not children of it.  Therefore, "none" must be unique within the global scope, not just within "Calendar".

We might be able to get things working by changing the dependency configuration on the bufr and grib subprojects from api to implementation (not exactly sure if/how protobuf-gradle-plugin proto file extraction code differentiates api and implementation level dependencies).

JohnLCaron commented 3 years ago

I havent been able to reproduce this yet, so Im not sure how extensive it is. We should be able to solve it if its just in cdmrfeature, which is going away. But we do need to review all protobuf enums.

JohnLCaron commented 3 years ago

legacy indexes - update in 7?

================bufrCdmIndex.proto
enum FldAction {
  defa = 0;
  none = 1;
  remove = 2;
  asMissing = 3;
  asArray = 4;
  concat = 5;
}
enum FldType {
  def = 0;
  lat=1;
  lon=2;
  height=3;
  heightAboveStation=4;
  heightOfStation=5;
  stationId=10;
  stationDesc=11;
  wmoId=12;
  wmoBlock=13;
  year=15;
  month=16;
  day=17;
  hour=18;
  minute=19;
  sec=20;
  doy=21;
  timeIncr=22;
  incrS=23;
}
================gribCollection4
 enum GribAxisType {  // same as ucar.coord.Coordinate.Type
    runtime=0;
    time=1;
    timeIntv=2;
    vert=3;
    time2D=4;
    ens=5;
 }
JohnLCaron commented 3 years ago

legacy - abandon

============cdmrFeature.proto
enum AxisType {  // same as ucar.nc2.constants.AxisType
  RunTime=0;
  Ensemble=1;
  Time=2;
  GeoX=3;
  GeoY=4;
  GeoZ=5;
  Lat=6;
  Lon=7;
  Height=8;
  Pressure=9;
  RadialAzimuth=10;
  RadialDistance=11;
  RadialElevation=12;
  Spectral=13;
  TimeOffset=14;
}
enum AxisSpacing {  // same as CoverageCoordAxis.Spacing
  regularPoint=0;
  irregularPoint=1;
  contiguousInterval=2;
  discontiguousInterval=3;
  regularInterval=4;
}
enum DependenceType {  // same as CoverageCoordAxis.DependenceType
  independent=0;     // has its own dimension, is a coordinate variable, eg x(x)
  dependent=1;       // aux coordinate, reftime(time) or time_bounds(time);
  scalar=2;          // reftime
  twoD=3;            // lat(x,y)
  fmrcReg=4;         // time(reftime, hourOfDay)
}
enum Calendar {  // same as ucar.nc2.time.Calendar
  proleptic_gregorian=0;
  gregorian=1;
  noleap=2;
  all_leap=3;
  uniform30day=4;
  julian=5;
  none=6;
}
enum CoverageType {
  General=0;
  Curvilinear=1;
  Grid=2;
  Swath=3;
  Fmrc=4;
}
===================ncStream (legacy abandon)
 enum DataType {
  CHAR = 0;
  BYTE = 1;
  SHORT = 2;
  INT = 3;
  LONG = 4;
  FLOAT = 5;
  DOUBLE = 6;
  STRING = 7;
  STRUCTURE = 8;
  SEQUENCE = 9;
  ENUM1 = 10;
  ENUM2 = 11;
  ENUM4 = 12;
  OPAQUE = 13;

  UBYTE = 14;
  USHORT = 15;
  UINT = 16;
  ULONG = 17;
}
enum Compress {
  NONE = 0;
  DEFLATE = 1;
}
JohnLCaron commented 3 years ago

cdmr - change now

===============cdmrGrid.proto (change now)
enum AxisType {  // same as ucar.nc2.constants.AxisType
  RunTime=0;
  Ensemble=1;
  Time=2;
  GeoX=3;
  GeoY=4;
  GeoZ=5;
  Lat=6;
  Lon=7;
  Height=8;
  Pressure=9;
  TimeOffset=14;
}
enum AxisSpacing {  // same as GridAxis.Spacing
  regularPoint=0;
  irregularPoint=1;
  regularInterval=3;
  contiguousInterval=4;
  discontiguousInterval=5;
}
enum Calendar {  // same as ucar.nc2.time.Calendar
  proleptic_gregorian=0;
  gregorian=1;
  noleap=2;
  all_leap=3;
  uniform30day=4;
  julian=5;
  none=6;
}
enum DependenceType {  // same as GridAxis.DependenceType
  independent=0;     // has its own dimension, is a coordinate variable, eg x(x)
  dependent=1;       // aux coordinate, reftime(time) or time_bounds(time);
  scalar=2;          // reftime
  twoD=3;            // lat(x,y)
  fmrcReg=4;         // time(reftime, hourOfDay)
  dimension=5;       // swath(scan, scanAcross)
}
enum FeatureType {
  General=0;
  Curvilinear=1;
  Gridded=2;
  Swath=3;
  Fmrc=4;
}
enum GridAxisType {
  Axis1D=0;
  Axis1DTime=1;
  TimeOffsetRegular=2;
  Axis2D=3;
}
================cdmrNetcdf.proto
enum DataType {
  UNDEFINED = 0;
  BYTE = 1;
  SHORT = 2;
  INT = 3;
  LONG = 4;
  FLOAT = 5;
  DOUBLE = 6;
  STRING = 7;
  STRUCTURE = 8;
  SEQUENCE = 9;
  ENUM1 = 10;
  ENUM2 = 11;
  ENUM4 = 12;
  OPAQUE = 13;

  UBYTE = 14;
  USHORT = 15;
  UINT = 16;
  ULONG = 17;

  CHAR = 18; // prefer String
}
enum Compress {
  NONE = 0;
  DEFLATE = 1;
}
JohnLCaron commented 3 years ago

ok, so:

  1. Change the cdmr enums now, before ver6 is released. I think it works to be contained inside of a Message. I hope.
  2. ncstream and cdmrFeature are abandoned, and will be moved out of core in ver7.
  3. grib and bufr legacy indices need to be left for now, but can be changed if/when we update the indices. Looks like we got lucky and there are no conflicts.
JohnLCaron commented 3 years ago

@sarms said:

use (prefix, start with unspecified, screaming snake case, consecutive numbering) and map the values to AxisType values as needed in the GcdmGridConverter class, would that buy us any flexibility on the gCDM proto interface side, or the grid API side more generally speaking? I think it would make more sense to anyone trying to use those protos to write messages, for sure. It feels like the right thing to do, but it's a little more work. I am not sure I could see a case where I would anticipate a server (ours or another implementation) sending an unspecified axis type, but maybe?

Reference: https://stackoverflow.com/questions/27030332/solutions-to-resolve-enum-field-naming-restriction-in-google-protobuf-due-to-c

One can either contain the enum in a message, or use a unique prefix (as the style guide suggests).  

In https://github.com/Unidata/netcdf-java/pull/640/files, I chose the former, because it felt more natural. Theres no real readability problem, since all uses are contained inside CdmrGridConverter.java.

I agree the default zero value should not be a valid enum. I think that allows us to detect malformed protos, not because we might want to send a valid proto with that value.

I agree that using a unique prefix is preferred, mostly to make sure there are no effects in other languages we dont know about. So go ahead and convert.

Note that the CdmrGridConverter enum conversion routines will need to be hand coded, and cant depend on having the same numeric value. Probably also remove from being contained inside GridAxis, for simplicity.

lesserwhirls commented 3 years ago

That all make sense, and sounds good. I'll wrap that move up today. I'll take a stab a moving the enums out of the GridAxis message and see what that looks like. Let me get a working PR.

JohnLCaron commented 3 years ago

@lesserwhirls can we close?