Unidata / thredds

THREDDS Data Server v4.6
https://www.unidata.ucar.edu/software/tds/v4.6/index.html

Upgrade protobuf-based indexes to proto3. #306

JohnLCaron closed this issue 8 years ago.

JohnLCaron commented 9 years ago

It turns out it's possible to make indexes backwards compatible (5.0 can read old indexes) but not forward compatible (versions before 5.0 cannot read new indexes). This is because proto3 does not store default values, so proto2 barfs when "required" fields are missing.
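A minimal sketch of that mechanism. It hand-encodes the protobuf wire format in Python instead of using the protobuf library (an assumption made so the example is self-contained), for a one-field message `fixed32 cdmHash = 1` like the one mentioned here. The function names are hypothetical. proto3 skips a plain scalar field that holds its default value, so a reader that declares the field `required` sees nothing and rejects the message.

```python
import struct

FIXED32 = 5  # protobuf wire type for fixed32 fields
CDM_HASH_KEY = (1 << 3) | FIXED32  # field number 1, wire type 5 -> 0x0d


def encode_cdm_hash(value, syntax):
    """Encode the one-field message {fixed32 cdmHash = 1} under proto2 or proto3 rules.

    proto2: a set 'required' field is always written, even when it equals 0.
    proto3: a plain scalar field (no explicit presence) is omitted when it
    holds the default value, so cdmHash == 0 produces no bytes at all.
    """
    if syntax == "proto3" and value == 0:
        return b""
    # The key 0x0d fits in a single varint byte, so no varint helper is needed.
    return bytes([CDM_HASH_KEY]) + struct.pack("<I", value)


def proto2_check_required(data):
    """Simplified proto2 'required' check, valid only for this one-field message."""
    if len(data) < 5 or data[0] != CDM_HASH_KEY:
        # Illustrative error text, modeled on protobuf's wording.
        raise ValueError("Message missing required fields: cdmHash")


old_style = encode_cdm_hash(0, "proto2")
new_style = encode_cdm_hash(0, "proto3")
print(old_style.hex())  # 0d00000000
print(new_style.hex())  # prints an empty line: nothing was written

proto2_check_required(old_style)  # passes
try:
    proto2_check_required(new_style)
except ValueError as err:
    print("proto2 reader:", err)
```

Nothing in the proto3 bytes says "cdmHash was 0"; the value is only recoverable if the reader is willing to assume the default.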

I think we can modify the proto2 description to accept new indexes by changing, e.g.,

required fixed32 cdmHash = 1; 

to

optional fixed32 cdmHash = 1 [default = 0];

This would fix new versions that use proto2, but old versions would still break.
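A sketch of why the schema change helps only readers built from the modified description. This is a hand-written Python decoder with hypothetical names, not the protobuf library. `required=True` stands for an old reader compiled with `required fixed32 cdmHash = 1;`, and `required=False` stands for one compiled with `optional fixed32 cdmHash = 1 [default = 0];`.

```python
import struct


def read_varint(buf, pos):
    """Decode a base-128 varint starting at buf[pos]; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_cdm_hash(buf, required):
    """Read fixed32 field 1 (cdmHash), skipping any other fields.

    required=True  mimics 'required fixed32 cdmHash = 1;'
    required=False mimics 'optional fixed32 cdmHash = 1 [default = 0];'
    """
    value = None
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire = key >> 3, key & 0x7
        if wire == 0:    # varint
            v, pos = read_varint(buf, pos)
        elif wire == 1:  # 64-bit
            v = struct.unpack_from("<Q", buf, pos)[0]
            pos += 8
        elif wire == 2:  # length-delimited
            n, pos = read_varint(buf, pos)
            v = buf[pos:pos + n]
            pos += n
        elif wire == 5:  # 32-bit
            v = struct.unpack_from("<I", buf, pos)[0]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire}")
        if field == 1 and wire == 5:
            value = v  # last occurrence wins, as in protobuf
    if value is None:
        if required:
            raise ValueError("Message missing required fields: cdmHash")
        return 0  # the declared [default = 0]
    return value


proto3_zero = b""  # proto3 wrote nothing for cdmHash == 0
print(decode_cdm_hash(proto3_zero, required=False))             # 0
print(decode_cdm_hash(b"\x0d\x2a\x00\x00\x00", required=True))  # 42
```

With `required=True`, the same empty input raises, which is the "old versions would still break" case.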

Can we expect people to upgrade to 5.0 and not run 4.6 simultaneously? Or, if they need to run both, to upgrade to the latest 4.6?

There are downsides to bumping the gbx/ncx suffix, particularly if you want to continue to allow the old ones in 5.0. A clean break is easy enough. Still, I'm thinking of our installs (NCDC, RDA) with a terabyte of index files.

The other option is to bag proto3 and revert to proto2. We moved to proto3 for the Earth Engine group, who need to use it. Plus, proto3 has some improvements and will be used heavily in Android (I understand).

I'm leaning towards:

  1. Keeping proto3.
  2. Keeping gbx9 files compatible, and upgrading 4.6.x to read both versions.
  3. Bumping to ncx4, with an incompatible version; there are some improvements we can make there, I think.

The main downside is that older versions will barf on new gbx9 files. We could also bump gbx9 (gbx10? gbxa? gbx?), but then see about accepting both in 5.0.
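To make the "bump the suffix, but accept both in 5.0" option concrete, here is a hypothetical dispatch table. It maps an index file's suffix to the protobuf syntax its reader would expect. "gbx10" is only one of the candidate names floated above, and the release suffix sets are assumptions, not what any TDS version actually does.

```python
from pathlib import Path

# Hypothetical mapping from index suffix to the protobuf syntax of its messages.
SUFFIX_SYNTAX = {
    "gbx9": "proto2",
    "gbx10": "proto3",  # hypothetical bumped suffix
    "ncx3": "proto2",
    "ncx4": "proto3",
}


def index_syntax(path, accepted):
    """Return the protobuf syntax for an index file, or refuse it.

    `accepted` is the set of suffixes a given release can read. Accepting both
    old and new suffixes lets a release keep using existing index files
    instead of forcing a clean break (and a full re-index).
    """
    suffix = Path(path).suffix.lstrip(".")
    if suffix not in accepted:
        raise ValueError(f"{path}: index suffix .{suffix} not supported")
    return SUFFIX_SYNTAX[suffix]


release_4_6 = {"gbx9", "ncx3"}
release_5_0 = {"gbx9", "gbx10", "ncx3", "ncx4"}  # hypothetical: reads both

print(index_syntax("GFS_Global.ncx4", release_5_0))            # proto3
print(index_syntax("GFS_Global_0p5.grib2.gbx9", release_4_6))  # proto2
```

Because the suffix differs, an older release simply refuses (or ignores) the new files instead of failing while decoding them.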

JohnLCaron commented 9 years ago

I've discovered that:

1) It's possible to be forwards compatible by modifying the proto2 description as above. This allows the proto2 processor to tolerate missing values.

2) It's possible for the proto3 library to generate proto2 messages. I'm not sure if I realized this and decided against it for a reason TBD, or if it's continuing confirmation of advancing senility.

So I'm going to try keeping gbx9 in proto2. This solves any compatibility problems.

Then I'll think about whether ncx should do the same, or increment and take advantage of proto3 (i.e., leave ncx3 in proto2 and ncx4 in proto3).

JohnLCaron commented 9 years ago

I'm seeing some improvements to be made in ncx:

Maintain separate GDS; see #309.

Adding the packed = true option to repeated fields will save some space.
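Packed encoding writes one key and one length prefix for a whole repeated scalar field, instead of one key per element. In proto2 this is the `[packed = true]` field option, and in proto3 repeated scalar numeric fields are packed by default. A minimal hand-rolled Python sketch comparing the two encodings follows. The field number and values are made up for illustration and say nothing about what the ncx messages actually contain.

```python
def encode_varint(n):
    """Base-128 varint for non-negative ints (negative int32 would need 10 bytes; not handled)."""
    if n < 0:
        raise ValueError("sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_unpacked(field, values):
    """Each element gets its own key (wire type 0, varint)."""
    key = encode_varint((field << 3) | 0)
    return b"".join(key + encode_varint(v) for v in values)


def encode_packed(field, values):
    """One key (wire type 2), one length, then the varints back to back."""
    payload = b"".join(encode_varint(v) for v in values)
    return encode_varint((field << 3) | 2) + encode_varint(len(payload)) + payload


values = list(range(100))  # 100 small illustrative values
print(len(encode_unpacked(1, values)))  # 200
print(len(encode_packed(1, values)))    # 102
```

For small values the per-element key is half the cost, so packing roughly halves the field; the saving shrinks as the values themselves get larger.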

lesserwhirls commented 9 years ago

@JohnLCaron - I like the idea of keeping the .gbx9 in proto2, especially given the heavy lifting it takes to recreate those, unless there are some really nice advantages to moving those to proto3.

Also, keeping ncx3 in proto2 and ncx4 in proto3 seems good. Would it be possible to allow 5.0 to read both ncx3 and ncx4? If so, anyone testing the waters of upgrading from 4.6.x to 5.0 would have a smoother upgrade path. Perhaps we could add a flag to the TDM that would allow the TDM to upgrade ncx3 to ncx4 once everything appears to be OK? Maybe something like:

-DncxVersion=option

where option is:

3 = write out ncx3
4 = write out ncx4, upgrading ncx3 to ncx4 if found (default)

option=3 would be useful for testing upgrades; option=4 would be good for pulling the trigger on the upgrade.
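The flag is only a proposal in this thread, so what follows is a hypothetical sketch of the decision it describes, written in Python for illustration. `parse_ncx_version` and `plan_ncx_write` are made-up names. In the Java TDM the value would more naturally come from `System.getProperty("ncxVersion")`. Letting the last `-DncxVersion` win is a choice made for this sketch.

```python
DEFAULT_NCX_VERSION = 4  # the proposal makes ncx4 the default


def parse_ncx_version(jvm_args):
    """Read a -DncxVersion=N argument; in this sketch the last occurrence wins."""
    prefix = "-DncxVersion="
    version = DEFAULT_NCX_VERSION
    for arg in jvm_args:
        if arg.startswith(prefix):
            version = int(arg[len(prefix):])
    if version not in (3, 4):
        raise ValueError(f"ncxVersion must be 3 or 4, got {version}")
    return version


def plan_ncx_write(ncx_version, existing):
    """Decide what the TDM would do for one collection.

    existing: suffix of the collection index already on disk ("ncx3", "ncx4") or None.
    """
    if ncx_version == 3:
        return "write ncx3"  # testing the upgrade: keep the old format
    if existing == "ncx3":
        return "upgrade ncx3 to ncx4"  # pulling the trigger
    return "write ncx4"


version = parse_ncx_version(["-Xmx4g", "-DncxVersion=3"])
print(version, plan_ncx_write(version, "ncx3"))       # 3 write ncx3
print(plan_ncx_write(parse_ncx_version([]), "ncx3"))  # upgrade ncx3 to ncx4
```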

Just thinking out loud here, and doing so with a lack of coffee.

JohnLCaron commented 9 years ago

It would be a lot of work to keep both ncx4 and ncx3 options, because some of the ncx4 changes penetrate other parts of the code. At the moment, I'm working on the ncx4 branch; I'll see how hard it looks when that's done.

Definitely keep gbx9 in proto2.

JohnLCaron commented 9 years ago

Ratio of ncx3/ncx4 file sizes on cdmUnitTest files:

606/472 KB = 1.28 (grib1)
467/303 KB = 1.54 (grib2)

So ncx4 is 22-35% smaller, possibly due to the packing option. Not that these file sizes are much of a problem (gbx9 = 3.17 MB).
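The size ratio and the percentage saving are two views of the same measurement. A quick check from the KB figures in this comment shows how one follows from the other:

```python
def compare(ncx3_kb, ncx4_kb):
    """Return (size ratio ncx3/ncx4, fractional saving of ncx4 relative to ncx3)."""
    return ncx3_kb / ncx4_kb, 1 - ncx4_kb / ncx3_kb


for name, (ncx3_kb, ncx4_kb) in {"grib1": (606, 472), "grib2": (467, 303)}.items():
    ratio, saving = compare(ncx3_kb, ncx4_kb)
    print(f"{name}: ratio {ratio:.2f}, ncx4 {saving:.0%} smaller")
# grib1: ratio 1.28, ncx4 22% smaller
# grib2: ratio 1.54, ncx4 35% smaller
```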