awslabs / kinesis-aggregation

AWS libraries/modules for working with Kinesis aggregated record data
Apache License 2.0
376 stars 154 forks source link

kpl_pb2.py should be regenerated to avoid problems with upcoming protobuf release #159

Open akursar opened 2 years ago

akursar commented 2 years ago

Due to upcoming changes in python's protobuf, which can be previewed with pip install protobuf==4.21.0rc2, the bundled kpl_pb2.py should be regenerated with a newer version of protoc.

import aws_kinesis_agg.kpl_pb2
../../.venv/kpl/lib/python3.8/site-packages/aws_kinesis_agg/kpl_pb2.py:51: in <module>
    _descriptor.FieldDescriptor(
../../.venv/kpl/lib/python3.8/site-packages/google/protobuf/descriptor.py:560: in __new__
    _message.Message._CheckCalledFromGeneratedFile()
E   TypeError: Descriptors cannot not be created directly.
E   If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
E   If you cannot immediately regenerate your protos, some other possible workarounds are:
E    1. Downgrade the protobuf package to 3.20.x or lower.
E    2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
E   
E   More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

garybryan commented 2 years ago

This is now causing the error shown above, due to protobuf 4.21 having been released.

Since the protobuf version is not pinned to 3.x in aws-kinesis-agg, Pip is installing 4.21 for packages that have aws-kinesis-agg as a requirement.

While a version is set in requirements.txt for the aws-kinesis-agg build itself (https://github.com/awslabs/kinesis-aggregation/blob/ae6250befb5b06dc86af8decef3a7e4f09a5d2c7/python/requirements.txt), it's not set as a package dependency for other packages that require it (https://github.com/awslabs/kinesis-aggregation/blob/398fbd4b430d4bf590431b301d03cbbc94279cef/python/aws_kinesis_agg.egg-info/requires.txt) so Pip just fetches the newest version.

kbl_pb2.py should be regenerated, but it would also be good practice to pin the protobuf version to at least a major version in the dependencies.

IanMeyers commented 2 years ago

I'm not sure I understand entirely what the issue is. The protobuf version is pinned to 3.15.0 in requirements.txt. I tsounds like the issue is that if you do a system wide install of protobuf, you will get 4.21, which then when you try to run this version will not work. This would be expected, and you would need to downgrade protobuf to a version before 4.21 to get compatibility. Is that correct?

Given this, I could regenerate the kpl_pb2 file, but the structure within later versions of kinesis producer works differently, and from my first glance are incompatible with the existing types and structures. Therefore a regenerate isn't a simple fix, and so looking to see if we can get you unblocked faster by downgrading protobuf.

IanMeyers commented 2 years ago

I understand the issue now, and have pushed version 1.2.0 in 3db23db that addresses this issue. Please let me know if this works for you.

akursar commented 2 years ago

This is not working for me because of this line: https://github.com/awslabs/kinesis-aggregation/blob/3db23dbb9376cc1f5e39a3c6d4e04c249ddd3a20/python/aws_kinesis_agg/messages_pb2.py#L14

This seems to be the only import which refers to an internal module without referring to the aws_kinesis_agg package. If I patch this in my installation:

sed -i -e 's/import config_pb2 as config__pb2/import aws_kinesis_agg.config_pb2 as config__pb2/' /pkg/aws_kinesis_agg/messages_pb2.py

I'm then able to import. Admittedly though, this seems to be because I'm packaging this up in a lambda and I'm going out of my way to strip the zipfile to be pyc-only to save space. I'm not sure why running with pyc-only would cause an import error here, while using the *.py files seems to work without a problem, but I see there were additional changes to add paths to sys.path so curious if @IanMeyers already bumped into some import/path issues while making this update?

IanMeyers commented 2 years ago

Yes, sort of. This issue appears to be related to https://github.com/protocolbuffers/protobuf/issues/1491. I am working on a fix in version 1.2.1 for you that should also be compliant with compiling lambda packages, but will require including the project bast directory to include the protobuf generated libs.

IanMeyers commented 2 years ago

Pushed 1.2.1 - can you please try it?

Dilski commented 2 years ago

@IanMeyers I'm seeing a "No module named 'messages_pb2'" with 1.2.1

IanMeyers commented 2 years ago

Can you share a stack trace please?

IanMeyers commented 2 years ago

Think I see the issue - standby please.

IanMeyers commented 2 years ago

Issue should be fixed in version 1.2.2. Package builder was ignoring the root path. Sorry folks.

Dilski commented 2 years ago

Thanks for the incredibly speedy response!