Closed: itamarla closed this issue 6 years ago
The short answer is that when this project was developed, AWS Lambda didn't actually support Python3. That's a fairly recent release (https://aws.amazon.com/about-aws/whats-new/2017/04/aws-lambda-supports-python-3-6/) and you're the first person to ask about it.
We'll work on getting support for that in.
Any updates on adding support?
No, sorry @jessmchung, I had lost track of this. I'll try to start looking into it in the next week or so. What level of urgency do you have?
Definitely within the next two months pls!
@jessmchung my current plan is to add concurrent Python 2.7.x and Python 3.x support to the existing Python code by adding a dependency on the six library (https://pypi.python.org/pypi/six) as well as get some unit tests added in. Does that sound reasonable?
@brentnash That sounds excellent!
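As a rough illustration of the dual-version approach (not this project's actual code), the kind of shim `six` provides can be sketched in plain Python:

```python
import sys

# Flags like these are what six.PY2 / six.PY3 provide out of the box.
PY2 = sys.version_info[0] == 2
PY3 = sys.version_info[0] >= 3


def ensure_bytes(value, encoding="utf-8"):
    """Return value as bytes on both Python 2 and 3.

    On Py2 a str is already bytes; on Py3 a str must be encoded first.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)
```

Using `six` directly replaces hand-rolled helpers like these with well-tested equivalents (`six.PY2`, `six.binary_type`, and so on), which matters for a library that has to handle Kinesis record payloads as bytes on both interpreters.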
Ongoing work happening here:
https://github.com/awslabs/kinesis-aggregation/tree/py3-compat
First round of updates is complete. There's a working unit test that runs an end-to-end aggregate & deaggregate successfully on both Python 2.7.x and Python 3.6.x. (Have not merged back to master yet)
I'm working on adding some more unit testing and will try to do some end-to-end testing.
I will probably bump the version number to 1.1 when I push to PyPi.
I really appreciate Py3 support.
Tested the `py3-compat` branch and it worked like a charm except for one thing. For some py3 environments, `pip` is named `pip3` (or the like). In that case, `make_lambda_build.py` fails with a `You do not have "pip" installed or it is not on your PATH.` error.
For example, on the upcoming AL2, if you install py3 with `$ sudo amazon-linux-extras install python3`, pips are installed as follows:

```
$ ls /usr/bin/ | grep pip
pip3
pip-3
pip-3.6
pip3.6
```
Thank you for taking a look at it, @quiver! Much appreciated. Nice catch on the `pip3` thing.
I checked in a change to invoke the `pip` module directly, which I think should get around the issue, i.e.:

```
python -m pip install -r requirements.txt
```

Give it a shot if you get a chance. I tested with Python 2 & Python 3 on my system successfully.
@brentnash

```
python -m pip install -r requirements.txt
```

This hardcoded approach does not work if `python` is linked to Py2 while the target runtime is Py3. What about using `sys.executable` instead? This should work as long as `make_lambda_build.py` is invoked from the corresponding interpreter.
```diff
$ git diff -u
diff --git a/python/make_lambda_build.py b/python/make_lambda_build.py
index 41aa05a..97a2857 100644
--- a/python/make_lambda_build.py
+++ b/python/make_lambda_build.py
@@ -97,7 +97,7 @@ def install_dependencies():
     print('')
     print('Installing necessary modules from pip...')
     requirements_file = os.path.join(proj_dir, REQUIREMENTS_FILE_NAME)
-    pip_install_cmd = 'python -m pip install -r {} -t "{}"'.format(requirements_file, build_dir)
+    pip_install_cmd = '{} -m pip install -r {} -t "{}"'.format(sys.executable, requirements_file, build_dir)
     print(pip_install_cmd)
     pip_install_cmd_line_result = subprocess.call(pip_install_cmd, shell=True)
     if pip_install_cmd_line_result != 0:
```
Tested with
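The idea in the diff above can be sketched as a standalone snippet (a minimal illustration, not the repo's exact code; `build_pip_cmd` and `install_dependencies` are hypothetical names):

```python
import subprocess
import sys


def build_pip_cmd(requirements_file, build_dir):
    # sys.executable is the path of the interpreter running this script,
    # so the matching pip is invoked even when no "pip" binary is on PATH
    # (or when it's only available as pip3 / pip3.6).
    return [sys.executable, "-m", "pip", "install",
            "-r", requirements_file, "-t", build_dir]


def install_dependencies(requirements_file, build_dir):
    return subprocess.call(build_pip_cmd(requirements_file, build_dir))
```

Passing the command as a list also avoids the `shell=True` quoting concerns of the string-based version, though either works here.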
@quiver I forgot sys.executable existed. That is a much better answer. I'll add that instead.
@jessmchung I'm still going to add some more tests, but did you want to try out the py3-compat branch before I merge to master and upload to PyPi?
@brentnash Yes! Will let you know asap!
Not urgent, but if you have a chance, I always appreciate extra data/validation points. Thanks!
Whoops - meant to respond! We tested this. It looks good for our use cases.
Great! My Mon. & Tues. this week are madness, but I'll try to wrap things up, get some more test coverage and get this pushed to PyPi by the end of the week. Thanks!
This is working for us as well! Any update on when it will be released?
Apologies, got swamped. Likely early next week at this point.
Great to hear @brentnash looking forward to the Python 3 support!! Will it be available on pip?
That's the plan!
As an update to anyone watching, I'm working on this today, so I'll update with progress later on.
YAY!!!
Added a bunch more unit tests:
https://github.com/awslabs/kinesis-aggregation/commits/py3-compat
I generated a bunch of data with the actual Kinesis Producer Library (KPL) to compare against.
Working on merging to master now and then I'll start on the PyPi distribution.
Live on Test PyPi at the moment:
https://testpypi.python.org/pypi/aws_kinesis_agg
Running some last tests.
Alright, new version should be live:
https://pypi.python.org/pypi/aws_kinesis_agg/1.1.0
Note the new version number is v1.1.0 (was previously 1.0.1). I've done some basic verification via both Python 2.7.14 and Python 3.6.4. Give it a shot and let me know how it goes for all of you (@jessmchung @jarrellmark @jsullivanLI)
I'll leave this issue open until we get some independent confirmation. I may also make a few documentation updates as well.
Thanks for the port, @brentnash !
I had to make a slight modification to my code to get it working in Python 3.6. I'm not sure if this is a bug; I'm just pointing it out. A sample demonstrating the change is below:
```python
import json
import uuid

import boto3
import aws_kinesis_agg.aggregator

kinesis_client = None


def send_record(agg_record):
    global kinesis_client
    partition_key, explicit_hash_key, data = agg_record.get_contents()
    print("partition_key: {partition_key}".format(partition_key=partition_key))
    print("explicit_hash_key: {explicit_hash_key}".format(explicit_hash_key=explicit_hash_key))
    print("data: {data}".format(data=data))
    """
    The commented out code below worked in python 2.7
    """
    # kinesis_client.put_record(
    #     StreamName='STREAM_NAME',
    #     Data=data,
    #     PartitionKey=partition_key,
    #     ExplicitHashKey=explicit_hash_key)
    kinesis_client.put_record(
        StreamName='STREAM_NAME',
        Data=data,
        PartitionKey=partition_key)


if __name__ == '__main__':
    print(json.dumps({
        'id': str(uuid.uuid4()),
        'message': "Sample Message"}))
    session = boto3.Session(profile_name='default')
    kinesis_client = session.client('kinesis')
    kinesis_agg = aws_kinesis_agg.aggregator.RecordAggregator()
    kinesis_agg.on_record_complete(send_record)
    for message_number in range(5):
        partition_key = str(uuid.uuid4())
        kinesis_agg.add_user_record(
            partition_key=partition_key,
            data=json.dumps({
                'id': partition_key,
                'message': "Message #{message_number}".format(
                    message_number=message_number)}).encode('utf-8'))
    send_record(kinesis_agg.clear_and_get())
```
It looks like the older `add_user_record` method calculated its own `explicit_hash_key` value if `None` was given.
@jarrellmark The good news is that this was indeed intentional, but definitely needs to be better documented in this project. Part of the reason I bumped the version from 1.0 to 1.1 was that I wanted to imply there might be some changes in functionality.
Let me try to run down the rationale briefly (in case you're interested)...
The Kinesis Producer Library (KPL) has code (https://github.com/awslabs/amazon-kinesis-producer/blob/master/aws/kinesis/core/user_record.cc#L22) that appears to generate an explicit hash key (EHK) from a partition key (PK) when one isn't present. That's what this project was doing previously (although the calculation was actually incorrect, so I fixed that too). In reality though, I ran a bunch of tests with the actual KPL and if you give it a null EHK, it doesn't auto-generate one; it literally just doesn't put an EHK into the aggregated message, and you re-receive it as null on the other end via the Kinesis Client Library (KCL) when the aggregated record is unpacked. So as far as I can tell, in normal circumstances, that code isn't actually used.
Furthermore, as described in https://github.com/awslabs/kinesis-aggregation/issues/11, specifying your own EHKs can actually cause some unintended and unpleasant side effects if you're using the actual Kinesis Client Library (KCL). If you don't specify an EHK, it will just create one for itself by hashing the PK. You can find the concerning behavior here: https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/types/UserRecord.java#L243. The fallout is essentially that if any of the PKs or EHKs for your user records inside the aggregated record don't match the shard the aggregated record itself is sent to, the KCL will drop them. So in that sense, not sending an EHK on your behalf seems like a safer bet (in addition to being more in-line with the actual KPL). FWIW, I don't believe the Python deaggregation modules in this project have that issue, but it's worth knowing about.
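For reference, the fallback hashing is MD5 of the partition key interpreted as a 128-bit integer, which is also how Kinesis itself maps partition keys to shards (per the PutRecord documentation). A hedged sketch of that derivation (my illustration, not the KCL's exact code):

```python
import hashlib


def effective_hash_key(partition_key):
    # Kinesis maps a partition key onto the 128-bit hash key space by
    # MD5-hashing the UTF-8 bytes of the key and reading the 16-byte
    # digest as a big-endian unsigned integer.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")
```

A user record whose explicitly supplied EHK differs from `effective_hash_key(pk)` can fall outside the receiving shard's hash range, which is exactly the mismatch that causes the KCL to drop records.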
Sorry, that was a lot more text than I thought it was going to be. Anyway, that's the rationale; if you disagree or have other ideas, I'm happy to hear them.
Sounds good, thanks for the explanation, @brentnash.
It's been a few weeks with no further comments, so I'm going to go ahead and resolve this. Feel free to reopen if there are any further issues.
Is there a reason the Python part isn't available for Python 3.x? There is Lambda support for Python 3.6, yet we won't be able to use it if this lib isn't updated.
Thanks, Itamar