Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

Better int64 support #24499

Open lovettchris opened 2 years ago

lovettchris commented 2 years ago

Is your feature request related to a problem? Please describe.

Python integers are variable sized, for example:

>>> import sys
>>> sys.getsizeof(int(19283612873512536))
32
>>> sys.getsizeof(int(192836128735125361239862))
36
>>> sys.getsizeof(int(1928361287351253612398621293876))
40

but the current azure.data.tables entity serialization maps all Python integers to int32 on Azure. That is not very friendly, because you are bound to hit a nasty error at runtime when your Python integer overflows int32. Then you have to dig through the documentation, and if you are lucky you discover EntityProperty(memory, EdmType.INT64), change all your code to handle that properly, unwrap it on deserialization, and so on, which is a big mess.

Describe the solution you'd like

It would be cool if this auto-conversion to int64 (or even bigger) and back to Python int were completely transparent and handled in the serialization layer.
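As a rough illustration of the kind of transparent handling requested here, the sketch below tags integers that do not fit in int32 with an explicit 64-bit type and unwraps them again on the way back. It is not the SDK's behavior: `encode_int`, `encode_entity`, and `decode_entity` are hypothetical helpers, and the `"Edm.Int64"` string is an assumed stand-in for `EdmType.INT64` so that the example runs without the Azure package installed.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

# Assumption: a plain string tag standing in for EdmType.INT64.
EDM_INT64 = "Edm.Int64"


def encode_int(value):
    """Return value unchanged if it fits int32, else a (value, type tag) pair."""
    # Leave non-integers alone; bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return value
    if INT32_MIN <= value <= INT32_MAX:
        return value
    if INT64_MIN <= value <= INT64_MAX:
        return (value, EDM_INT64)
    raise OverflowError(f"{value} does not fit in a 64-bit integer")


def encode_entity(entity):
    """Apply encode_int to every property of a dict-like entity."""
    return {key: encode_int(val) for key, val in entity.items()}


def decode_entity(entity):
    """Unwrap (value, type tag) pairs back to plain Python values."""
    return {
        key: (val[0] if isinstance(val, tuple) else val)
        for key, val in entity.items()
    }


entity = {"PartitionKey": "pk", "RowKey": "rk", "memory": 19283612873512536}
encoded = encode_entity(entity)
print(encoded["memory"])
print(decode_entity(encoded) == entity)
```

Python integers larger than 64 bits still raise an error in this sketch, since it assumes int64 is the widest integer type available on the service side.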

Describe alternatives you've considered

I am now using EntityProperty(memory, EdmType.INT64), but I'm not happy with it. This topic should be dealt with on page 1 of the Python SDK, pointing out that "int" in Python is special. I don't know if you do this already, but you could also add special handling for the numpy int64 datatype.
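One possible way to accept numpy integers without importing numpy is to rely on Python's `__index__` protocol, which integer-like types such as numpy's integer scalars implement. The sketch below is only an assumption about how a serialization layer could normalize such values; `to_python_int` is a hypothetical helper, not part of azure.data.tables.

```python
import operator


def to_python_int(value):
    """Return a plain Python int for integer-like values, otherwise None."""
    # bool supports __index__ too, but should not be treated as an integer here.
    if isinstance(value, bool):
        return None
    try:
        # operator.index succeeds only for objects that implement __index__,
        # so floats and strings are rejected rather than silently truncated.
        return operator.index(value)
    except TypeError:
        return None


print(to_python_int(42))
print(to_python_int(1.5))
```

A value normalized this way could then go through the same int32/int64 range check as a built-in int.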

Additional context

N/A

annatisch commented 2 years ago

Thanks for the feedback, @lovettchris. This is definitely an interesting question. The reason this design was originally chosen (i.e. always treating Python integers as int32, with int64 requiring special handling) is full compatibility with other languages. If the Python SDK were to dynamically select int32 or int64 depending on the input, this could crash a different client that was processing that table in a statically typed language.

However, I agree that in a pure Python world it's unfriendly. I will try to look at some alternative approaches we could use to make this more intuitive. I will definitely make some docs updates to call this out more explicitly. I can also take a look at the numpy int64 format to see if we can ensure support for it :)
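The compatibility concern above can be illustrated with a small simulation. Everything here is hypothetical: `dynamic_edm_type` is an assumed value-based selection rule, and `read_as_int32` stands in for a client written in a statically typed language that declared the column as a 32-bit integer. It is a sketch of the reasoning, not of any real client.

```python
def dynamic_edm_type(value):
    """Hypothetical rule: choose the type tag from the magnitude of the value."""
    return "Edm.Int32" if -2**31 <= value <= 2**31 - 1 else "Edm.Int64"


def read_as_int32(value, edm_type):
    """Stand-in for a reader with a fixed schema that expects int32 only."""
    if edm_type != "Edm.Int32":
        raise TypeError(f"expected Edm.Int32, got {edm_type}")
    return value


# Two rows written to the same column end up with different stored types.
rows = [{"memory": 1024}, {"memory": 19283612873512536}]
column_types = {dynamic_edm_type(row["memory"]) for row in rows}
print(sorted(column_types))

# The fixed-schema reader handles the first row and fails on the second.
for row in rows:
    try:
        read_as_int32(row["memory"], dynamic_edm_type(row["memory"]))
    except TypeError as exc:
        print("reader failed:", exc)
```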

lovettchris commented 2 years ago

Cool, thanks for the quick reply. Yeah, numpy is so much of a de facto standard in the Python world that it makes sense to handle its datatypes in a first-class way.

annatisch commented 2 years ago

Thanks - this is a great recommendation! I will go through the numpy data types and start with the ones that have obvious equivalents and we can expand from there :)

YalinLi0312 commented 1 year ago

@lovettchris, thanks for your patience! We are working on refactoring the de/serialization of the TableEntity type this semester, and we'll take your feedback into consideration. Stay tuned!

github-actions[bot] commented 4 months ago

Hi @lovettchris, we deeply appreciate your input into this project. Regrettably, this issue has remained unresolved for over 2 years and inactive for 30 days, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.