Closed Andrei15193 closed 5 years ago
I have checked the REST API for binary data and the response uses Base64 strings as well. It's something with the command line tool.
@Andrei15193 what error are you seeing?
@williexu here's the full error message I get:
'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte
Traceback (most recent call last):
File "C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-e5ce5s7r\knack\knack\cli.py", line 212, in invoke
File "C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-e5ce5s7r\knack\knack\output.py", line 132, in out
File "C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-e5ce5s7r\knack\knack\output.py", line 40, in format_json
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\json\__init__.py", line 238, in dumps
**kw).encode(obj)
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\json\encoder.py", line 201, in encode
chunks = list(chunks)
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\json\encoder.py", line 430, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\json\encoder.py", line 404, in _iterencode_dict
yield from chunks
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\json\encoder.py", line 325, in _iterencode_list
yield from chunks
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\json\encoder.py", line 404, in _iterencode_dict
yield from chunks
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\json\encoder.py", line 404, in _iterencode_dict
yield from chunks
File "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\lib\json\encoder.py", line 437, in _iterencode
o = _default(o)
File "C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-e5ce5s7r\knack\knack\output.py", line 31, in default
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte
@Andrei15193 I was able to repro your error.
@zezha-msft the binary info returned using the script above can't be decoded using utf-8. From what I've tried, some base64 strings can be decoded properly, while others cannot. Is this behavior expected and if so, how should the data be consumed?
@williexu I do not maintain the table SDK.
Adding @srinathnarayanan to help you with this issue.
@srinathnarayanan ping
Hey @williexu, @srinathnarayanan, are there any updates on this?
@williexu looks like there's not much focus on this, I figure you guys are busy with other stuff as well, it's quite an extensive SDK. Is there a guideline for contributing? I may be able to find the source of the issue and fix it, but for that I need a confirmation whether the result for binary data should indeed be a Base64 string, if not then what should be expected?
@Andrei15193, since the entity insert for a base64 string succeeds, I think the response should indeed also be a base64 string as well.
The table sdk is not giving back data in the expected format, and the CLI is unable to decode it properly. I've tried to get ahold of the sdk owner @srinathnarayanan but I believe he is on vacation at the moment. If you are able to figure out the source of the issue and open a PR on the right repo, that would be great.
@williexu I have done some digging and found the source(s) of the bug. There are multiple areas where we can intervene to fix this bug.
Starting from the response from the server, the request goes through azure-multiapi-storage-python, more specifically at _deserialization.py:117 an entity is being desirialized, at _deserialization.py:142 the property types from OData are being set and later on a property converter is looked up at _deserialization.py:193. For binary and encrypted properties the convertor is defined at _deserialization.py:69 which ultimately leads to _common_conversion.py:87 which explicitly converts Base64 encoded stings to byte strings which contain the actual raw binary data. This is one area where I can make a fix, but it might have effects in other areas where the library is being used.
A second area where a fix can be made is in knack, when an object is being written to the output it ultimately gets to output.py:113. In case of the CLI the default JSON formatter is being used, byte strings are treated separately at output.py:30 and judging by the existing tests test_output.py:61 it is expected for the byte string to be encoded in Base64 and not have the raw binary data. A change here may have effects for other tools or libraries.
A third place where I can make a fix is in the CLI itself, we can override the JSON formatters at _output.py:12 but this will mean mostly a copy/paste solution (copy from knack and change the bit that causes the bug, i.e.: encode back to Base64 string).
There might be other possible solutions, but I think these illustrate the nature of the issue. From the first and second findings it looks like it is intended behaviour and there's a discrepancy between what is expected and what is actually provided.
Where do you think the fix should be done?
@Andrei15193 can you do this fix through a transformer? i.e. example.
This will transform the result from the sdk before it arrives at the knack out()
method, and will not have any effects on the rest of the commands.
@williexu yep, I think it's possible. I'll have to check the --query
option as well. Not sure when it is applied, if it's after the transformer it's good, otherwise the issue may still happen if you change the structure of the response. I'll check and let you know.
Describe the bug When querying entities that have binary data properties (Edm.Binary type) the command line interface crashes for some values.
To Reproduce
Expected behavior The command is expected to run and the data to be returned encoded in Base64 for binary properties. The reason I expect Base64 is because I need to pass the value in the same encoding when inserting the entity.
Environment summary Install method: MSI (for Windows) CLI version: azure-cli (2.0.52) OS: Windows 10 Shell Type: PowerShell
Additional context Below is the script I used to test how binary data is stored (the documentation is not clear, I guessed it required Base64 from the JSON payload sample on Payload Format for Table Service Operations - JSON Format)