cloudera / hue

Open source SQL Query Assistant service for Databases/Warehouses
https://cloudera.com
Apache License 2.0

Display and accept binary HBase data in escaped form #3729

Closed: stoty closed this issue 1 week ago

stoty commented 1 month ago

Description

Much of the time, data in HBase (not just values, but often row keys and even column qualifiers) is binary, with the encoding determined by the application.

Currently Hue cannot display such data in a usable way. It tries to interpret the bytes as a (UTF-8?) string, and the output is full of placeholders for unprintable characters. (At least, I was not able to change this from the UI.)

Because there is no standard encoding, and no metadata recording which encoding was used, it is not possible to display the decoded contents reliably.

The HBase shell solves the same problem by escaping binary data: bytes that are printable ASCII characters are displayed as those characters, while all other bytes are displayed as escaped hex codes (for example \xFF).

While this is still not a very user-friendly format, most HBase users are familiar with it and have workflows to handle it.
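For illustration, a minimal Python sketch of that escaping (Python because Hue's backend is written in it; to_string_binary is a hypothetical name, not an existing Hue or HBase API). It follows the HBase Java code linked further down as I understand it: printable ASCII from 0x20 to 0x7E is kept, except the backslash, and every other byte is written as \xHH with upper-case hex digits.

```python
def to_string_binary(data: bytes) -> str:
    """Escape raw bytes the way the HBase shell displays them (sketch).

    Printable ASCII (0x20-0x7E) is kept as-is, except the backslash (0x5C),
    which is escaped so the result can be parsed back unambiguously.
    Every other byte becomes \\xHH with upper-case hex digits.
    """
    parts = []
    for b in data:  # iterating over bytes yields ints 0-255
        if 0x20 <= b <= 0x7E and b != 0x5C:
            parts.append(chr(b))
        else:
            parts.append(f"\\x{b:02X}")
    return "".join(parts)
```

For example, to_string_binary(b"binary_value\x00\x01\xff") returns the text binary_value\x00\x01\xFF, the same form the HBase shell prints. Escaping the backslash itself is what keeps the format reversible.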

I propose doing the same in Hue: use this encoding to display all data that is not otherwise identified and handled.

Additionally, the editor could support this encoding too, accepting an escaped string and converting it to its binary representation.
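The reverse direction, for the editor, could look roughly like the following sketch (again Python; to_bytes_binary is a hypothetical name). It is not a port of the HBase code and makes its own assumptions: it accepts upper- and lower-case hex digits, keeps a backslash that does not start a valid \xHH escape as a literal byte, and rejects non-ASCII text, so its handling of malformed input may differ from HBase's.

```python
import re

# A backslash, an 'x', then exactly two hex digits (either case).
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def to_bytes_binary(text: str) -> bytes:
    """Turn a shell-style escaped string back into raw bytes (sketch).

    Text between escapes must be ASCII (UnicodeEncodeError otherwise) and
    maps to one byte per character. A backslash that does not start a valid
    \\xHH escape is kept as a literal byte.
    """
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(text):
        out += text[pos:m.start()].encode("ascii")  # literal run before the escape
        out.append(int(m.group(1), 16))              # the escaped byte itself
        pos = m.end()
    out += text[pos:].encode("ascii")                # trailing literal run
    return bytes(out)
```

For example, to_bytes_binary(r"binary_value\x00\x01\xFF") returns b"binary_value\x00\x01\xff", so a value copied from the shell output can be pasted back in.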

The escaping code is very simple; these are the Java methods for escaping and unescaping:

https://github.com/apache/hbase/blob/156e430dc56211c0aea15d792e8733b1b0e3de5c/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java#L574
https://github.com/apache/hbase/blob/156e430dc56211c0aea15d792e8733b1b0e3de5c/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java#L607

There are further possible enhancements, such as interpreting the data as hex strings, or dynamically switching the encoding of cells, rows, etc. to one of the standard encodings in org.apache.hadoop.hbase.util.Bytes, but those are less critical and can be handled separately.
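Switching a cell to one of the standard encodings could be a small dispatch such as the hypothetical decode_as helper below. The mode names are made up for illustration; the numeric modes assume the fixed-width big-endian layout that org.apache.hadoop.hbase.util.Bytes.toBytes uses for short, int and long.

```python
# Fixed-width, big-endian integer layouts, as written by
# org.apache.hadoop.hbase.util.Bytes.toBytes(short/int/long).
_WIDTHS = {"short": 2, "int": 4, "long": 8}


def decode_as(value: bytes, mode: str) -> str:
    """Render a cell value under a user-selected interpretation (sketch)."""
    if mode == "hex":
        return value.hex()
    if mode == "utf8":
        # Lossy: invalid sequences become U+FFFD replacement characters.
        return value.decode("utf-8", errors="replace")
    if mode in _WIDTHS:
        if len(value) != _WIDTHS[mode]:
            raise ValueError(f"{mode} needs {_WIDTHS[mode]} bytes, got {len(value)}")
        return str(int.from_bytes(value, "big", signed=True))
    raise ValueError(f"unknown mode: {mode!r}")
```

Because nothing in HBase records which encoding a cell uses, a dispatch like this can only apply the interpretation the user picks; the length check at least rejects values that cannot be the chosen type.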

bjornalm commented 1 month ago

Hi @stoty, and thanks for reaching out. Can you provide a screenshot showing how this looks in HBase, for a clearer picture?

stoty commented 1 month ago

Creating and displaying binary data in the HBase shell:

hbase:006:0> create 'demo', 'cf1';
Created table demo
Took 0.8557 seconds
=> Hbase::Table - demo
hbase:007:0> put 'demo', 'ascii_key', 'cf1:ascii_qialifier', 'ascii_value';
Took 1.2427 seconds
hbase:010:0> put 'demo', "binary_key\x00\x01\xff", "cf1:binary_qualifier\x00\x01\xff", "binary_value\x00\x01\xff";
Took 0.0089 seconds
hbase:019:0> scan 'demo'
ROW                       COLUMN+CELL
 ascii_key                column=cf1:ascii_qialifier, timestamp=2024-05-08T11:40:29.760, value=ascii_value
 binary_key\x00\x01\xFF   column=cf1:binary_qualifier\x00\x01\xFF, timestamp=2024-05-08T11:42:55.064, value=binary_value\x00\x01\xFF
2 row(s)
Took 0.0112 seconds

As you can see, the binary values are entered as escaped hex characters, and the results are displayed the same way.

The same data in Hue looks like this:

[Screenshot from 2024-05-08 13-52-20: the demo table as displayed in Hue]

The easiest and most HBase-like solution would be to use the same hex-escaped format in Hue.

This could be a toggle in the toolbar for backward compatibility.
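To make the toggle concrete, here is a sketch of a cell formatter with an escaped flag (format_cell and its flag are hypothetical). The escaped=False branch is only an approximation of the current behavior, assuming a UTF-8 decode that replaces invalid bytes; Hue's actual code may differ.

```python
def format_cell(value: bytes, escaped: bool = False) -> str:
    """Render a cell value, optionally in HBase-shell escaped form (sketch).

    escaped=False approximates a plain UTF-8 decode (an assumption about the
    current behavior): invalid bytes become U+FFFD, control bytes pass through.
    escaped=True keeps printable ASCII except the backslash and writes every
    other byte as \\xHH.
    """
    if not escaped:
        return value.decode("utf-8", errors="replace")
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E and b != 0x5C else f"\\x{b:02X}"
        for b in value
    )
```

With the flag off, the 0xFF byte becomes U+FFFD and 0x00/0x01 remain raw control characters; with it on, the value reads binary_value\x00\x01\xFF, as in the shell. Defaulting the flag to off keeps the existing behavior.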

stoty commented 1 month ago

For ease of identification I used an ASCII prefix, but in practice the data is often pure binary, such as an encoded long or integer.
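As an example of such pure binary data, assume the application wrote numbers the way Bytes.toBytes does (8-byte big-endian for a long, 4-byte big-endian for an int). The sketch below shows their escaped form; escape is a hypothetical helper applying the shell's escaping rules.

```python
import struct


def escape(data: bytes) -> str:
    """Shell-style escaping: printable ASCII except backslash, else \\xHH."""
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E and b != 0x5C else f"\\x{b:02X}"
        for b in data
    )


# Assumed encodings: big-endian, as Bytes.toBytes(long) / Bytes.toBytes(int) write them.
long_cell = struct.pack(">q", 1234)  # 8 bytes
int_cell = struct.pack(">i", 65)     # 4 bytes

print(escape(long_cell))  # \x00\x00\x00\x00\x00\x00\x04\xD2
print(escape(int_cell))   # \x00\x00\x00A
```

Note that 65 shows up as \x00\x00\x00A because its last byte, 0x41, happens to be printable ASCII.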

bjornalm commented 1 month ago

Thanks, let's leave this issue open to see if anyone in the community can create a PR for it.

stoty commented 1 month ago

#77 is a similar issue.

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open 30 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.

stoty commented 1 week ago

For the record, this is NOT completed.