Add `binary:encode_hex/1` and `binary:decode_hex/1` to estdlib

aiotter commented 3 days ago

I'm an engineer working at @realglobe-Inc. We use AtomVM to develop a Wi-Fi sniffer on the ESP32.

We need a fast way to dump Wi-Fi frames into hexadecimal strings, so we wrote NIFs that dump and load hexadecimal strings. They are very similar to binary:encode_hex/1 and binary:decode_hex/1. I think I can adapt them to fit into src/libAtomVM/nifs.c with a little effort. Is there a chance of getting this merged if I create a PR?

Maybe we can add Base.encode16 and Base.decode16 as well, and also Base.encode64 and Base.decode64 on top of the already implemented base64:encode and base64:decode.

pguyot commented 1 day ago

Hello @aiotter

Thank you for your interest in AtomVM. I will let @bettio chime in, but I believe an implementation of binary:encode_hex/1 and binary:decode_hex/1 (and maybe binary:encode_hex/2) would be welcome, especially if it matches our contribution rules (licensing, coding style). It would likely be approved if it comes with docs and specs in the binary.erl module and with tests that also pass on BEAM.

I guess that a NIF would be more memory efficient than a binary comprehension, but the following seems to work with AtomVM. We certainly don't need something as complex as Erlang/OTP's implementation.

encode_hex(B) ->
    << << (hd(integer_to_list(X, 16))):8 >> || <<X:4>> <= B >>.

bettio commented 12 hours ago

Indeed, we are looking forward to any contribution about this.

aiotter commented 9 hours ago

@pguyot @bettio Thank you for your reply!

I was shocked to see your beautifully simple code. I implemented the function in Elixir first, but my implementation is not as elegant as yours. Actually, I'm not very used to Erlang or Elixir yet. I have only been using them for about half a year.

After I created this issue, I did some research and found that my code was slow because I printed every parsed result, not because of my Elixir implementation. So now I think NIFs are not needed for this kind of trivial task.

This is my implementation in Elixir, with the spec dump(binary()) :: charlist(), in case you are interested.

```elixir
defmodule Hexadecimal do
  # One byte: two uppercase hex digits, left-padded with ?0.
  def dump(<<byte>>),
    do: Integer.to_charlist(byte, 16) |> Utils.charlist_pad_leading(2, ?0)

  # Several bytes: dump the first one, then recurse on the rest.
  def dump(<<byte, rest::binary>>), do: dump(<<byte>>) ++ dump(rest)
end

defmodule Utils do
  # :string.pad
  def charlist_pad_leading(charlist, length, char)
      when is_list(charlist) and length > length(charlist) do
    :lists.duplicate(length - length(charlist), char) ++ charlist
  end

  def charlist_pad_leading(charlist, length, _char)
      when is_list(charlist) and length == length(charlist),
      do: charlist
end
```

I use a charlist instead of a binary because prepending to a list is efficient.

And this is my NIF.

```c
#include <stdlib.h> // strtol
#include <string.h> // strcmp

#include "atom.h"
#include "term.h"
#include "defaultatoms.h"
#include "nifs.h"
#include "esp32_sys.h"

// #define ENABLE_TRACE
#include "trace.h"

static term nif_hexadecimal_encode(Context *ctx, int argc, term argv[]);
static term nif_hexadecimal_decode(Context *ctx, int argc, term argv[]);
const struct Nif *utils_nif_get_nif(const char *nifname);
char int_to_char(uint8_t i);

static term nif_hexadecimal_encode(Context *ctx, int argc, term argv[])
{
    TRACE("nif_hexadecimal_encode\n");
    UNUSED(argc);

    term src = argv[0];
    if (!term_is_binary(src)) {
        RAISE_ERROR(BADARG_ATOM);
    }

    // Every source byte becomes two hex digits.
    unsigned long size = term_binary_size(src) * 2;
    size_t needed = term_binary_data_size_in_terms(size) + BINARY_HEADER_SIZE;
    if (UNLIKELY(memory_ensure_free_with_roots(ctx, needed, 1, argv, MEMORY_CAN_SHRINK) != MEMORY_GC_OK)) {
        RAISE_ERROR(OUT_OF_MEMORY_ATOM);
    }

    // Read the source again after the call above, since a GC may have moved it.
    src = argv[0];
    uint8_t *data = (uint8_t *) term_binary_data(src);

    term buffer_term = term_create_uninitialized_binary(size, &ctx->heap, ctx->global);
    char *buffer = (char *) term_binary_data(buffer_term);

    for (unsigned long i = 0; i < size / 2; i++) {
        uint8_t higher = data[i] >> 4;
        buffer[2 * i] = int_to_char(higher);
        uint8_t lower = data[i] & 0x0F;
        buffer[2 * i + 1] = int_to_char(lower);
    }

    TRACE("Encoded into %lu bytes.\n", size);
    return buffer_term;
}

// Map 0..15 to an uppercase hex digit; anything else maps to 0.
char int_to_char(uint8_t i)
{
    if (i < 10) {
        return '0' + i;
    } else if (i < 16) {
        return 'A' + i - 10;
    } else {
        return 0;
    }
}

static term nif_hexadecimal_decode(Context *ctx, int argc, term argv[])
{
    TRACE("nif_hexadecimal_decode\n");
    UNUSED(argc);

    term src = argv[0];
    if (!term_is_binary(src)) {
        RAISE_ERROR(BADARG_ATOM);
    }

    // Two hex digits per output byte; a trailing single digit gets its own byte.
    unsigned long original_size = term_binary_size(src);
    unsigned long size = original_size / 2 + original_size % 2;
    size_t needed = term_binary_data_size_in_terms(size) + BINARY_HEADER_SIZE;
    if (UNLIKELY(memory_ensure_free_with_roots(ctx, needed, 1, argv, MEMORY_CAN_SHRINK) != MEMORY_GC_OK)) {
        RAISE_ERROR(OUT_OF_MEMORY_ATOM);
    }

    // Read the source again after the call above, since a GC may have moved it.
    src = argv[0];
    char *data = (char *) term_binary_data(src);

    term buffer_term = term_create_uninitialized_binary(size, &ctx->heap, ctx->global);
    uint8_t *buffer = (uint8_t *) term_binary_data(buffer_term);

    for (unsigned long read_bytes = 0; read_bytes < size; read_bytes++) {
        // Copy up to two digits into a NUL-terminated scratch buffer.
        char buffer_c[3] = { 0 };
        for (int i = 0; i < 2 && read_bytes * 2 + i < original_size; i++) {
            buffer_c[i] = data[read_bytes * 2 + i];
        }

        char *endptr;
        buffer[read_bytes] = (uint8_t) strtol(buffer_c, &endptr, 16);
        // Reject input where strtol stopped before consuming every digit.
        if (UNLIKELY(endptr == buffer_c || *endptr != '\0')) {
            TRACE("decode failed\n");
            RAISE_ERROR(BADARG_ATOM);
        }
        // fprintf(stderr, "%s -> %02x\n", buffer_c, buffer[read_bytes]);
    }

    return buffer_term;
}

static const struct Nif hexadecimal_encode_nif = {
    .base.type = NIFFunctionType,
    .nif_ptr = nif_hexadecimal_encode
};

static const struct Nif hexadecimal_decode_nif = {
    .base.type = NIFFunctionType,
    .nif_ptr = nif_hexadecimal_decode
};

const struct Nif *utils_nif_get_nif(const char *nifname)
{
    if (strcmp("utils:encode_hex/1", nifname) == 0 || strcmp("Elixir.Utils:encode_hex/1", nifname) == 0) {
        TRACE("Resolved platform nif %s ...\n", nifname);
        return &hexadecimal_encode_nif;
    } else if (strcmp("utils:decode_hex/1", nifname) == 0 || strcmp("Elixir.Utils:decode_hex/1", nifname) == 0) {
        TRACE("Resolved platform nif %s ...\n", nifname);
        return &hexadecimal_decode_nif;
    }
    return NULL;
}

REGISTER_NIF_COLLECTION(utils, NULL, NULL, utils_nif_get_nif)
```
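In case anyone wants to check the nibble logic on a host machine without the AtomVM headers, here is a minimal sketch in plain C. The names `nibble_to_hex`, `hex_to_nibble`, `hex_encode` and `hex_decode` are made up for this example and are not part of AtomVM. As an assumption of the sketch, the decoder accepts both upper- and lowercase digits and rejects odd-length input, which differs from the NIF above (it decodes a trailing single digit into its own byte).

```c
#include <stddef.h>
#include <stdint.h>

/* Map a value 0..15 to its uppercase hex digit. */
static char nibble_to_hex(uint8_t n)
{
    return (char) (n < 10 ? '0' + n : 'A' + (n - 10));
}

/* Map a hex digit (either case) to 0..15, or -1 if it is not a hex digit. */
static int hex_to_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Write 2 * len hex digits into out. No NUL terminator is added. */
static void hex_encode(const uint8_t *in, size_t len, char *out)
{
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = nibble_to_hex(in[i] >> 4);
        out[2 * i + 1] = nibble_to_hex(in[i] & 0x0F);
    }
}

/* Decode len hex digits into len / 2 bytes.
 * Returns 0 on success, -1 on odd length or a non-hex character. */
static int hex_decode(const char *in, size_t len, uint8_t *out)
{
    if (len % 2 != 0) return -1;
    for (size_t i = 0; i < len / 2; i++) {
        int hi = hex_to_nibble(in[2 * i]);
        int lo = hex_to_nibble(in[2 * i + 1]);
        if (hi < 0 || lo < 0) return -1;
        out[i] = (uint8_t) ((hi << 4) | lo);
    }
    return 0;
}
```

For example, encoding the two bytes 0x1F 0xAB with `hex_encode` fills the output with the four characters `1FAB`, and `hex_decode` on those four characters gives the original two bytes back. The caller owns both buffers, so the same loops could be dropped into a NIF once the result binary has been allocated.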

Anyway, I'm willing to submit a PR that implements binary:encode_hex/1, binary:decode_hex/1 and binary:encode_hex/2 in Erlang once my current task is completed. They should be helpful for people like us who dump data from microcontrollers and feed it into Wireshark.

Many functions are still missing, such as string:pad (String.pad_leading) and ~~binary:copy~~ (implemented at dfe2003ad1e42ea2d23adf944e55e0c926e72685). I had to implement them myself. Do you also accept PRs that add them? I would like to know which kinds of functions are wanted in eavmlib and exavmlib, if there are any general rules. I know you don't want every function in OTP.

bettio commented 4 hours ago

Thank you very much.

We are glad to receive any kind of contribution, including string:pad or anything else. There will be some comments and feedback before a pull request is merged, to make sure the contribution matches our contribution rules (tests, coding style, and so on). So if you are OK with going through the review process, please open a pull request and we'll do our best to review it.

atomvm / AtomVM

Add `binary:encode_hex/1` and `binary:decode_hex/1` to estdlib #1287