gleam-lang / stdlib

🎁 Gleam's standard library
https://hexdocs.pm/gleam_stdlib/
Apache License 2.0
485 stars 171 forks source link

iolist_size crashes on non-byte aligned bit strings #320

Open varnerac opened 2 years ago

varnerac commented 2 years ago
import gleeunit
import gleam/bit_builder

pub fn main() {
  gleeunit.main()
}

pub fn bit_string_size_test() {
  <<1:int-size(1)>>
  |> bit_builder.from_bit_string()
  |> bit_builder.byte_size()
}

crashes with

λ ~/bitstringsize/ master* gleam test
  Compiling gleam_stdlib
  Compiling gleeunit
  Compiling bitstringsize
   Compiled in 1.41s
    Running bitstringsize_test.main
F
Failures:

  1) bitstringsize_test:bit_string_size_test/0: module 'bitstringsize_test'
     Failure: badarg
     Stacktrace:
       erlang.iolist_size
       gleam@bit_builder.byte_size
     Output: 

Finished in 0.006 seconds
1 tests, 1 failures
lpil commented 2 years ago

Thank you

varnerac commented 2 years ago

How do we fix this? Do bit strings need to be byte-aligned? Erlang iolists are formed from byte-aligned binaries. Do we deprecate byte_size in bit_string and bit_builder and replace it with bit_size?

lpil commented 2 years ago

Great point. Do they always need to be byte aigned? Is there a function to get the bit size of an iolist?

varnerac commented 2 years ago

Iolists need to be byte-aligned according to their Erlang type specs. You can combine bit strings into lists. However, they will fail for byte_size, even if they add up to a byte-aligned value:

1> iolist_size([<<1:1>>, <<0:15>>]).            
** exception error: bad argument
     in function  iolist_size/1
        called as iolist_size([<<1:1>>,<<0,0:7>>])
        *** argument 1: not an iodata term

There is a bit_size function for bit strings, but you'd have to fold/accumulate it across bit strings in Gleam land.

The real problem will be networks and file I/O. I think both need byte boundaries. I don't think you can send/write a fraction of a byte. I feel like you'd be forced into size checking a bit string list before doing anything useful with it.

lpil commented 2 years ago

I think being byte aligned makes sense, but it has some uncomfortable API implications. If a bit string is added that is not byte aligned should it fail? That implies it should return a Result, which would make the API much more uncomfortable to use. Another option would be to automatically add padding.

I also wonder if the name BitBuilder is misleading if it's byte aligned.

varnerac commented 2 years ago

The problem is that we cannot tell if the bit string is byte-aligned at compile-time. Here's the option I've been thinking about:

sdancer commented 2 years ago
<<1:int-size(1)>>

forbidding expressions where the static size isn't multiple of 8 to compile would be a good start

sdancer commented 2 years ago

can someone showcase a use case of partial bitstrings?

19> A.
<<1:1>>
20> <<A, 1:8>>.         
** exception error: construction of binary failed
     in function  eval_bits:eval_exp_field/6 (eval_bits.erl, line 143)
        *** segment 1 of type 'integer': expected an integer but got: <<1:1>>
     in call from eval_bits:create_binary/2 (eval_bits.erl, line 77)
     in call from eval_bits:expr_grp/5 (eval_bits.erl, line 68)

they don't seem to work at the moment

varnerac commented 2 years ago

https://github.com/gleam-lang/gleam/issues/1591 is related

1> A=<<1:1>>.
* 1:4: syntax error before: '<'
1> A= <<1:1>>.
<<1:1>>
2> <<A, 1:8>>.         
** exception error: bad argument
     in function  eval_bits:eval_exp_field1/6 (eval_bits.erl, line 123)
     in call from eval_bits:create_binary/2 (eval_bits.erl, line 81)
     in call from eval_bits:expr_grp/4 (eval_bits.erl, line 72)
3> <<A/bitstring, 1:8>>. 
<<128,1:1>>
varnerac commented 2 years ago

I think the answer is to:

lpil commented 2 years ago

forbidding expressions where the static size isn't multiple of 8 to compile would be a good start

We will not do this, bit strings of any size are perfectly valid in BEAM languages.

The plan here is to make BitBuilders always byte aligned, as they are intended to be in Erlang. The name is misleading, we may want to change it to BinaryBuilder or ByteBuilder