Open qsctr opened 2 years ago
The hgmp
library provides Haskell bindings to some low-level Integer
manipulation primitives from gmp
, which may offer an alternative way to search for a particular byte in an Integer
. In particular, mpz_import
and/or mpz_export
might be useful.
In #346 the implementation of
:readByteArray
was changed to turn the input file into a huge number and bindit
tosplit
applied to the huge number, to reduce memory usage compared to building up a huge sequence literal.https://github.com/GaloisInc/cryptol/blob/39dea16096539cb7e90b969c571f42044af29da4/src/Cryptol/REPL/Command.hs#L1046-L1061
However, using the resulting sequence turns out to be very slow when the file is big, because the result of
split
is not a whole new sequence in memory but rather a kind of lazy sequence that goes back to the original sequence and looks up the appropriate value whenever you index into it. And since the original sequence in this case is a huge bitvector represented as anInteger
, the lookup of a byte within it results in doing a shift as pointed out by @yav, which is linear in the size of the integer.https://github.com/GaloisInc/cryptol/blob/39dea16096539cb7e90b969c571f42044af29da4/src/Cryptol/Eval/Generic.hs#L1072-L1084
Therefore processing the whole result of
:readByteArray
takes quadratic time. Note that fully forcing the result ofsplit
does not help, because the result of the lookup is not cached, so it will be slow every time you use the sequence.It seems to me that the best way of solving this would be to just construct the sequence value directly in memory in a form where lookup is constant time and add a way to bind
it
to values instead of just expressions. Alternatively we could optimize the lookup of a byte within the huge bitvector to be constant time, perhaps by using a series oftestBit
calls as suggested by @RyanGlScott. This could improve performance more generally any time we are looking up a small number of bits in a very big bitvector, although I'm not sure how often that occurs outside of:readByteArray
.