Open lizmat opened 1 month ago
Case in point: use experimental :pack
doesn't even export unpack
, and nobody has noticed for at least 6 years?
% raku -e 'use experimental :pack; unpack(42)'
===SORRY!=== Error while compiling -e
Undeclared routine:
unpack used at line 1. Did you mean 'pack'?
The unpack
is provided by Buf.unpack
. Never been quite sure about that asymmetry.
The Net::AMQP is quite a heavy user of pack/unpack so will need a rework but that isn't a showstopper with sufficient warning.
But IIRC pack
was made experimental retrospectively so there may well be a lot of older modules using it which have just had the use experimental "pack"
slapped on them to keep them working.
https://gist.github.com/Whateverable/cb0ceefdb6724aa0181b4b5ea19b00d0
Yeah, "modules of a certain age" :smiley_cat:
The Net::AMQP is quite a heavy user of pack/unpack so will need a rework but that isn't a showstopper with sufficient warning.
I thought that P5pack
was a drop-in replacement. I guess the Buf.unpack
doesn't make it so. Guess I'll need to look at it again to make that work
FWIW some of the examples above :point_up: probably don't even need that. The Net::FTP
, for example, has instances of:
$_.unpack("A*");
(and no other uses of the pack
functionality)
Which is basically:
$_.decode('ascii')
Or something.
The Net::DNS
has something on the lines of:
$inc-size = $client.read(2);
$inc-size = $inc-size.unpack('n');
$incoming = $client.read($inc-size);
Which is basically :
$inc-size = $client.read(2);
$incoming = $client.read($inc-size.read-uint16);
Which makes one wonder whether $client
's class should have a read-uint16
(and friends) methods.
P5pack
0.0.15 is now drop-in compatible with use experimental :rakuast
.
As an additional comment: P5pack should probably get a complete makeover building code with RakuAST, similar to printf
.
In my experience (implementing binary protocols), this general problem area is not well solved right now. read-uint16
and friends are clunky, but pack
and friends are too Perlish and not nearly Rakuish. It's kind of begging for a more comprehensive module (I have some ideas, but they're not developed enough to start writing such a module).
@Leont Could P5pack serve as an "nqp" in your opinion for such a module?
@Leont
(implementing binary protocols) ...
What did/do you make of @alabamenhu's Binex?
(Iirc they paused their work on it until RakuAST had sufficiently matured and Rakoons were sufficiently interested in there being a better solution in this space. Or perhaps they came up with another solution; if so I guess it would be good to hear from them what that was.)
@Leont Could P5pack serve as an "nqp" in your opinion for such a module?
No, I think that's wiring things exactly the wrong way around. Though I do think RakuAST is the right way to go about these things.
What did/do you make of @alabamenhu's Binex?
I doesn't look like it's a good solution for my problem. Binary formats don't generally involve things like backtracking, I don't think regexes are quite the paradigm.
What I really want that pack doesn't offer includes:
Ideally it would also be usable with streams instead of bufs, but that may be a little too ambitious for now.
@Leont feels to me you're advocating a superset of the functionality pack
/unpack
provide, with a more object oriented syntax for specifying the format?
usable with streams instead of bufs
Are you thinking along the lines of: a format eating N bytes from a stream, then N again, and exposing that as a Supply ?
Personally, I have always felt pack/unpack to be a bit of an anachronism in a language with (native) types. When I needed something to inspect MoarVM's bytecode files I therefore just wrote this little ByteReader that inspects struct like types to figure out what to read: https://gist.github.com/niner/63a718023aba72e0dffc39c1ccd84e32
It's very little code that lets me declare structs as simple classes with natively typed members and let's me read those structs via e.g. $reader.read-struct(Header);
. The two major features this is missing for a generically useful binary format parser are support for variable length fields and support for type/kind fields, i.e. data that tells you what exact struct to expect. I guess inheritance and default values or the like would easily take care of the latter.
Also MoarVM::Bytecode
@Leont feels to me you're advocating a superset of the functionality pack/unpack provide, with a more object oriented syntax for specifying the format?
Yes, possibly something of a high level AST even (I mean, the problem involves both loops,conditionals and state). Something like pack
could easily be implemented on top of that for people who want a simple interface for simple problems.
Are you thinking along the lines of: a format eating N bytes from a stream, then N again, and exposing that as a Supply ?
Yeah something like that, except it wouldn't quite be a fixed number of bytes.
a format eating N bytes from a stream
A format eating the bytes from a stream that it needs, and again, exposing that as a Supply
:-)
I don't think regexes are quite the paradigm
Yep. This was part of the conversation when pack
was made experimental in the first place and it always seemed like it was a "when all you have to hand is a hammer, everything looks a nail" sort of thing. Or more accurately a Swiss Army Knife where you have a bunch of tools which plausibly could cover all the bases but in practice turn out to be useful only if you're prepared to use them inappropriately.
Part of the problem is that a lot of the binary data that is in the wild is either in a format that seemed natural in the language that it was originally implemented in (probably extinct,) or seemed natural for the application and was specified in a way that could be easily implemented in any relatively low level language available at the time without the kind of abstractions we are used to. The former includes stuff like COBOL PIC(X)
formatted data or the "dump a C struct" like utmp
which is quite easy to deal with by pack
or the read_foo
methods, the latter maybe not so much.
or the "dump a C struct" like utmp which is quite easy to deal with by pack or the read_foo methods
Yeah, pack
is generally pretty good at that iff you know the exact format. IME binary protocols tend to be more complicated. Conditionals are common (e.g. mqtt or diameter), and tend to have a bunch of special cases (e.g. in Postgres a length marker of -1 means the equivalent of an undefined value).
a more object oriented syntax for specifying the format
In the simple (static,) case this could work well with, say, some role to allow for construction from a Blob and some traits to specify the translation to Raku typed attributes. But in the dynamic case where the structure of subsequent data is dependent on earlier values then we're going into DSL territory rather than something that can be simply declarative.
A relatively simple example of the more complex case might be a dBase .dbf
file where the header part might specify the number of fields, followed by that number of fixed length field descriptions, then followed by the data of the actual rows who's length and structure is determined by the previous field descriptions.
What did/do you make of @alabamenhu's Binex?
I doesn't look like it's a good solution for my problem. Binary formats don't generally involve things like backtracking, I don't think regexes are quite the paradigm.
What I really want that pack doesn't offer includes:
* Mapping high level types to low level ones. Often I don't want to deal with primitives, I want to deal with higher level types (enums in particular are common). * Support optional/dynamic fields ("if bit X is set, we expect a integer here, otherwise we don't") * Thorough support for bitfields (including items taking up multiple bits). * More sensible encoding support
Ideally it would also be usable with streams instead of bufs, but that may be a little too ambitious for now.
So when I'm done, you should be able to do most of this stuff. I didn't really imagine Binex with backtracking in mind -- when I use grammars, I'm almost always using tokens as most common formats are generally designed carefully to avoid needing any kind of backtracking. But I did imagine being able to do basically everything you've pointed up there except for encoding (which I figured would just pass a blob to a decode during the action phase). As raiph pointed out, though, I did hold off on developing it further. I think RakuAST is sufficiently mature I could get back to it, though.
But it doesn't really create an equivalent pack equivalent, so it probably wouldn't fit your use case.
The P5pack module covers most of the functionality needed.
And the
use experimental :pack
functionality has bugs: https://github.com/rakudo/rakudo/issues/1875