Raku / problem-solving

🦋 Problem Solving, a repo for handling problems that require review, deliberation and possibly debate

`use experimental :pack` should be removed #442

Open lizmat opened 2 weeks ago

lizmat commented 2 weeks ago

The P5pack module covers most of the functionality needed.

And the use experimental :pack functionality has bugs: https://github.com/rakudo/rakudo/issues/1875

lizmat commented 2 weeks ago

Case in point: use experimental :pack doesn't even export unpack, and nobody has noticed for at least 6 years?

% raku -e 'use experimental :pack; unpack(42)'
===SORRY!=== Error while compiling -e
Undeclared routine:
    unpack used at line 1. Did you mean 'pack'?

jonathanstowe commented 2 weeks ago

unpack is provided by Buf.unpack. I've never been quite sure about that asymmetry.

Net::AMQP is quite a heavy user of pack/unpack, so it will need a rework, but that isn't a showstopper with sufficient warning.

But IIRC pack was made experimental retrospectively, so there may well be a lot of older modules using it which have just had use experimental :pack slapped on them to keep them working.

coke commented 2 weeks ago

https://gist.github.com/Whateverable/cb0ceefdb6724aa0181b4b5ea19b00d0

jonathanstowe commented 2 weeks ago

https://gist.github.com/Whateverable/cb0ceefdb6724aa0181b4b5ea19b00d0

Yeah, "modules of a certain age" :smiley_cat:

lizmat commented 2 weeks ago

Net::AMQP is quite a heavy user of pack/unpack, so it will need a rework, but that isn't a showstopper with sufficient warning.

I thought that P5pack was a drop-in replacement. I guess the Buf.unpack doesn't make it so. Guess I'll need to look at it again to make that work.

jonathanstowe commented 2 weeks ago

FWIW some of the examples above :point_up: probably don't even need that. The Net::FTP, for example, has instances of:

$_.unpack("A*");

(and no other uses of the pack functionality)

Which is basically:

$_.decode('ascii')

Or something.

jonathanstowe commented 2 weeks ago

Net::DNS has something along the lines of:

            $inc-size = $client.read(2);
            $inc-size = $inc-size.unpack('n');
            $incoming = $client.read($inc-size);

Which is basically:

            $inc-size = $client.read(2);
            $incoming = $client.read($inc-size.read-uint16(0, BigEndian));

lizmat commented 2 weeks ago

Which makes one wonder whether $client's class should have a read-uint16 (and friends) method.
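
A rough sketch of that idea (the role name and its methods here are hypothetical, not an existing API):

    # Hypothetical: a role for any class whose read($n) returns a Blob,
    # so the Net::DNS fragment above could become
    #     $incoming = $client.read($client.read-uint16);
    role TypedReads {
        method read-uint16(Endian $endian = BigEndian) {
            self.read(2).read-uint16(0, $endian)
        }
        method read-uint32(Endian $endian = BigEndian) {
            self.read(4).read-uint32(0, $endian)
        }
    }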

lizmat commented 2 weeks ago

P5pack 0.0.15 is now drop-in compatible with use experimental :pack.

As an additional comment: P5pack should probably get a complete makeover building code with RakuAST, similar to printf.
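
For anyone switching, the intent is that the pragma line is simply replaced by the module (a hedged sketch; the exact set of template directives P5pack supports is documented in the module itself):

    # was: use experimental :pack
    use P5pack;                    # exports Perl-5-style pack and unpack subs

    my $buf = pack("nN", 1, 2);    # 16-bit and 32-bit big-endian unsigned
    say unpack("nN", $buf);        # (1 2)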

Leont commented 1 week ago

In my experience (implementing binary protocols), this general problem area is not well solved right now. read-uint16 and friends are clunky, but pack and friends are too Perlish and not nearly Rakuish. It's kind of begging for a more comprehensive module (I have some ideas, but they're not developed enough to start writing such a module).

lizmat commented 1 week ago

@Leont Could P5pack serve as an "nqp" in your opinion for such a module?

raiph commented 1 week ago

@Leont

(implementing binary protocols) ...

What did/do you make of @alabamenhu's Binex?

(Iirc they paused their work on it until RakuAST had sufficiently matured and Rakoons were sufficiently interested in there being a better solution in this space. Or perhaps they came up with another solution; if so I guess it would be good to hear from them what that was.)

Leont commented 1 week ago

@Leont Could P5pack serve as an "nqp" in your opinion for such a module?

No, I think that's wiring things exactly the wrong way around. Though I do think RakuAST is the right way to go about these things.

Leont commented 1 week ago

What did/do you make of @alabamenhu's Binex?

It doesn't look like it's a good solution for my problem. Binary formats don't generally involve things like backtracking, so I don't think regexes are quite the right paradigm.

What I really want that pack doesn't offer includes:

* Mapping high level types to low level ones. Often I don't want to deal with primitives, I want to deal with higher level types (enums in particular are common).

* Support optional/dynamic fields ("if bit X is set, we expect an integer here, otherwise we don't")

* Thorough support for bitfields (including items taking up multiple bits).

* More sensible encoding support

Ideally it would also be usable with streams instead of bufs, but that may be a little too ambitious for now.

lizmat commented 1 week ago

@Leont feels to me you're advocating a superset of the functionality pack/unpack provide, with a more object oriented syntax for specifying the format?

lizmat commented 1 week ago

usable with streams instead of bufs

Are you thinking along the lines of: a format eating N bytes from a stream, then N again, and exposing that as a Supply?

niner commented 1 week ago

Personally, I have always felt pack/unpack to be a bit of an anachronism in a language with (native) types. When I needed something to inspect MoarVM's bytecode files, I therefore just wrote this little ByteReader that inspects struct-like types to figure out what to read: https://gist.github.com/niner/63a718023aba72e0dffc39c1ccd84e32

It's very little code that lets me declare structs as simple classes with natively typed members and lets me read those structs via e.g. $reader.read-struct(Header). The two major features this is missing for a generically useful binary format parser are support for variable length fields and support for type/kind fields, i.e. data that tells you what exact struct to expect. I guess inheritance and default values or the like would easily take care of the latter.
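
The linked gist is small; a stripped-down sketch of the same idea (illustrative only, not the gist's actual code) looks roughly like:

    # Walk a class's natively typed attributes in declaration order and
    # read each one from a Buf (only a few unsigned types handled here).
    class ByteReader {
        has Buf $.buf;
        has int $!pos = 0;

        method read-struct(Mu:U \type) {
            my %args;
            for type.^attributes -> $attr {
                my $name = $attr.name.substr(2);   # strip the '$!'
                %args{$name} = do given $attr.type.^name {
                    when 'uint8'  { my $v = $!buf.read-uint8($!pos);             $!pos += 1; $v }
                    when 'uint16' { my $v = $!buf.read-uint16($!pos, BigEndian); $!pos += 2; $v }
                    when 'uint32' { my $v = $!buf.read-uint32($!pos, BigEndian); $!pos += 4; $v }
                    default       { die "unsupported attribute type $_" }
                }
            }
            type.new(|%args)
        }
    }

    class Header {
        has uint32 $.magic;
        has uint16 $.version;
    }

    my $reader = ByteReader.new(buf => Buf.new(0xDE, 0xAD, 0xBE, 0xEF, 0x00, 0x01));
    say $reader.read-struct(Header);   # prints something like Header.new(magic => 3735928559, version => 1)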

lizmat commented 1 week ago

Also MoarVM::Bytecode

Leont commented 1 week ago

@Leont feels to me you're advocating a superset of the functionality pack/unpack provide, with a more object oriented syntax for specifying the format?

Yes, possibly something of a high level AST even (I mean, the problem involves loops, conditionals, and state). Something like pack could easily be implemented on top of that for people who want a simple interface for simple problems.

Are you thinking along the lines of: a format eating N bytes from a stream, then N again, and exposing that as a Supply?

Yeah something like that, except it wouldn't quite be a fixed number of bytes.

lizmat commented 1 week ago

a format eating N bytes from a stream

A format eating the bytes it needs from a stream, then again, and exposing that as a Supply

:-)

jonathanstowe commented 1 week ago

I don't think regexes are quite the paradigm

Yep. This was part of the conversation when pack was made experimental in the first place, and it always seemed like a "when all you have to hand is a hammer, everything looks like a nail" sort of thing. Or, more accurately, a Swiss Army knife where you have a bunch of tools which plausibly could cover all the bases but in practice turn out to be useful only if you're prepared to use them inappropriately.

Part of the problem is that a lot of the binary data in the wild is either in a format that seemed natural in the language it was originally implemented in (probably extinct by now), or seemed natural for the application and was specified in a way that could be easily implemented in any relatively low level language available at the time, without the kind of abstractions we are used to. The former includes stuff like COBOL PIC(X) formatted data or the "dump a C struct" like utmp which is quite easy to deal with by pack or the read_foo methods; the latter, maybe not so much.

Leont commented 1 week ago

or the "dump a C struct" like utmp which is quite easy to deal with by pack or the read_foo methods

Yeah, pack is generally pretty good at that iff you know the exact format. IME binary protocols tend to be more complicated: conditionals are common (e.g. mqtt or diameter), and there tend to be a bunch of special cases (e.g. in Postgres a length marker of -1 means the equivalent of an undefined value).
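
As a concrete illustration of that last kind of special case (a sketch; the sub and the field layout are made up here, not the actual Postgres protocol code):

    # A signed length marker precedes each field; -1 means "no value"
    # rather than a zero-length value.
    sub read-field(Blob $b, Int $pos is rw) {
        my $len = $b.read-int32($pos, BigEndian);
        $pos += 4;
        return Nil if $len == -1;          # the "undefined value" special case
        my $value = $b.subbuf($pos, $len);
        $pos += $len;
        $value
    }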

jonathanstowe commented 1 week ago

a more object oriented syntax for specifying the format

In the simple (static) case this could work well with, say, some role to allow for construction from a Blob and some traits to specify the translation to Raku typed attributes. But in the dynamic case, where the structure of subsequent data depends on earlier values, we're going into DSL territory rather than something that can be simply declarative.

A relatively simple example of the more complex case might be a dBase .dbf file, where the header part might specify the number of fields, followed by that number of fixed length field descriptions, then followed by the data of the actual rows, whose length and structure is determined by the previous field descriptions.
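
Roughly the shape of that (illustrative only, not the real .dbf layout):

    # A count read early on determines how many fixed-size descriptors
    # follow, which in turn determine how to read the row data after them.
    my Buf $file = Buf.new(2,                         # pretend field count
                           |(1 xx 32), |(2 xx 32),    # two 32-byte "descriptors"
                           0xDE, 0xAD);               # "row" data
    my $count       = $file.read-uint8(0);
    my @descriptors = (^$count).map({ $file.subbuf(1 + $_ * 32, 32) });
    my $row-data    = $file.subbuf(1 + $count * 32);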

alabamenhu commented 6 days ago

What did/do you make of @alabamenhu's Binex?

It doesn't look like it's a good solution for my problem. Binary formats don't generally involve things like backtracking, so I don't think regexes are quite the right paradigm.

What I really want that pack doesn't offer includes:

* Mapping high level types to low level ones. Often I don't want to deal with primitives, I want to deal with higher level types (enums in particular are common).

* Support optional/dynamic fields ("if bit X is set, we expect an integer here, otherwise we don't")

* Thorough support for bitfields (including items taking up multiple bits).

* More sensible encoding support

Ideally it would also be usable with streams instead of bufs, but that may be a little too ambitious for now.

So when I'm done, you should be able to do most of this stuff. I didn't really imagine Binex with backtracking in mind -- when I use grammars, I'm almost always using tokens, as most common formats are generally designed carefully to avoid needing any kind of backtracking. But I did imagine being able to do basically everything you've pointed out up there, except for encoding (which I figured would just pass a blob to a decode during the action phase). As raiph pointed out, though, I did hold off on developing it further. I think RakuAST is now sufficiently mature that I could get back to it, though.

But it doesn't really provide a pack equivalent, so it probably wouldn't fit your use case.