RFC: Declare Maps Up-Front

ajor commented 7 months ago

This is a proposal to introduce a new syntax for declaring maps ahead of their use, outside of probes:

let @x[uint64] = stack_t;

BEGIN
{
  @x[123] = ustack();
}

I would like this functionality, but am not tied to this syntax in particular.

Use Case 1 - Declare key/value types

The types of map keys and values are currently inferred automatically from usage. This works well for simple scripts, but there are maintainability benefits to declaring the expected types of variables in larger scripts, both for understandability and ease of refactoring.

Example - Types not inferred from first usage

let @x[uint64] = int64;

PROBE1
{
  @x["123"] = 456; // with a map declaration this line produces a type error
}
PROBE2
{
  @x[123] = 456;
}
PROBE3
{
  @x[0] = 0;
}

In the middle of refactoring a large script, changing the type of the key in PROBE1 might have been missed and it would be useful to get an error on that line. Without a map declaration, PROBE2 and PROBE3 would trigger type errors as PROBE1 was seen first.

Example - Control exact types used

let @x[uint8] = int32;

BEGIN
{
  @x[123] = 456; // the literals would be inferred as int64 without the declaration
}

Use Case 2 - Configure Map Settings

The default capacity of a hash map is not large enough for many purposes. Increasing a map's capacity is currently only possible through a global config option that applies to all maps. This wastes memory by increasing the capacity of all maps, when other maps may be fine with a smaller capacity.

Example - Custom size for hash map

let @x[int64] = int64 : HashMap(1024);

BEGIN
{
  @x[123] = 456;
}

Use Case 3 - Create non-hash-maps

Allow users to declare maps backed by data structures other than a BPF Hash Map. For example, an array map might be used for efficiency when it is known that a fixed selection of keys is needed.

Example - Create an array map

let @x[uint32] = int64 : ArrayMap(128); // BPF specifies that array maps must have a "uint32" key

BEGIN
{
  @x[123] = 456;
}

danobi commented 7 months ago

Yeah, use cases seem totally reasonable. Exact syntax should hopefully leverage type annotation work done for user defined functions.

viktormalik commented 7 months ago

Agreed with @danobi, #2624 uses the $arg: type syntax for function arguments so we should somehow mirror that here (we could eventually add $var: type = ... syntax for local vars.

jordalgo commented 7 months ago

All makes sense to me. Maybe because I just worked on the config syntax, I'm considering the "floating" element of these type annotations e.g. will they be forced to be declared at the top? should they go after config = {?

Perhaps they should get wrapped in some other similar demarcation e.g.

types = {
   let @x[uint64] = int64;
}

User-defined command line args could also live here. Just something to think about as bpftrace has no (and doesn't want) header files and we want a way to keep things organized.

JakeHillion commented 3 weeks ago

Taking a look at this now. The current (verbose) syntax I'm looking at is:

let global @x: int32;
let global @x: [int32 -> int64];
let global @x: [int32, int16 -> int64];

This would be extensible in future to (with certain conditions):

let global @y: int64 = ArrayMap;
let global @y = ArrayMap;

I considered making the global optional, but for now have left it global.

Placement rules are currently equivalent to probes. That is, you can place these let statements anywhere in the global scope. The semantic analyser rejects duplicate lets for the same map, and processes all maps before processing the rest of the program. This means that a map declared after a probe that uses it has full effect.

The let syntax could be further extensible to things like:

let type t1 = [int32 -> int64];
let global @x: type;
let type t2 = int32;
BEGIN { let y: t2; }

Feedback seems to be to drop the global, which doesn't create any more ambiguity so that's fine. I like the rest for a uniform "declare" something syntax but could be swayed too. Will put up a PR soon.

jordalgo commented 3 weeks ago

@JakeHillion Thanks for picking this up.

As discussed offline, I'm curious if dropping both the "global" and the "let" is possible to improve the conciseness of this new syntax e.g.

@x: int32;
@x: [(int32, int16) -> int64];

I think this could then be re-used for scratch variables inside probes when we want to add variable declarations e.g.

kprobe:do_nanosleep {
  $scratch: macaddr_t;
  if (condition) {
    $scratch = ...
  } else {
    $scratch = ...
  }
}

bpftrace / bpftrace