Open design idea: inline asm

chandlerc commented 1 year ago

Disclaimer

This issue is part of a series that simply records language design ideas that have come up for Carbon. It isn't necessarily a good idea, or one that Carbon should definitely adopt. However, it is an interesting area that has come up several times and seems at least promising. But it still might not work out!!

Before picking up a language design idea like this and fully developing it, we encourage you to find some folks who are very active in Carbon's language design (as well as potentially one of the leads) and discuss the area with them to get a feel for what would make sense, challenges they anticipate, etc.

Inline asm (and related)

Something that is often overlooked, but that I think has been essential to some of the low-level and performance-critical successes of C/C++, is the use of "inline asm" (also called "inline assembly" or "inline assembler"). It has allowed C/C++ to access hardware facilities directly, at a far lower level than the language typically allows, and then to wrap those operations in the standard abstractions and facilities of the language for broader use.

Carbon will IMO very likely need a similar capability in order to attain its performance goal.

There are at least two major flavors of inline asm that I'm aware of: GCC's model and MSVC's model.

Each of these has significant tradeoffs.

- GCC's model, for example, forces all the assembly code into a string literal that is processed by an actual assembler, using lexical and grammatical rules entirely different from those of C/C++.
  - This in turn requires a separate syntax to marshal inputs, outputs, and other parameters from the source language (C/C++) into the special assembly dialect.
  - However, it is able to follow the sometimes very bespoke and unique conventions of the system assembly language.
- MSVC's model, on the other hand, works hard to embed the assembly directly into the main C/C++ language.
  - This provides an extremely simple and intuitive mapping between C/C++ constructs and assembly constructs.
  - However, it creates large, complex, and challenging syntax embedded within the language, where the syntax space is governed by the particular target architecture rather than by the portable language.

Neither of these positions seems desirable in the context of Carbon -- we should aim for a significantly better syntax and user experience than GCC's model provides, and a much simpler syntax than MSVC's model results in.

We should also consider two other major use cases that, to my knowledge, have not yet been explored thoroughly as embeddings.

First, we should explore providing a way to embed code written against an implementation IR, such as LLVM IR. All of these embeddings are already inherently non-portable. If we support (with sufficient reason) coding directly against the hardware implementation, it also seems reasonable to support coding directly against the compiler implementation, with the understanding that such code would not be portable to other compilers.

Second, I'd suggest at least exploring some of the other languages designed to sit extremely close to, but slightly above, hardware assembly languages, for use cases like cryptography. Two such examples I'm aware of from the cryptography community are qhasm (http://cr.yp.to/qhasm.html) and Jasmin (https://github.com/jasmin-lang/jasmin). We should also look for other similarly minimal layers that might be worth either drawing inspiration from or leveraging directly as one option for this style of embedded, implementation-dependent code.

josh11b commented 1 year ago

In addition, there are now many different hardware accelerators, such as GPUs and FPGAs, each with its own collection of IRs at multiple levels. It would be great if the system were extensible enough to allow embedding code that targets a variety of devices.

schorrm commented 1 year ago

An idea: I think we want several things.

- asm shouldn't be a raw string blob; we still want to keep some sort of type safety.
- We want to target multiple architectures.
- We want low syntactic overhead -- we don't want to need a million keywords to reserve space for twenty different architectures.

One idea that comes to mind is that we could map assembly syntaxes onto context receivers, Kotlin-style. So if I'm in a Riscv64 context, the names x0..x31 come into scope. Then I could write something like the following:

Riscv64.asm {
  x1 = fmadd(x1, x2, x3);
}

etc

I did a rough sketch in Kotlin -- https://pl.kotl.in/4Xux1ojqt

carbon-language / carbon-lang

Open design idea: inline asm #1926
