anoma / juvix

A language for intent-centric and declarative decentralised applications
https://docs.juvix.org
GNU General Public License v3.0
449 stars 54 forks source link

Add support for Strings in the Anoma backend #2789

Closed paulcadman closed 4 months ago

paulcadman commented 4 months ago

This PR adds support for the String type, String literals and string concatenation to the Nockma backend. Support for the builtins show and intToString is not supported.

Example

test079.juvix

module test079;

import Stdlib.Prelude open;

main (s : String) : String :=
  s ++str " " ++str "✨ héllo" ++str " " ++str "world ✨";

args.nockma

[quote "Juvix!"]
$ juvix compile anoma test079.juvix
$ juvix dev nockma run test079.pretty.nockma --args args.nockma
"Juvix! ✨ héllo world ✨"

String representation

A String is a sequence of UTF-8 encoded bytes. We interpret these bytes as a sequence of bits to represent the string as an integer atom in nockma.

For example:

The string "a" is UTF-8 encoded as 97 which is 0b1100001 in bits.

The string "ab" is UTF-8 encoded at the pair of bytes: 97 98 which is 0b1100001 0b1100010.

When we combine the bytes into a single sequence of bits we must take care to pad each binary representation with zeros to each byte boundary.

So the binary representation of "ab" as an atom is 0b110000101100010 or 24930 as an integer atom.

String concatenation

We use the cat function in the Anoma stdlib to concatenate the bytes representing two strings.

We need to use the block parameter 3 in the Anoma call because we want to treat the atoms representing the strings as sequences of bytes (= 2^3 bits).

To find the relevant Nock code to call cat with block parameter 3 we use the urbit dojo as follows:

 =>  anoma  !=(~(cat block 3))
[8 [9 10 0 7] 9 4 10 [6 7 [0 3] 1 3] 0 2]

Stdlib intercept in Evaluator

The evaluator has support for strings using AtomHints, so strings can be printed and traced. The stdlib cat call is also intercepted because evaluating the unjetted hoon version is slow.

String support in pretty nockma

In a pretty nockma file or nock quasi-quote you can write double quoted string literals, e.g "abc". These are automatically translated to UTF-8 integer atoms as in the previous section.