NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Support hexadecimal and octal syntax for integers #7578

Open roberth opened 1 year ago

roberth commented 1 year ago

Is your feature request related to a problem? Please describe.

Some configuration formats support hex or octal syntax. It would be nice if those numbers were easily expressed in Nix as well.

Describe the solution you'd like

Extend the parser to parse 0x[[:xdigit:]]+ as an integer converted from base 16. Currently, e.g., 0xff is parsed as 0 xff, which ~would never return anything~ is only valid syntax when used inside a list literal. ~We can assume that all expressions containing a hexadecimal integer are currently invalid expressions. That makes this change an improvement instead of a breaking change.~ Nonetheless, this seems like a very unlikely sub-expression in practice, as most people have used actual whitespace to separate numbers and identifiers in lists, if they even consider building a heterogeneous list.

Similarly, extend the parser to parse 0o[0-7]+ as an integer converted from base 8.

Similarly, make sure negative integers in these bases work.

I would suggest not storing the base of the integer in its Value representation. This keeps integers simple.
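A minimal Python sketch of the literal rules proposed above, assuming the lexer hands over one already-isolated token. `parse_int_literal` is a hypothetical name, and accepting only lowercase `0x`/`0o` prefixes is an assumption. Negative values are assumed to come from a separate unary minus applied to the literal rather than from a sign inside the token. Only the resulting integer is returned, so nothing about the base is kept, matching the suggestion to leave the Value representation alone.

```python
import re

# Hypothetical token rules for the proposal: "0x" + hex digits, "0o" + octal
# digits, or plain decimal digits. Only lowercase prefixes are accepted here.
_HEX = re.compile(r"0x([0-9a-fA-F]+)")
_OCT = re.compile(r"0o([0-7]+)")
_DEC = re.compile(r"[0-9]+")


def parse_int_literal(token: str) -> int:
    """Convert one integer-literal token into a plain int.

    The returned value does not remember which base it was written in.
    """
    if m := _HEX.fullmatch(token):
        return int(m.group(1), 16)
    if m := _OCT.fullmatch(token):
        return int(m.group(1), 8)
    if _DEC.fullmatch(token):
        return int(token, 10)
    raise ValueError(f"not an integer literal: {token!r}")


print(parse_int_literal("0xf"))    # 15
print(parse_int_literal("0o10"))   # 8
# A leading minus stays a separate unary operator applied to the literal.
print(-parse_int_literal("0x10"))  # -16
```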

Describe alternatives you've considered

Additional context

Inspired by https://github.com/NixOS/nixpkgs/pull/208747#discussion_r1064207990

Priorities

Add :+1: to issues you find important.

shikanime commented 1 year ago

Hello, I would love to work on this ( ˶˃ᆺ˂˶). The implementation of lib.fromHex is quite trivial; was there an issue with it? (🔄 reading the additional context)

roberth commented 1 year ago

The point of this issue is to add native literals to the language. I'm sure Nixpkgs can implement lib.fromHex quite well, but that'd be a function from string to integer. I don't think it has an issue, but it's just a different thing.

The goal here is to have the following.

# this is a nix file
assert 15 == 0xf;
assert toString 0xf == "15";
assert 0o10 == 8;
# etc

You may notice that GitHub doesn't know how to highlight that; Nix doesn't know how to parse it either, yet.

roberth commented 1 year ago

Oh no! I forgot about the list item separator...

This proposal does steal usable syntax:

nix-repl> let x10 = "x number ten"; in [ 0x10 ]
[ 0 "x number ten" ]

It's still unlikely to cause a problem, but it does mean that this is technically a breaking change.
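To see why this is technically breaking, here is a toy Python tokenizer that mimics longest-match lexing for just integers, identifiers, brackets and whitespace. It is not Nix's actual lexer; the rule set and regexes are simplified assumptions. Under the current rules nothing longer than `0` matches as an integer, so `0x10` becomes the integer `0` followed by the identifier `x10`; adding a hex rule lets `0x10` win as one longer token.

```python
import re

# Simplified lexer rules (kind, pattern); not Nix's real grammar.
INT_DEC = ("INT", re.compile(r"[0-9]+"))
INT_HEX = ("INT", re.compile(r"0x[0-9a-fA-F]+"))
IDENT = ("ID", re.compile(r"[A-Za-z_][A-Za-z0-9_'-]*"))
PUNCT = ("PUNCT", re.compile(r"[\[\]]"))
SPACE = ("SPACE", re.compile(r"\s+"))


def tokenize(src, rules):
    """Longest-match lexing; on a tie, the earlier rule wins."""
    tokens, pos = [], 0
    while pos < len(src):
        best = None
        for kind, pattern in rules:
            m = pattern.match(src, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise SyntaxError(f"unexpected {src[pos]!r} at offset {pos}")
        kind, m = best
        if kind != "SPACE":
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens


current = [INT_DEC, IDENT, PUNCT, SPACE]
proposed = [INT_HEX, *current]

print(tokenize("[ 0x10 ]", current))
# [('PUNCT', '['), ('INT', '0'), ('ID', 'x10'), ('PUNCT', ']')]
print(tokenize("[ 0x10 ]", proposed))
# [('PUNCT', '['), ('INT', '0x10'), ('PUNCT', ']')]
```

Writing `[ 0 x10 ]` with a space keeps the old meaning under both rule sets.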

ClarkeRemy commented 1 year ago

Might I suggest that, instead of 0x<number in hex> for hexadecimal only, we allow <base in decimal>b<number in base>? It's more extensible and allows more bases to be leveraged.

All of the following would equal 11 in base 10.

10b011 # Decimal
2b1011 # Binary
8b0013 # Octal
16b00B # Hexadecimal

7b0014 # even base 7
12b0b.0 # even works with floats

This parses easily because it still begins with a numeral. (The letter 'b' could be bikeshedded; consider '_'.)

If this were implemented with case insensitivity, you would get 36 bases (10 numerals + 26 ASCII letters). With case sensitivity you could have up to base 62 (10 numerals + 52 ASCII letters).
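A Python sketch of how the suggested `<base in decimal>b<digits>` form could be read, covering integers only (the float form is left out). `parse_radix_literal` is a hypothetical name, and limiting bases to 2 through 36 with case-insensitive letters is an assumption matching the 36-base reading above.

```python
import re

# Hypothetical "<base in decimal>b<digits>" literal; integers only.
_RADIX = re.compile(r"([0-9]+)b([0-9A-Za-z]+)")


def parse_radix_literal(token: str) -> int:
    m = _RADIX.fullmatch(token)
    if m is None:
        raise ValueError(f"not a radix literal: {token!r}")
    base, digits = int(m.group(1)), m.group(2)
    if not 2 <= base <= 36:
        raise ValueError(f"base must be between 2 and 36, got {base}")
    # Check every digit ourselves: int(text, base) alone would also accept
    # a prefix such as "0x" (int("0x1f", 16) == 31).
    if any(int(ch, 36) >= base for ch in digits):
        raise ValueError(f"{digits!r} is not a valid base-{base} number")
    return int(digits, base)


for tok in ["10b011", "2b1011", "8b0013", "16b00B", "7b0014"]:
    print(tok, "=", parse_radix_literal(tok))  # 11 in every case
```

Note that in base 16 the digit `A` is 10 and `B` is 11.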

eclairevoyant commented 1 year ago

Why? Is there a use case for making base 23 or base 37 easier to write?

roberth commented 1 year ago

This seems over-engineered, and the base 32 wouldn't match Nix's own base32 encoding; lack of base 64 would be surprising.

I'd like to reserve _ as an ignored digit group separator, similar to Haskell's NumericUnderscores:

million = 1_000_000;
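A Python sketch of such a digit-group separator, assuming underscores are only allowed between digits (never leading or trailing); the name `parse_grouped_decimal` is hypothetical. As with `0x`, this also claims syntax: Nix identifiers may start with `_`, so `1_000_000` inside a list currently lexes as `1` followed by the identifier `_000_000`.

```python
import re

# Hypothetical digit-group rule: one or more underscores may appear between
# digits, but never at the start or end of the digit sequence.
_GROUPED_DECIMAL = re.compile(r"[0-9]+(?:_+[0-9]+)*")


def parse_grouped_decimal(token: str) -> int:
    if not _GROUPED_DECIMAL.fullmatch(token):
        raise ValueError(f"malformed literal: {token!r}")
    # The separators carry no meaning; drop them before converting.
    return int(token.replace("_", ""), 10)


print(parse_grouped_decimal("1_000_000"))  # 1000000
```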

roberth commented 1 year ago

We might also want to steal syntax such as 64Ki == 65536 to represent the base 2 unit prefixes (Ki, Mi, Gi, etc), although that opens up more questions about complete units, rather than just unit prefixes. Perhaps we could reserve all alphabetic suffixes but only implement unitless unit prefixes that return the usual unitless numbers.
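A Python sketch of the "reserve all alphabetic suffixes, implement only unit prefixes" idea. The prefix table (Ki, Mi, Gi, Ti as powers of 1024) and the name `parse_prefixed` are assumptions for illustration; any other suffix is treated as reserved and rejected.

```python
import re

# Hypothetical suffix table for IEC binary prefixes (powers of 1024).
BINARY_PREFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

_WITH_SUFFIX = re.compile(r"([0-9]+)([A-Za-z]*)")


def parse_prefixed(token: str) -> int:
    m = _WITH_SUFFIX.fullmatch(token)
    if m is None:
        raise ValueError(f"not a number: {token!r}")
    digits, suffix = m.groups()
    if suffix == "":
        return int(digits)
    if suffix not in BINARY_PREFIXES:
        # Reserved syntax: every other alphabetic suffix is an error for now.
        raise ValueError(f"unsupported suffix: {suffix!r}")
    # The result is a plain, unitless integer.
    return int(digits) * BINARY_PREFIXES[suffix]


print(parse_prefixed("64Ki"))  # 65536
```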

x10an14 commented 1 year ago

@eclairevoyant: Why? Is there a use case for making base 23 or base 37 easier to write?

Nope - but it's all just the modulo operator, so why go look for hardcoded chars when we could take the <base> number as input to the modulo operation?

I would argue that it's simpler/easier with integerValue := <inputNumber> % <inputBaseValue>, than hard-coding for specific bases. What are you gonna do when another base is requested/implemented? We've already got 4x typical ones (decimal, binary, octal, hexadecimal).
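For reference, reading a digit string in a given base is positional accumulation (Horner's rule: multiply the running total by the base, then add the next digit); the modulo operator shows up in the opposite direction, when an integer is written out as digits. A small Python sketch of both directions for bases up to 36; the function names are illustrative only.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def from_digits(text: str, base: int) -> int:
    """Parse left to right: value = value * base + digit (Horner's rule)."""
    if not 2 <= base <= 36 or not text:
        raise ValueError("need a base in 2..36 and at least one digit")
    value = 0
    for ch in text.lower():
        digit = DIGITS.find(ch)  # -1 for characters outside the alphabet
        if not 0 <= digit < base:
            raise ValueError(f"{ch!r} is not a digit in base {base}")
        value = value * base + digit
    return value


def to_digits(value: int, base: int) -> str:
    """Render right to left: each value % base is the next digit."""
    if not 2 <= base <= 36:
        raise ValueError("need a base in 2..36")
    if value < 0:
        return "-" + to_digits(-value, base)
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, digit = divmod(value, base)
        out.append(DIGITS[digit])
    return "".join(reversed(out))


print(from_digits("1011", 2))  # 11
print(to_digits(11, 2))        # 1011
```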

Time (bases 24 & 60) would also be somewhat realistic use-case(s) (since you asked), for a vast majority of us using Nix.

@roberth: This seems over-engineered, and the base32 wouldn't match Nix's own base32 encoding; lack of base 64 would be surprising.

Re. over-engineering; see above.

Re. base64; this suggestion would make the case for bases higher than 36 simpler to write/extend once someone(tm) figures out how to solve the (completely orthogonal to this discussion - at least so far) problem with base64.

@roberth: I'd like to reserve _ as an ignored digit group separator, similar to Haskell's NumericUnderscores:

million = 1_000_000;

Good enough argument for me to move away from _ as a separator.

EDIT: Minor edits for clarity/readability.

eclairevoyant commented 1 year ago

What are you gonna do when another base is requested/implemented? We've already got 4x typical ones (decimal, binary, octal, hexadecimal).

Decimal is a given, hex is common, octal is somewhat niche and binary is quite niche. I don't imagine getting requests for other bases in this fashion. These prefixes are also standard across the computing world, no need to have some bespoke integer format.

Time (bases 24 & 60) would also be somewhat realistic use-case(s) (since you asked), for a vast majority of us using Nix.

This is not how you represent time, so that's actually hurting your point. To me something like time would make more sense to handle in a library function, since that's a mix of bases (and also would need to handle time zones), or even just an external command.

x10an14 commented 1 year ago

What are you gonna do when another base is requested/implemented? We've already got 4x typical ones (decimal, binary, octal, hexadecimal).

@eclairevoyant: Decimal is a given, hex is common, octal is somewhat niche and binary is quite niche. I don't imagine getting requests for other bases in this fashion. These prefixes are also standard across the computing world, no need to have some bespoke integer format.

I am not as confident as you in your (I feel too) strong assertion(s). But I don't think I've got anything more productive to add to the discussion than I already have at this point, so tapping out here - leaving this for you (who feels strongly enough to write "no need to have some bespoke integer format" (emphasis mine)) and others to decide.

Time (bases 24 & 60) would also be somewhat realistic use-case(s) (since you asked), for a vast majority of us using Nix.

@eclairevoyant: This is not how you represent time, so that's actually hurting your point.

I definitely think your argument is over-reaching here - seems (to me) like you're presuming knowledge of all use-cases. I (personally) have no problem seeing myself (or others) using something like services.<serviceName>.config.intervalAttribute = 7 * 24n1;.

ClarkeRemy commented 1 year ago

Why? Is there a use case for making base 23 or base 37 easier to write?

Cherry picking?

This is already a thing in Bash, but it uses #. This prints 120:

$ echo $((23#55))
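For comparison, a Python sketch of the `base#digits` rule as the Bash manual documents it: bases 2 through 64, with digits above 9 taken from lowercase letters, then uppercase letters, then `@` and `_`, and letters treated case-insensitively up to base 36. `bash_base` is a hypothetical helper; it handles non-negative values only and is not Bash's own code.

```python
import string

# Digit order documented for Bash arithmetic: 0-9, a-z, A-Z, '@', '_'.
_BASH_DIGITS = string.digits + string.ascii_lowercase + string.ascii_uppercase + "@_"


def bash_base(base: int, text: str) -> int:
    """Evaluate text the way Bash reads base#text (sketch, non-negative only)."""
    if not 2 <= base <= 64:
        raise ValueError(f"invalid base: {base}")
    if not text:
        raise ValueError("missing digits")
    if base <= 36:
        # Up to base 36, upper- and lowercase letters mean the same digit.
        text = text.lower()
    value = 0
    for ch in text:
        digit = _BASH_DIGITS.find(ch)
        if not 0 <= digit < base:
            raise ValueError(f"value too great for base: {ch!r}")
        value = value * base + digit
    return value


print(bash_base(23, "55"))  # 120, same as: echo $((23#55))
```

For bases up to 36, Python's built-in `int("55", 23)` gives the same 120.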

This seems over-engineered, and the base 32 wouldn't match Nix's own base32 encoding; lack of base 64 would be surprising.

I presume you mean the SHA sums for packages. This is semantically different, people are not expected to read hashes as actual numbers, or actually reason about them.

Base 64 has always been a tricky one. I would love to find a way to support base 64.

I'd like to reserve _ as an ignored digit group separator, similar to Haskell's NumericUnderscores:

million = 1_000_000;

I'm fine with the 'b' character. I think 'n' could be good too. Bikeshed away.

16_0a
16b0a
16n0a

I'm also of the opinion that we could just have a builtin called base that takes a number and a string:

builtins.base 16 "FF"
builtins.base 23 "55"

eclairevoyant commented 1 year ago

I have no objection to a builtin for less-used bases. All I'm getting at is I don't think that should block movement on this feature request. Nor do I think we should look to bash for user friendliness.

roberth commented 1 year ago

Arbitrary base literals are hard to read and understand, and any remaining use cases are sufficiently niche that they're better covered by a library function.

intervalAttribute = 7 * 24n1

This is complete gibberish to me. interval suggests something about time, so perhaps this is better solved by support for units. I'm happy to reserve syntax for units, but I'd like to save the remaining discussion about units for when hex and octal have been implemented.

I do have an objection to a new built-in. Builtins have to be absolutely bug-free and cannot be changed. Furthermore they make the implementation of alternate evaluators needlessly hard. Niche functions like arbitrary-base parsing are better implemented in a Nix-language library such as a flake, which does not suffer from these problems, or nixpkgs.lib, which suffers very little from these problems.

x10an14 commented 1 year ago

I have no objection to a builtin for less-used bases. All I'm getting at is I don't think that should block movement on this feature request. Nor do I think we should look to bash for user friendliness.

I'm tapping out of this back and forth.

But it's important to me to clarify for anyone who cares that none of my inputs to this thread was meant as @eclairevoyant interpreted them;

block(ing)

ClarkeRemy commented 1 year ago

Arbitrary base literals are hard to read and understand, and any remaining use cases are sufficiently niche that they're better covered by a library function.

intervalAttribute = 7 * 24n1

This is complete gibberish to me. interval suggests something about time, so perhaps this is better solved by support for units. I'm happy to reserve syntax for units, but I'd like to save the remaining discussion about units for when hex and octal have been implemented.

I do have an objection to a new built-in. Builtins have to be absolutely bug-free and cannot be changed. Furthermore they make the implementation of alternate evaluators needlessly hard. Niche functions like arbitrary-base parsing are better implemented in a Nix-language library such as a flake, which does not suffer from these problems, or nixpkgs.lib, which suffers very little from these problems.

Fair enough, just thought I would pitch in.

x10an14 commented 1 year ago

Arbitrary base literals are hard to read and understand, and any remaining use cases are sufficiently niche that they're better covered by a library function.

intervalAttribute = 7 * 24n1

This is complete gibberish to me. interval suggests something about time, so perhaps this is better solved by support for units. I'm happy to reserve syntax for units, but I'd like to save the remaining discussion about units for when hex and octal have been implemented.

Do you still feel that's the case with the below code? I'm curious.

let
  dayAndAHalf = 12n3;
in {
  intervalAttribute = 3 * dayAndAHalf;
}

eclairevoyant commented 1 year ago

12n3 is gibberish as well. What does that have to do with a day and a half?

But it's important to me to clarify for anyone who cares that none of my inputs to this thread was meant as @eclairevoyant interpreted them; block(ing)

Of course this discussion is blocking since it directly conflicts with the proposed syntax. If it wasn't blocking then it should be a separate request, yes?

roberth commented 1 year ago

Did you mean 12n30 instead of 12n3? The fact that you're making this mistake (or the interpretation is ambiguous) is not helping your point, and it will not refute the fact that arbitrary-base syntax is unconventional and surprising to all readers. Most readers only use the language occasionally, so we should try to avoid surprises. I'm enough of a nerd to enjoy the idea, but professionally we're not going to implement arbitrary-base syntax, and we've exhausted the subject by now. Please stop.

x10an14 commented 1 year ago

12n3 is gibberish as well. What does that have to do with a day and a half?

Not engaging.

I'm enough of a nerd to enjoy the idea, but professionally we're not going to implement arbitrary-base syntax, and we've exhausted the subject by now. Please stop.

I'm on the same page.

@eclairevoyant: Of course this discussion is blocking since it directly conflicts with the proposed syntax. If it wasn't blocking then it should be a separate request, yes?

Then where are discussions to be held?

mweinelt commented 6 months ago

I needed octal support to represent file permissions in a module today. Converting the octal int from its string representation into a decimal integer was the only safe way I found to transport it through PyYAML (wants octal numbers to be zero-prefixed) into Python's os.chmod (supports all kinds of integer representations).

Anyway, since this was one of the first issues I found on the matter, here is a function that converts an octal (base-8) string into an integer.

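# Converts an octal string such as "755" into an integer (here 493).
toIntBase8 = str:
  lib.pipe str [
    lib.stringToCharacters
    # Parse each character, rejecting the non-octal digits 8 and 9.
    (map (c:
      let digit = lib.toInt c;
      in if digit < 8 then digit else throw "toIntBase8: '${c}' is not an octal digit"))
    # Horner's rule: shift the accumulator one octal place, then add the digit.
    (lib.foldl (acc: digit: acc * 8 + digit) 0)
  ];
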
chayleaf commented 6 months ago

builtins.fromTOML can be used as a compact way to parse bin/oct/dec/hex numbers in modern Nix versions, like assert (builtins.fromTOML "a = 0b11").a == 3. Of course, this isn't a replacement for adding non-decimal integer literal support into the language.