apple / pkl-pantry

Shared Pkl packages
Apache License 2.0
234 stars 31 forks source link

[TOML] Renderer never renders maps as inline values #61

Open archer-321 opened 4 months ago

archer-321 commented 4 months ago

Context

AFAICT, the current implementation of toml.Renderer has code to render map-like elements as inline tables: toml.pkl:146

However, the current implementation doesn't seem to use that code, as every type that gets converted to a Map ends up rendering as a regular table.

Suggested change

While inlining small tables generally improves readability, large tables can quickly lead to very long lines if inlined. To determine whether a table would be small enough to inline, the implementation could

I understand that none of the suggestions above are ideal, but considering no issue exists for this segment of dead code, I opened this issue anyway.

Example

The examples below were generated using pkl.toml@1.0.0, and pkl --version yields 0.26.0-dev+47f161a (I built the native binary locally).

The following example contains a mapping of usernames to the user's UID and GID:

import "package://pkg.pkl-lang.org/pkl-pantry/pkl.toml@1.0.0#/toml.pkl"

output {
    renderer = new toml.Renderer {}
}

users {
    foo {
        uid = 1000
        gid = 1000
    }
    cups {
        uid = 209
        gid = 209
    }
    nobody {
        uid = 65534
        gid = 65534
    }
}

Currently, it produces the following document:

[users.foo]
uid = 1000
gid = 1000

[users.cups]
uid = 209
gid = 209

[users.nobody]
uid = 65534
gid = 65534

However, the same data could be expressed in TOML using inline tables:

[users]
foo = { uid = 1000, gid = 1000 }
cups = { uid = 209, gid = 209 }
nobody = { uid = 65534, gid = 65534 }

While the verbose tables that the current implementation generates are still readable in this basic example, the additional tables quickly become a problem if many small objects are part of another table. For example, the following version renders as five tables and an array of tables (two [[]] tables).

Second example ```pkl import "package://pkg.pkl-lang.org/pkl-pantry/pkl.toml@1.0.0#/toml.pkl" output { renderer = new toml.Renderer {} } machines = new Listing { new { hostname = "office" users { foo { uid = 1000 gid = 1000 } cups { uid = 209 gid = 209 } nobody { uid = 65534 gid = 65534 } } } new { hostname = "server" users { admin { uid = 1000 gid = 1000 } named { uid = 40 gid = 40 } } } } ``` This generates: ```toml [[machines]] hostname = "office" [machines.users.foo] uid = 1000 gid = 1000 [machines.users.cups] uid = 209 gid = 209 [machines.users.nobody] uid = 65534 gid = 65534 [[machines]] hostname = "server" [machines.users.admin] uid = 1000 gid = 1000 [machines.users.named] uid = 40 gid = 40 ```

A version using inline tables could represent the same data using two tables and an array of tables or even just the array of tables if the users table is inlined as well (even though that would lead to very long lines).

bioball commented 4 months ago

The TOML renderer will output inline tables in some circumstances, for example, if it is producing a mixed object:

import "package://pkg.pkl-lang.org/pkl-pantry/pkl.toml@1.0.0#/toml.pkl"

res {
  1
  new { foo = 2 }
}

output {
    renderer = new toml.Renderer {}
}

This produces:

res = [ 1, { foo = 2 } ]

What is the motivation for wanting inline tables? FWIW: trying to make rendered output textually equivalent to some output is somewhat of a never-ending battle. Generally, you should trust that Pkl is producing data that is semantically equivalent to your desired output.

archer-321 commented 4 months ago

The TOML renderer will output inline tables in some circumstances, for example, if it is producing a mixed object

Oh, I missed mixed-type listings (or other mixed-type objects for which toMap() returns an empty map). Looking at the code again, it makes sense why those would end up inlined. I assume TOML's syntax wouldn't allow for a non-inline table definition in arrays that can't be represented as an array of tables.

However, this implies objects with properties/entries can never generate inline tables.

What is the motivation for wanting inline tables? FWIW: trying to make rendered output textually equivalent to some output is somewhat of a never-ending battle. Generally, you should trust that Pkl is producing data that is semantically equivalent to your desired output.

I'm currently exploring Pkl as a "configuration preprocessor" with validation at configuration time. While the generated configuration files would not be edited manually, having human-readable configuration files would allow manual inspection even if the source .pkl files aren't available.

That said, one reason I opened this issue was that I considered the inline table code to be unused. With mixed-type arrays even requiring this syntax, this issue is reduced to the question of how much Pkl focuses on human-readable output.

bioball commented 4 months ago

That said, one reason I opened this issue was that I considered the inline table code to be unused. With mixed-type arrays even requiring this syntax, this issue is reduced to the question of how much Pkl focuses on human-readable output.

We definitely want to produce human-readable output. Readability is a little subjective, though; I don't know that this is harder to read:

[users.foo]
uid = 1000
gid = 1000

[users.cups]
uid = 209
gid = 209

[users.nobody]
uid = 65534
gid = 65534

Do you have a suggestion for what the heuristic should be? At what point should the renderer emit inline tables?

archer-321 commented 4 months ago

Do you have a suggestion for what the heuristic should be? At what point should the renderer emit inline tables?

This is indeed a problem, as the three potential implementations I listed in the issue description all have drawbacks. Personally, I inline tables if they have few, short values, e.g., tables with <= 5 keys, no nested tables, and no long strings. An alternative metric could be the resulting line length. E.g., if the line is longer than 100 characters, a regular table is used instead.

An alternative that wouldn't require a heuristic would be a way for implementations to opt in to inlining. As an example, a property shouldInline: (unknown) -> Boolean in toml.Renderer could allow the user to implement custom heuristics.

We definitely want to produce human-readable output. Readability is a little subjective, though; I don't know that this is harder to read

Indeed, the first example is still readable. For me, readability only becomes problematic when using "small tables" in an array of tables. The problem is that TOML's table syntax makes it hard to spot "supertables" when defining a lot of "subtables" using the non-inlined syntax. This is an inherent limitation of TOML, as the language heavily favours "flat" configuration schemas over deeply nested ones. However, adding a way to inline small tables can help maintain readability while grouping related fields.

For example, let's consider the following Pkl definition:

import "package://pkg.pkl-lang.org/pkl-pantry/pkl.toml@1.0.0#/toml.pkl"

output {
    renderer = new toml.Renderer {}
}

class Theme {
    name: String
    primaryColor: Color
    author: Author
}

class Color {
    red: Int(isBetween(0, 255))
    green: Int(isBetween(0, 255))
    blue: Int(isBetween(0, 255))
}

class Author {
    name: String
    email: String(matches(Regex(".*@.*")))
}

class SiteConfiguration {
    themes: Listing<Theme>
    // The properties below could use classes as well
    languages: Listing<String>
    features: Listing<String>
}

`site-configuration`: SiteConfiguration

Let's write a configuration file amending the site-configuration key with the following data:

`site-configuration`: SiteConfiguration = new {
    languages {
        "de_DE"
        "en_GB"
        "en_US"
        // ...
    }
    features {
        "public-uploads"
        "group-creation"
        "shared-ci-runners"
        "logo-rebrand"
    }
    themes {
        new {
            name = "Dark Theme"
            primaryColor {
                red = 0x22
                green = 0x27
                blue = 0x2E
            }
            author {
                name = "Example Company"
                email = "themes@example.com"
            }
        }

        new {
            name = "Light Theme"
            primaryColor {
                red = 0xC5
                green = 0xD1
                blue = 0xDE
            }
            author {
                name = "Example Company"
                email = "themes@example.com"
            }
        }

        new {
            name = "High Contrast Light Theme"
            primaryColor {
                red = 0xFF
                green = 0xFF
                blue = 0xFF
            }
            author {
                name = "Other Company"
                email = "noreply@localhost"
            }
        }

        new {
            name = "High Contrast Dark Theme"
            primaryColor {
                red = 0x00
                green = 0x00
                blue = 0x00
            }
            author {
                name = "Other Company"
                email = "noreply@localhost"
            }
        }
    }
}

While the Pkl code for this configuration is also very long, the indentation and blocks using { } keep this configuration still relatively readable.

Currently, this code generates the following output:

[site-configuration]
languages = [ "de_DE", "en_GB", "en_US" ]
features = [ "public-uploads", "group-creation", "shared-ci-runners", "logo-rebrand" ]

[[site-configuration.themes]]
name = "Dark Theme"

[site-configuration.themes.primaryColor]
red = 34
green = 39
blue = 46

[site-configuration.themes.author]
name = "Example Company"
email = "themes@example.com"

[[site-configuration.themes]]
name = "Light Theme"

[site-configuration.themes.primaryColor]
red = 197
green = 209
blue = 222

[site-configuration.themes.author]
name = "Example Company"
email = "themes@example.com"

[[site-configuration.themes]]
name = "High Contrast Light Theme"

[site-configuration.themes.primaryColor]
red = 255
green = 255
blue = 255

[site-configuration.themes.author]
name = "Other Company"
email = "noreply@localhost"

[[site-configuration.themes]]
name = "High Contrast Dark Theme"

[site-configuration.themes.primaryColor]
red = 0
green = 0
blue = 0

[site-configuration.themes.author]
name = "Other Company"
email = "noreply@localhost"

Especially if we now start using classes for languages and features, this quickly becomes hard to read. TOML would allow the following result using inline tables, which I consider more readable:

[site-configuration]
languages = [ "de_DE", "en_GB", "en_US" ]
features = [ "public-uploads", "group-creation", "shared-ci-runners", "logo-rebrand" ]

[[site-configuration.themes]]
name = "Dark Theme"
primaryColor = { red = 34, green = 39, blue = 46 }
author = { name = "Example Company", email = "themes@example.com" }

[[site-configuration.themes]]
name = "Light Theme"
primaryColor = { red = 197, green = 209, blue = 222 }
author = { name = "Example Company", email = "themes@example.com" }

[[site-configuration.themes]]
name = "High Contrast Light Theme"
primaryColor = { red = 255, green = 255, blue = 255 }
author = { name = "Other Company", email = "noreply@localhost" }

[[site-configuration.themes]]
name = "High Contrast Dark Theme"
primaryColor = { red = 0, green = 0, blue = 0 }
author = { name = "Other Company", email = "noreply@localhost" }

Of course, nested tables in TOML will always be limited to a few levels before they become unreadable, and I understand that inline tables aren't a magic trick that will make all problems go away and turn TOML into YAML.

However, in many configuration formats I'm working with, closely related fields (like the colour values of primaryColor) are grouped together into separate objects, even if the configuration schema tries to be flat overall. Even being able to reduce one level of non-inlined tables would help those files become significantly more readable.

bioball commented 4 months ago

That's fair. TOML's method for nesting arrays of objects as top-level tables makes it hard to maintain context when reading through a file.

Their inline tables are also kind of limiting because they don't allow newlines. Just turning things into inline tables can also be rough.

We'll probably add the TOML renderer to the standard library. I think this is something that we'll look to improve when we do that, so we probably won't prioritize this fix for the pkl.toml package. If you'd like, though, feel free to submit a PR!

These heuristics can be added as properties to the Renderer class.