Specification question - array indenting with key value pairs

csharpfritz commented 4 years ago

How should this document be parsed:

root
    key1                value1
        value2
    value3

Should this be:

An error?
an array labeled with root containing 3 values: root: [ (key1: value1), value2, value3 ]
an array containing an array: root: [ key1: [value1, value2], value3 ]
an array containing 2 child arrays: root: [key1: [value1, [value2]], value3]

surlydev commented 4 years ago

Personally, would expect this to be an error. If the intention is to have value1, value2, and value3 as an array of values for the 'key1' key then I would expect to see them equally aligned indented deeper than key1.

Stelzi79 commented 4 years ago

As I see it this is a question of do we allow these examples to be the same:

root
    value1
    value2

root
    value1
        value2

root
    value1
                        value2

If we don't allow that your example should be an error. If that is not an error than it is very complicated and difficult to parse and grasp what value is just an entry in an array and what is just a simple value in the root.

surlydev commented 4 years ago

As @Stelzi79 shows, I would expect ONLY the first example to be valid. What benefit would there be for indented values further, such as 'value2' in the middle example? Example 3 is just an exaggerated version of the middle example.

Stelzi79 commented 4 years ago

I had some more thoughts.

IMO the issues, why this example document is confusing is that there is

inconsistent indentation and
the fact that the reader of this document might appoint some special meaning to the names of the first entry that reads key1 and then the value of that key is value1.

Not allowing inconsistent indentation would make this document an invalid one. If we allow that the only meaningful interpretation of it would be that it is root containing 3 values

Everything else would mean some special and magic parsing rules that might depend on linguistic context for keys where they have different meaning when they are named a particular way.

londospark commented 3 years ago

Just been trying to convert a github action YAML File to TAML and come across the following:

steps:
    - uses: actions/checkout@v2
    - name: Build
      run: cargo build --verbose
    - name: Run tests
      run: cargo test --verbose

Where steps is an array of arrays and can't seem to work out how you would represent this in TAML. Would something likethe following work:

steps
    uses: actions/checkout@v2

    name: Build
    run: cargo build --verbose

    name: Run tests
    run: cargo test --verbose

However, I think that intrducing block scopes like this could be a confusing move. The following suggestion makes use of # as a delimiting character and may not be in the TAML spirit:

steps
    #
        uses: actions/checkout@v2
    #
        name: Build
        run: cargo build --verbose
    #
        name: Run tests
        run: cargo test --verbose

Stelzi79 commented 3 years ago

Well let me try:

You have steps that uses the actions/checkout@v2 thng (container, image, etc.). First there is the Build step which has to run the cargo build --verbose command. Next you Run tests that is run by the cargo test --verbose command.

steps
    uses    actions/checkout@v2
    Build
        run     cargo build --verbose
    Run tests
        run     cargo test --verbose

steps
    uses    actions/checkout@v2
    1
        name    Build
        run     cargo build --verbose
    2
        name    Run tests
        run     cargo test --verbose

steps
    uses    actions/checkout@v2
    name    Build
    run     cargo build --verbose
    name    Run tests
    run     cargo test --verbose

Hm seems a bit tricky (you loose the name thing) and just shows how mangled and unconforming Yaml can be.

I concur that the block scopes approach is not really fitting to TAML.

IMO in YAML you can do so many unexpected things that is highly sensitive to what parses and consumes it, there might never be a 1:1 converter for YAML but only approximations that the converter has to couple very tightly to the thing that uses this particular YAML. So my first approach would make it more verbose than the YAML with almost the same parsing cost for the thing that would consume such a TAML thing.

A 100% 1:1 conversion only would be possible if we allow that TamlKeyValuePairs can have child elements like TamlArray and all the others. I am not sure if we should want this because my first example is pretty self explanatory.

londospark commented 3 years ago

The thing that bothers me here is that we would face the same issue with JSON. Consider the following:

{
  "steps": [
    {
      "uses": "actions/checkout@v2"
    },
    {
      "name": "Build",
      "run": "cargo build --verbose"
    },
    {
      "name": "Run tests",
      "run": "cargo test --verbose"
    }
  ]
}

We once again see an array of objects, or an array of arrays of key-value pairs. This is something that I've seen a few times and not just in a github action. Am I reading that the conversion of this or the above YAML to TAML would not round-trip and if so is that going to be an issue given that the specific examlple that I'm working with is one of the examples asked for in #44? While I agree that the alternative TAML that @Stelzi79 proposes in their first example would be cleaner and superior to the current YAML I'm not convinced that we're in a position to make something that doesn't map to YAML/JSON if we're aiming for adoption.

Stelzi79 commented 3 years ago

I can see these question that we have to answer:

Should we allow TamlKeyValuePairs where the key is null/nothing and the value is a TamlArray and how do we notate that case?
What would a possible syntax look like? - or just one more indentation?
Is that case that of an important/better/useful/widesprad use case that we absolutely have to support in order to make converting to less optimal markup languages that TAML tries to fix?
Is it even the goal to support every possible syntax notation in JSON or YAML?
What would be the net-bonus of allowing this that would make TAML better/more logical/more suited/more visually graspable to perform better than the alternatives?
Is it advantageous to support seamless conversion from YAML to TAML and back of a special format configuration file that is highly coupled to a system?
Should we favor a mechanism of SchemaDefinitions and ParserDirectives that would tell a converter in a standardized way how it could convert the TAML seamlessly into a YAML file?

etnnth commented 3 years ago

I had similar thought when creating the python parser, I think we could add anonymous key (with _ ?) in list:

steps
    uses  actions/checkout@v2
    _     name  Build
          run  cargo build --verbose
    _     name  Run tests
          run  cargo test --verbose

wietlol commented 3 years ago

I suppose that the above code in TAML would instead be:

steps
    _
        uses  actions/checkout@v2
    _
        name  Build
        run  cargo build --verbose
    _
        name  Run tests
        run  cargo test --verbose

etnnth commented 3 years ago

yes or :

steps
    _   uses  actions/checkout@v2
    _   name  Build
        run  cargo build --verbose
    _   name  Run tests
        run  cargo test --verbose

I think it should yield to the same results no?

csharpfritz commented 3 years ago

In discussions on Sept 22, 2020 stream, we believe we can convert this workflow.yml:

steps:
    - uses: actions/checkout@v2
    - name: Build
       run: cargo build --verbose
      with:
        dotnet-version: 3.1.101
    - name: Run tests
      run: cargo test --verbose

to the following TAML:

steps
    1
        uses    actions/checkout@v2
    2
        name    Build
        run     cargo build --verbose
        with
                dotnet-version      3.1.101
    3
        name    Run tests
        run     cargo test --verbose

You could use this pseudo-code to navigate the collection:

steps[0].uses
steps[1].name == "Build"
steps["2"][0].Value == "Build"

londospark commented 3 years ago

What would you do if you wanted to insert an element between 1 and 2? In this example it wouldn't be an issue, however in a larger example I could see the ordering going out of sync easily and causing issues. Even here steps["2"] === steps[1] which feels rather odd to me.

csharpfritz commented 3 years ago

Sure @garethhubball - you could change the names of the introduced objects from 1,2,3 to "foo","bar","baz" appropriately. I would suggest that in this conversion case, the name kvp should persist, but also copy it to the new level.

steps
    uses
        uses    actions/checkout@v2
    Build
        name    Build
        run     cargo build --verbose
        with
                dotnet-version      3.1.101
    Run tests
        name    Run tests
        run     cargo test --verbose

londospark commented 3 years ago

I think for a manual case these are all great ideas, the issue that I'm seeing from an automatic conversion viewpoint is would something like:

steps
    1
        uses    actions/checkout@v2
    2
        name    Build
        run     cargo build --verbose
        with
                dotnet-version      3.1.101
    3
        name    Run tests
        run     cargo test --verbose

with no compiler directives convert to the YAML:

steps:
    - uses: actions/checkout@v2
    - name: Build
       run: cargo build --verbose
      with:
        dotnet-version: 3.1.101
    - name: Run tests
      run: cargo test --verbose

or to:

steps:
    1:
        uses: actions/checkout@v2
    2:
        name: Build
        run: cargo build --verbose
        with:
          dotnet-version: 3.1.101
    3:
        name: Run tests
        run: cargo test --verbose

Or is ths something that must be implementation-defined and therefore we have to drop support for a CLI conversion tool that works both ways as suggested in #12 and others as the conversion to TAML is lossy?

Stelzi79 commented 3 years ago

Let me remind you of these definitions:

JSON - JavaScript Object Notation: JavaScript Object Notation is an open standard file format, and data interchange format, that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and array data types (or any other serializable value).
YAML: YAML (a recursive acronym for "YAML Ain't Markup Language") is a human-readable data-serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted.

TAML is IMO a dedicated markup language for configuration and doesn't have the express purpose of being able to serialize every possible object there might be out there.

@garethhubball my understanding is that yaml and alike are always implementation-defined in that they only map correctly to the "thing" (GitHub workflows here in this example) they are used for. So the converter has to know what and how to make a correctly formatted yaml file for that "thing".

I also could imagine that @garethhubball last two yaml examples conversions are both valid ones. The first one is only valid for GitHub workflows and the second one would be perhaps for a completely different "The Thing X" valid.

    //#TAML.Schema  github_workflows.schema.taml
steps
    1
        uses    actions/checkout@v2
    2
        name    Build
        run     cargo build --verbose
        with
                dotnet-version      3.1.101
    3
        name    Run tests
        run     cargo test --verbose

In this the schema would define a way for a converter to convert it to yaml file that is valid for GitHub Workflows

    //#TAML.Schema  the_thing_x.schema.taml
steps
    1
        uses    actions/checkout@v2
    2
        name    Build
        run     cargo build --verbose
        with
                dotnet-version      3.1.101
    3
        name    Run tests
        run     cargo test --verbose

The schema in this case would omit the special conversion notation completely and would work "out of the box" with "The Thing X".

londospark commented 3 years ago

What I'm reading here is that interoperability with JSON and YAML aren't seen as major priorities here and can be sacrificed in order to make things more simple to read? This is something that I like and as such I think I've been asking the wrong questions and tackling this the wrong way.

Instead of 1:1 conversions that would work every time are we looking to create a TAML configuration file such that (for example) a Github Actions YAML file could be created from it using the objects that TAML parses into and a tool that understands that object and can use it to generate the specific YAML for Github Actions?

Should this be the case then I feel that we have what we need in TAML with no further extensions as we could do something like:

steps
    uses    actions/checkout@v2
    Build
        run     cargo build --verbose
        with
                dotnet-version      3.1.101
    Run tests
        run     cargo test --verbose

Which is something that has all the same meanings, for me it reads better than the other options and I could write something to convert the TAML parser output into something that a YAML generator could use to create the Github Actions specific YAML without too much trouble.

Stelzi79 commented 3 years ago

I would favor a workflow that basically states that first consider to make make it as clear as possible of a configuration Taml file and then put specially needed conversion logic into the schema.

So in these examples it should this used

    //#TAML.Schema  github_workflows.schema.taml
steps
    uses    actions/checkout@v2
    Build
        run     cargo build --verbose
        with
                dotnet-version      3.1.101
    Run tests
        run     cargo test --verbose

and

    //#TAML.Schema  the_thing_x.schema.taml
steps
    uses    actions/checkout@v2
    Build
        run     cargo build --verbose
        with
                dotnet-version      3.1.101
    Run tests
        run     cargo test --verbose

while "The Thing X" would preferably use Taml. If these can't be converted 1:1 to Json or Yaml the schema tells a converter how to convert it correctly.

londospark commented 3 years ago

Ahhh, now I understand, so the first line that is "commented out" is a directive that points to a TAML file that defines the conversions? I see the benefits to that in the future without a doubt but I'm thinking that might add complexity that we don't need to add yet and the implementation-defined route might be simpler for now.

I'm getting some callbacks to XSDs and XSLT here but I suspect that's not what you mean and it's something far more simple and elegant than those.

Stelzi79 commented 3 years ago

Yes that is exactly my opinion and not only do I see the schema file for conversion but also for some fancy things like checking validity and hinting in such things like IntelliSense.

For now I see the implementation-defined converter as the simplest thing to do for now

csharpfritz commented 3 years ago

^^ THIS ^^

That sells me on the concepts and need for schema. Let's put schemas out there on a second milestone after the initial parser features.

csharpfritz commented 3 years ago

To resolve this discussion, we have decided that inconsistent indenting will create an error scenario. This decision needs to make its way into a specification document.

Additionally, the discussion around translation of documents from JSON, XML, or YAML should make it into a specification as well, indicating that we are not seeking 100% compatibility and that some information will be lost in converting to and from those formats.

csharpfritz / TAML

Specification question - array indenting with key value pairs #26