chapel-lang / mason-registry

Package registry for mason, Chapel's package manager
17 stars 25 forks source link

Invalid table headers in cache.toml #49

Closed ankushbhardwxj closed 4 years ago

ankushbhardwxj commented 4 years ago

The cache.toml file to be used in mason-registry after merge of #48 uses TOML table headers as [LocalAtomics."0.1.0"] which is invalid as per parsing by our Chapel's TOML module. This may be updated to [LocalAtomics.0.1.0] or [LocalAtomics0.1.0].

ben-albrecht commented 4 years ago

The cache.toml file to be used in mason-registry after merge of #48 uses TOML table headers as [LocalAtomics."0.1.0"] which is invalid.

I believe [LocalAtomics."0.1.0"] is valid TOML - is it just not supported by our TOML parser?

ankushbhardwxj commented 4 years ago

@ben-albrecht Yeah, it isn't I got an error while integrating the cache with mason search. I believe we are unable to parse " " in the header.

ben-albrecht commented 4 years ago

I wonder how hard this would be to fix in the TOML parser.

I believe [LocalAtomics.0.1.0] is represented as a hierarchy of tables:

LocalAtomics
  0
    1
      0

I suggested the quotes approach to make it a single table.

Does a fully quoted name work, e.g. ["LocalAtomics.0.1.0"] ?

ankushbhardwxj commented 4 years ago

I wonder how hard this would be to fix in the TOML parser.

I believe we need to fix the regex string which is used to parse the headers. I had locally changed to [LocalAtomics0.1.0] and it did work correctly for getting the score. This wasn't treated as :

LocalAtomics0
            1
                0
ankushbhardwxj commented 4 years ago

Does a fully quoted name work, e.g. ["LocalAtomics.0.1.0"] ?

Nope that didn't work. I didn't get an error message but the score was not parsed. It works without the . between LocalAtomics and version.

ben-albrecht commented 4 years ago

I had locally changed to [LocalAtomics0.1.0] and it did work correctly for getting the score.

Based on the TOML spec, I believe this is incorrect behavior.

I feel mixed about using that as a work-around because it relies on incorrect behavior in the current implementation of Chapel's TOML parser, and it will cause issues when that gets fixed.

I propose we take one of the approaches:

1) Fix this bug in the TOML parser so that we can use Pkg."0.1.0" as a table name, and so that Pkg.0.1.0 creates a hierarchy.

2) Implement a work-around that doesn't depend on existing incorrect behavior. For example, we could use a different delimiter for the versions: [Foo.0_1_0].

(1) is preferable, but I'm not sure how large of an effort that is. @Spartee may be able to weigh in.

To double check, I tried Python's toml package to validate my understanding of the expected behavior:

>>> toml.loads("[foo.0.1.0]")
{'foo': {'0': {'1': {'0': {}}}}}

>>> toml.loads('[foo."0.1.0"]')
{'foo': {'0.1.0': {}}}
Spartee commented 4 years ago

So after a quick look I think some of the issue is how we generate tokens in the TomlReader. Effectively everything is turned into tokens with the following gigantic regex string


      const doubleQuotes = '(".*?")',           // ""
            singleQuotes = "('.*?')",           // ''
            bracketContents = "(\\[\\w+\\])",   // [_]
            brackets = "(\\[)|(\\])",           // []
            comments = "(\\#)",                 // #
            commas = "(\\,)",                   // ,
            equals = "(\\=)",                   // =
            curly = "(\\{)|(\\})",              // {}
            dt = "^\\d{4}-\\d{2}-\\d{2}[ T]\\d{2}:\\d{2}:\\d{2}",
            ld = "^\\d{4}-\\d{2}-\\d{2}",
            ti = "^\\d{2}:\\d{2}:\\d{2}(.\\d{6,})?";

      const pattern = compile('|'.join(doubleQuotes,
                                       singleQuotes,
                                       bracketContents,
                                       brackets,
                                       commas,
                                       curly,
                                       equals,
                                       dt,
                                       ti,
                                       ld));

This bug is actually already documented in the improve TOML issue.

So this gets at a bigger problem with the TOML module that I wish I did the first time which is to have two passes of tokenization. The first pass would seperate the TOML data containers: tables, arrays, arrays of tables(not implemented), and the second would populate the tokens within those containers. I know this is a bit hand-wavey but thats at least how I had it in my mind when thinking about how to handle these cases.

The shorter term fix, I think, is to reorder the regex arguments so that tables come before the double quotations. I haven't tried this myself though.

Another possible option would be to get fancier with the regex and have a tblName regex that would include specific regex for table names.

ankushbhardwxj commented 4 years ago

Closing this issue here since it has been fixed by https://github.com/chapel-lang/chapel/pull/16091