Open bjesuiter opened 1 year ago
@bjesuiter I'm not sure what you mean by an example?
From https://github.com/esm-dev/esm.sh/issues/687#issuecomment-1647591989:
> does the parser of the std jsonc module discard comments or the stringifier?
The parser. The comment on the `*#tokenize` method is "Split the JSONC string into token units. Whitespace and comments are skipped." (my emphasis). In fact, as far as I can tell, the standard parser just parses; it doesn't have a stringifier. The parser returns a JavaScript value, not an AST, so there's nothing that could represent a comment node. I haven't dug deeper into it than that (I'm also short on time, particularly this week). I stopped once I'd found a module that would let us insert things into the JSONC file while preserving comments (though perhaps not as conveniently as one might like).
Thanks @farsightsoftware for the explanation. I wasn't aware that the jsonc package in deno std contains only a parser and no stringifier; I simply assumed it would have both.
After thinking about it, I also see the problem with returning a JS object vs. an AST. Maybe this jsonc package was only intended as a means of reading jsonc config files easily.
However, I find it a big roadblock for developers that this lib is not able to reconstruct the file correctly with comments.
I have a conference this week, so I'll add this topic to my list of things to tackle in the weeks after. The original issue with esm.sh should be solvable without this.
But I think the std lib should provide a correct implementation for stringifying jsonc in general, so I'd like to keep this issue open.
Has there been any status update or progress on this bug lately? I find it to be quite an annoying developer experience with ESM not yet doing the right thing for updating my deno.jsonc automagically.
So much so that I would love to design/prototype/contribute a PR if nobody is actively working on this now!
I am crudely thinking of something like a ParseOption to slightly modify the return type of parse (or else to make a new function called e.g. parseTree), such that the returned type includes all details necessary to faithfully stringify the JSON value while reintroducing the comments. (In addition, the returned type could serve as an API for subsequent changes and/or stringification.)
@mmuddasani I like this idea! I don't know whether the deno core team would accept the PR, but we should try!
Step 1: we should look for a standard format for representing parsed jsonc as a concrete syntax tree, so that we avoid creating a new proprietary format.
Step 2: add the parseTree function you proposed (I find it clearer than adding a parameter to the parse function, since parse would otherwise need a union return type)
I cannot write it this week, but if you're interested, I'd do a review of the PR! :)
Microsoft's jsonc module that @farsightsoftware found appears incredibly comprehensive. Someone has even ported it as a third-party Deno module: https://deno.land/x/jsonc
It has an MIT license, so I'm wondering if Deno's standard library can leverage the existing implementation, with appropriate copyright attribution.
Alternatively: does parsing JSONC have any reason to belong in the standard library, compared to folks reusing Microsoft's implementation?
If backwards compatibility is important, and if there is reason for Deno to keep support, then merging the two APIs via method overloading seems feasible, despite the divergent existing `parse` signatures:

```ts
parse(text: string, unnamed 1?: ParseOptions): JsonValue
parse(text: string, errors?: { error: ParseErrorCode }[], options?: ParseOptions): any
```

Aside: Deno's `ParseOptions` is currently a strict subset of Microsoft's `ParseOptions`.
@mmuddasani @bjesuiter Thanks for considering jsonc support with retained comments.
> Microsoft's jsonc module that @farsightsoftware found appears incredibly comprehensive. Someone has even ported it as a third-party Deno module: https://deno.land/x/jsonc
Microsoft's jsonc module looks too complex to me. It exposes too many details about internals, such as the scanner, parser, visitor, etc. I don't think users need access to such low-level primitives. It's also not aligned with other std modules such as yaml, toml, etc.
If the goal here is to keep the comments, then how about following npm:comment-json? The API looks much simpler than Microsoft's npm:jsonc-parser, and it also seems to have enough download counts.
@kt3k Nice! The lib I found was just the first one I found for the limited purpose I was looking at. comment-json looks way better in the general case (and also for that limited purpose). Very clever to use Symbols for keeping track of comments, since JSON doesn't have them.
I agree that the API of comment-json is very nice and simple to understand (as long as you don't need to dig deeper into the representation). Using Symbols also seems smart to me in this case, since we avoid a complete parseTree.
@kt3k do you agree with "importing" this lib into deno std or do you simply propose usage of this lib?
I imagine the only downside of incorporating the API of npm:comment-json as-is into deno std is the pollution of the global Symbol registry. Symbols are a very neat idea, and I imagine they can still be used effectively without resorting to `Symbol.for`.
I also imagine it would be re-implemented, to avoid its external dependencies like npm:esprima.
> do you agree with "importing" this lib into deno std or do you simply propose usage of this lib?
Because we have the `npm:` specifier now, we of course recommend using npm:comment-json, but we are also open to including an npm:comment-json-like enhancement of the std/jsonc module.

If we include features like npm:comment-json in std, we would probably want to extend the existing std/jsonc implementation instead of replacing it with a ported version of npm:comment-json.
cc @ayame113 Do you have any opinion or advice on this issue?
I'm in favor of including features like npm:comment-json in std.

I think npm:comment-json's `parse` and `stringify` are consistent with the design of the built-in `JSON.parse`/`JSON.stringify`, are easy to use, and are worth including in std.

I think the majority of use cases where we want to preserve and parse comments involve making changes to a configuration file and writing them back, so I think APIs like `assign` and `CommentArray` should also be included.
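To make the `assign` idea concrete, here is a minimal self-contained sketch, not comment-json's actual implementation; the `before:`/`after:` symbol-key naming is my own hypothetical convention for illustration. It copies the selected keys plus whatever symbol-keyed comment metadata travels with them:

```typescript
// Hypothetical convention (illustration only): comments for key "k" are
// stored on the same object under Symbol.for(`before:k`) / Symbol.for(`after:k`).
type Obj = Record<string | symbol, unknown>;

// Copy the chosen keys AND the symbol-keyed comment metadata tied to them.
function assignKeys(target: Obj, source: Obj, keys: string[]): Obj {
  for (const key of keys) {
    target[key] = source[key];
    for (const tag of ["before", "after"]) {
      const sym = Symbol.for(`${tag}:${key}`);
      if (sym in source) target[sym] = source[sym];
    }
  }
  return target;
}

const source: Obj = {
  port: 8080,
  host: "localhost",
  [Symbol.for("before:port")]: [{ type: "LineComment", value: " dev port" }],
};

const copied = assignKeys({}, source, ["port"]);
console.log(copied["port"]); // → 8080
console.log(Symbol.for("before:port") in copied); // → true (comment survived)
```

An `assign` like this is what makes "read config, change one key, write it back" possible without losing the comments attached to untouched keys.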
If you decide to add these to std, two notes:

When a comment is attached to a primitive value, comment-json has to wrap it in an object, so the `typeof` operator becomes `"object"`. This can cause bugs when narrowing types in TypeScript:

```js
console.log(typeof "hello world"); //=> string
console.log(typeof new String("hello world")); //=> object
```

A symbol-keyed structure like the following would make implementing `assign` and `CommentArray` easier:

```js
// object is parsed like...
{
  key: "value",
  [Symbol.for("comments")]: {
    key: {
      before: {...},
      "after-prop": {...},
      "after-colon": {...},
      "after-value": {...},
      after: {...},
    },
  },
}

// array is parsed like... (illustrative pseudocode: the symbol-keyed
// entry would really be attached as a property of the array object)
[
  "foo",
  "bar",
  "baz",
  [Symbol.for("comments")]: [
    {
      before: {...},
      "after-value": {...},
      after: {...},
    },
    ...
  ]
]
```
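One nice property of a symbol-keyed layout like the one sketched above (this is plain JavaScript semantics, independent of any particular library): `JSON.stringify` ignores symbol-keyed properties, so an annotated value still serializes as ordinary JSON with no extra stripping step:

```typescript
const comments = Symbol.for("comments");

// A parsed value carrying comment metadata under a symbol key.
const value = {
  key: "value",
  [comments]: { key: { before: [{ type: "LineComment", value: " note" }] } },
};

// Symbol-keyed properties are invisible to plain JSON serialization.
console.log(JSON.stringify(value)); // → {"key":"value"}
```

A comment-aware `stringify` would then be purely additive: it reads the symbol properties, while the plain data remains usable with `JSON.stringify` as-is.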
@ayame113 I am in agreement with both of your suggestions. I would also propose (crudely atm) something as follows:
```ts
export const comments_api: unique symbol = Symbol("comments api")

export interface CommentsApi {
  // methods to read/modify/write JsoncValue and consistently maintain comments
  // including order, before/after, maybe whitespace indentation, etc.
}

export type JsoncValue =
  | string
  | number
  | boolean
  | null
  | (
    (
      | JsoncValue[]
      | { [key: string]: JsoncValue | undefined }
    ) & { [comments_api]: CommentsApi }
  )
```
For fun, I'll be prototyping an API. I'll keep an eye out for additional feedback/suggestions along the way, and for anyone else who wants to make it happen!
If I can't come up with anything friendlier than the `assign` API, then it may be reasonable enough to model.
I discovered that `deno fmt` also loses some JSONC comments! I made minimal reproducible examples here. For instance, compare the unformatted object with the formatted object. This bug probably belongs to a different Deno codebase; I'm not familiar enough to know which one.
In other news, I've so far been examining the types of comments that npm:comment-json annotates. My intuition tells me there is room to simplify the variety. The above repo snapshots my current line of thought.
@kt3k @ayame113 Is it possible and/or worthwhile to implement the core business logic in Rust as part of the `deno` CLI, while somehow having the Deno standard library expose a convenient JavaScript API?

- `deno fmt` uses dprint/dprint-plugin-json for JSONC files (see e.g. its `allow_loose_object_property_names` option)
- comment placements that a formatter/stringifier has to distinguish:

  ```jsonc
  { /* description of whole object */ "k0": "v0", "k1": "v1" }
  { /* description of upcoming key */ "k0": "v0", "k1": "v1" }
  ```

- would a Rust-backed implementation require `deno run --allow-ffi`?
Core team is not in favor of exposing jsonc related features as Deno CLI feature. I think using Rust is not an option for this issue.
Is the unicodeSets feature (`v`-mode RegExp) allowed in the Deno standard library? I don't see any current usages from a rudimentary code search.
At the very least, I've developed an appreciation for parsing complexity and can follow along better with the current tokenizer logic.
Since last week, I've discovered how to tokenize text input using modern Unicode-aware regexes. For example, it takes less than 400 characters of code to parse JSONC files when `allowTrailingComma = false`! Short (non-production) example:
```ts
const get_jsonc_tokenizer = () =>
  /([\p{ID_Continue}.+\-]+|[[\p{Pattern_Syntax}]--[\/\x22.+\-]]|[\r\n \t]+|\x22[^\x22\\]*(?:\\(?:$|[^])[^\x22\\]*)*(?:$|\x22))|\/(?:\/[^\r\n]*(?:$|\n|\r\n?)|\*[^*]*(?:$|\*+(?:$|\/|[^*\/][^*]*))+)?/gvy

function parse(text: string, { allowTrailingComma = false as const } = {}) {
  return JSON.parse(text.replaceAll(get_jsonc_tokenizer(), "$1"))
}
```
(Caveat: When parsing fails with a SyntaxError, positions inside the error message do not correspond to positions in the original text.)
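For readers who want the gist without the dense regex above, here is a much simpler sketch of the same strip-then-`JSON.parse` technique. It is less rigorous (no trailing-comma support, no token vetting), and the helper name is mine:

```typescript
// Sketch: match strings first so comment-like text inside them survives,
// then match line and block comments; keep only the captured strings.
const stripJsoncComments = (text: string): string =>
  text.replace(
    /("(?:\\.|[^"\\])*")|\/\/[^\n\r]*|\/\*[\s\S]*?\*\//g,
    "$1",
  );

const parsed = JSON.parse(stripJsoncComments(
  '{ "url": "https://example.com", // line comment\n "n": 1 /* block */ }',
));
console.log(JSON.stringify(parsed)); // → {"url":"https://example.com","n":1}
```

Note how `https://example.com` is not mistaken for a line comment: the string alternative wins because the regex engine tries it first at the opening quote.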
And an example for tokenizing input:

```ts
function* tokenize(text: string) {
  let match
  for (match of text.matchAll(get_jsonc_tokenizer())) yield match[0]
  const position = match ? match.index! + match[0].length : 0
  if (position !== text.length) {
    const token = text[position]
    const error = `Unexpected token ${token} in JSONC at position ${position}`
    throw new SyntaxError(error)
  }
}
```
The regexp loosely considers a JSONC token to be the largest non-empty prefix of one of these:

- `[\p{ID_Continue}.+\-]+` in Unicode-aware mode (either 'u' or 'v' flag)
- `[[\p{Pattern_Syntax}]--[\/\x22.+\-]]` with UnicodeSet notation (only 'v' flag)
- `[\r\n \t]+`
- `\x22[^\x22\\]*(?:\\(?:$|[^])[^\x22\\]*)*(?:$|\x22)`
- `/` followed by either:
  - `\/[^\r\n]*(?:$|\n|\r\n?)`
  - `\*[^*]*(?:$|\*+(?:$|\/|[^*\/][^*]*))+`

When combined with sticky mode, `matchAll` will tokenize the input and stop either if no input remains or if the next character is not vetted by the regexp.
The example regexp was designed to vet more identifier-like tokens, numbers, punctuation, and string escapes than JSON allows; relying on `JSON.parse` will further vet the illegal syntax.

The example regexp was also designed with a single capture group for all non-comment tokens, allowing `replaceAll` to strip away all comments.
(Side note: using `\x22` indirectly for `"` prevents VSCode's syntax highlighting from tripping up inside the `v`-mode regex.)
To avoid comment loss by default, perhaps the scenarios below provide a sensible default mode. In order to assess any unforeseen ambiguities/unknowns, I will try to prototype an API and implementation.
Perhaps the MVP should not care about preserving whitespace formatting beyond what is consistent with `stringify` indentation, especially if it is infeasible to do so. The default priority would be to preserve relative order as well as original comment content.
If a client program cares enough about whitespace formatting, then it could, for example, run `deno fmt` on the output afterward.
Example before:

```jsonc
{ "k.keep": { "k.old": "v.old", }, "k.delete": "v.delete" }
```

Example after modifications (whether done via a replace/update/rename):

```jsonc
{ "k.keep": { "k.new": "v.new", } }
```
Example before:

```jsonc
{"k.keep":/*keep.start*/{/*a*/"k.old"/*b*/:/*c*/"v.old"/*d*/,/*e*/},//keep.end
/*f*/"k.delete"/*g*/:/*h*/"v.delete"/*i*/,/*j*/}
```

Example after a kept key is updated via value replacement:

```jsonc
{
  "k.keep":
  // /*keep.start*/{/*a*/"k.old"/*b*/:/*c*/"v.old"/*d*/,/*e*/},//keep.end
  { "k.new": "v.new", },
  // /*f*/"k.delete"/*g*/:/*h*/"v.delete"/*i*/,/*j*/
}
```

Example after a kept value is updated: individual key-value pairs are deleted, new individual key-value pairs are introduced:

```jsonc
{
  "k.keep":/*keep.start*/{
    // /*a*/"k.old"/*b*/:/*c*/"v.old"/*d*/,/*e*/
    "k.new": "v.new",
  },//keep.end
  // /*f*/"k.delete"/*g*/:/*h*/"v.delete"/*i*/,/*j*/
}
```

Example after a kept value has its keys and/or values renamed (i.e. to preserve semantic placement of associated comments):

```jsonc
{
  "k.keep":/*keep.start*/{/*a*/"k.renamed"/*b*/:/*c*/"v.renamed"/*d*/,/*e*/},//keep.end
  // /*f*/"k.delete"/*g*/:/*h*/"v.delete"/*i*/,/*j*/
}
```
Describe the bug
esm.sh is currently unable to update an existing deno.jsonc file. See: https://github.com/esm-dev/esm.sh/issues/686. While implementing this, it came to light that the jsonc parser in the deno std lib seems to discard comments (see this issue, comment by farsightsoftware: https://github.com/esm-dev/esm.sh/issues/687).
I've not checked it myself yet, but I asked farsightsoftware to add an example to this issue.
The main use case for this std lib is currently updating the deno.jsonc file, so it would make things a lot easier for all tools dealing with deno.jsonc if it preserved comments automatically.
Steps to Reproduce
TODO
Expected behavior
The jsonc parser output is able to be stringified again and produce identical output to the original jsonc input.
In diagram:

```
jsonc file
  => jsonc parser
  => some intermediate representation
  => jsonc stringifier
  => a file identical to the original jsonc file (especially with comments)
```
Environment
TODO