kytrinyx closed this issue 6 years ago
I have several issues with the normalization format used here.
While these changes, which seem to be the result of sorting keys alphabetically, might be easy enough for computers, these files need to be maintained by humans, so a more human-friendly organisation needs to be used.
A consistent cross-track format can still be achieved while maintaining the human friendly formatting but this is not it.
See comments on the changes to the Ruby track config.json: https://github.com/exercism/ruby/pull/756/files
I do have my issues as well...

I prefer to write the empty topics as `[]` instead of `null`; at least under some circumstances, this should be left untouched by the formatter. `[]` means no topics apply, `null` means I haven't looked at it yet. At least that's how I read it.

Both files are used, read, and modified by humans, so we should make sure they find their way through.
Also, @petertseng made some good points in exercism/haskell#610 that I hadn't realized when I glanced over the Erlang PR, because I was too focused on the actual exercises and forgot that there is much more in the file.
I merged PRs for the tracks I help maintain, but I also agree with the suggested improvements to later normalizations above.
I also agree with the above statements. I think it would be best to make the `config.json` as human-readable as possible, to make it easier for contributors to understand.
Out of interest, how much difference would having the keys ordered by importance make compared to having them in the proposed alphabetical order for the planned scripted changes? If it's necessary, could it be done as a temporary change that is reverted to a more human-friendly ordering later?
Scripting changes basically means importing the JSON into the script's internal representation, changing stuff in memory, and writing the representation back to the JSON file.
There is no normalisation necessary to do this, but since some JSON encoders do not care about pretty-printing or key order (because it is semantically irrelevant), it's not a good idea to rely on their output.
Therefore, on any scripted (or manual) change, the formatter is called independently.
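As a sketch of that workflow, assuming a Ruby script (the difficulty-scaling mutation and the helper name here are only illustrative, not configlet's actual behaviour):

```ruby
require "json"

# Hypothetical scripted change: scale every exercise's difficulty,
# then re-serialize in one canonical form. The mutation is only an
# example; the point is that pretty_generate acts as the formatter
# step, so diffs show just the semantic change.
def scripted_change(json_text)
  config = JSON.parse(json_text)
  (config["exercises"] || []).each do |exercise|
    exercise["difficulty"] = exercise["difficulty"].to_i * 10
  end
  JSON.pretty_generate(config) + "\n"
end
```

In practice the same effect is achieved by running any script and then running the formatter over the result.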
This is the only way to enforce a canonical format, which I really would have preferred to be discussed beforehand, instead of creating additional noise after we had a normalisation about 4 months ago precisely to avoid the noise that is now being created.
Also, during the former normalisation I do remember a discussion about how to pretty-print empty arrays, but there it was inline vs. three-line...
I'm realizing right now that empty lines at the end of the file are normalized away. I prefer to keep them; it's easy to miss the last line with `cat` if the trailing newline is missing.
/cc @exercism/configlet would you please add your opinions here?
I have a few thoughts, but they're still fairly nebulous.
I'm not trying to force you all to do extra work, I'm trying to find a way to make it possible to do the work that I need to do with as little negative effect as possible.
I appreciate the constructive suggestions for how the serializer should work, with the reasoning for why.
Scripting changes basically means, importing the JSON into an internal representation of the script, changing stuff in memory, writing back the representation into the JSON file.
There is no normalisation necessary to do this
@NobbZ It sounds like you're saying that you'd prefer that I submit pull requests that make seemingly arbitrary changes along with the actual changes that I need to make. That doesn't seem like something you would say, so I'm pretty sure I'm misunderstanding. Would you mind clarifying?
It is the second round of normalisation
I expect there to be several more rounds of normalization, as we learn more and make better decisions and improve our tooling.
The first round of normalization was the result of a script that I wrote, and we didn't have a tool at that time. The second is based on a tool that all maintainers have access to.
While I absolutely understand that it would have been ideal to get all the details of normalization right the first time, that hasn't been possible. I didn't have all the information I needed to make a better decision.
Let's get the tool right, and then run normalization across the board. I don't care very hard about the details of the normalization, I do care strongly that we have tooling that lets us script changes, because in the next few months there will be many changes as we work to get v2 right.
I'm responding to a question nobody has asked yet, not because I think someone will ask it of me (I don't think anyone will; the question is a bit too hostile), but rather because my response to the hypothetical question explains my behaviour in the past and now:
Why didn't you raise all your concerns when we were discussing in https://github.com/exercism/discussions/issues/177? If you didn't complain back then, you have no right to complain now.
Given my (everyone's!) limited time, at the time it made sense to only comment on issues that would affect me. I looked at the issue in question and thought "oh, formatting, don't need it, don't plan to run it on my tracks, so don't need to comment at all". I guess that ended when it was proposed to be run on the tracks.
It's already known that I defend the humans, and I don't think I contribute to the conversation by rehashing why.
Here are a few possible ways I can see to proceed from this point. All of them should include a way to achieve the objective of scripting a change to config.json:

Order keys by importance (language first, some other things, exercises last, etc.) rather than alphabetically.

Such a fixed order keeps line-based scripting simple: to add track_id to all config files after language, I simply look for the line that has `"language": "whatever",` and add a line immediately after it that says `"track_id": "whatever",` at the same level of indentation.

Why didn't you raise all your concerns when we were discussing in exercism/discussions#177? If you didn't complain back then, you have no right to complain now.
I disagree with this hypothetical question, for what it's worth.
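The line-based edit described above (adding a track_id line immediately after the language line, at the same indentation) can be sketched as follows; the helper name and the track id value are hypothetical:

```ruby
# Sketch of the text-level change described above: insert a track_id
# line immediately after the "language" line, at the same indentation.
# (Helper name and track id are hypothetical.)
def add_track_id(lines, track_id)
  lines.flat_map do |line|
    if line =~ /^(\s*)"language":/
      [line, "#{$1}\"track_id\": \"#{track_id}\",\n"]
    else
      [line]
    end
  end
end
```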
Also, I defend the humans, I just don't always understand them. I really want to get this right so that we can (a) minimize work for maintainers, and (b) support scripted changes. And make the humans happy.
This entire comment does not need to be taken into consideration when considering the course of action.
Looks like there will be trouble if the normaliser does not act the exact same way as whatever is doing the scripting. If we suppose that the scripting is done in Ruby:
$ ruby -rjson -e 'puts JSON.pretty_generate(JSON.parse(File.read("config.json")))' | diff -u - config.json
--- - 2017-10-24 03:15:18.000000000 +0000
+++ config.json 2017-10-24 03:13:21.000000000 +0000
@@ -786,10 +786,8 @@
"uuid": "d5997b60-e54c-4caa-beae-8614f0da3fb3"
}
],
- "foregone": [
-
- ],
+ "foregone": [],
"ignore_pattern": "example",
"language": "Haskell",
"solution_pattern": "example.*[.]hs"
-}
+}
\ No newline at end of file
This may affect the choice of technology for the normaliser/future scripts.
Above argument is invalid: just run the normaliser after running the script, and that way a minimal diff is achieved.
Just run the normaliser after running the script, and that way a minimal diff is achieved
Yep, that's what I've been doing.
FWIW, I think this all feels slightly round-peg-in-a-square-hole-ish because we're
There's no good way around #1 and #2. It might be better if we made each exercise's data an atomic file in source control instead of one big file, but that would really just be pushing some technical debt around with diminishing returns.
I sort of feel that #4 is wrong of us. If we wanted order to matter, we should have used an array. If you want a better way to visualize your data, then build a tool to do so, IMO. I'm new to this, but thus far I really haven't cared where the keys were in the CFML repo. I'm not changing or even looking at most of those values on a regular basis, and it's not like I forget what track I'm in so that I have to scan the JSON file to remind myself.
So #3 is the crux of the issue, IMO. The impedance mismatch is that two JSON strings with different whitespace can contain the same effective data, yet we're using a storage system that cares about whitespace. For the most part, I think we can get away with what we need to accomplish if we:
accept that only one pull can cleanly touch the `config.json` file at a time without some sort of conflicts.

@NobbZ It sounds like you're saying that you'd prefer that I submit pull requests that make seemingly arbitrary changes along with the actual changes that I need to make. That doesn't seem like something you would say, so I'm pretty sure I'm misunderstanding. Would you mind clarifying?
The paragraph you quoted was an explanation to @N-Parsons about the workflow with the normalizer and scripted changes.
Of course I do expect the scripted change and the re-normalisation to happen in a single commit.
I expect there to be several more rounds of normalization, as we learn more and make better decisions and improve our tooling.
Ah, okay, I can live with that, but it would have been nice to know that from the very beginning.
Slightly abusing the unordered nature of JSON objects by wishing they had order for our convenience
As a maintainer I have to alter the file by hand when things change, e.g. adding a new topic to an exercise or changing its difficulty. Therefore it's necessary to be able to find those entries quickly. So when I do a text search for `accumulate`, because that's the exercise I want to change, then after finding it I only want to scan the lines below to find the key I want to alter. Having to scan in both directions makes it even harder, especially given the chance that the potentially huge list of topics could sit between the exercise's name and the key I'm actually searching for.
If we had graphical tooling[1], though, guaranteed to work on at least Mac, Windows, and Linux, that took care of this and let me alter entries the way I'm used to with any CRUD tool, I would be fine with any ordering.
only one pull can cleanly touch the config.json file at a time
Personally, this is a no-go for me. If scripted changes come in fast and I'm buried under day-job work, I would probably be unable to handle them quickly enough, especially since resolving conflicts involves not only me merging, but also the author of the PR rebasing. Those PRs will probably be made from timezones ranging from UTC-4 to UTC-9, while I live in UTC+1/+2 (depending on DST); this difference of 5 to 12 hours often results in round trips of a complete day.
[1] Some go program which serves a small web app is totally fine for me.
I like the idea of cross-track normalization, and `configlet fmt` should be the authority on it.
In JavaScript we can't guarantee key ordering in JSON, so I have become used to not caring much about the ordering of keys. Because configlet is in Go we can actually enforce an output key order (yay Go :smile:), so we can actually consider it.
No strong opinions on output order, except that it should have some logic to it. Alphabetical is always a safe default, but as has been pointed out here, because `config.json` is hand-edited it may make sense to group keys logically to make that process easier. The Go track's is in alphabetical order and I can't say I even noticed much.
Targeted PRs make sense to me: "single purpose changes". nextercism is a rolling target; it's to be expected that `config.json` will have to adapt more than once.
I prefer to write the empty topics as [] instead of null; at least under some circumstances, this should be left untouched by the formatter. [] means no topics apply, null means I haven't looked at it yet. At least that's how I read it.
I can see the logic behind this thinking, but it does give us two different representations of "no topics": "no topics because there are none" and "no topics because none have been assigned yet", not sure if we need that distinction. Although if there is an exercise with no applicable topics perhaps we need to expand the topics list or remove the exercise? Even "hello world" has applicable topics.
(I guess technically "I have not defined it yet" would be `undefined` in JavaScript, while `null` would be "I am not defining it"? JavaScript! shine on, you crazy :large_blue_diamond:)
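The distinction does survive a parser round trip, at least in Ruby's json library (used here only as an illustration): an empty array and null parse and re-serialize differently, so a formatter that round-trips the data could keep "no topics apply" separate from "not looked at yet":

```ruby
require "json"

# [] and null remain distinct through parse/generate, so a formatter
# that round-trips the data could preserve the "none apply" ([])
# vs "not looked at yet" (null) meaning.
assigned   = JSON.parse('{"slug": "hello-world", "topics": []}')
unassigned = JSON.parse('{"slug": "leap", "topics": null}')

assigned["topics"].to_json    # => "[]"
unassigned["topics"].to_json  # => "null"
```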
Sometimes the programmer in me sees a debate like this and thinks "Can we make everyone happy?": add a switch to have `fmt` output the config as alphabetical or maintainer(ical?). But hopefully we can agree that it is probably better to figure out a canonical ordering and stick to it. Otherwise I fear we would just be kicking the can down the road to deal with later, when configs are larger and more complex.
This is a good discussion. Glad we are having it. Nextercism will come out better as a result!
I'm sort of jumping into the middle of this discussion since @kytrinyx requested some input from me, so if I'm saying something silly due to context that I'm not aware of, please tell me.
@NobbZ:
Of course I do expect the scripted change and the re-normalisation to happen in a single commit.
I would very much want two separate commits (possibly in one pull request). Although it's not a hard and fast rule, when code is reformatted without changing content, it's nice for that reformatting to be in a commit of its own, separate from any semantic changes.
I am, in general, extremely skeptical of automatic code formatters (except where they're really a core part of a language ecosystem, as I suppose gofmt is), and I have no particular interest in cross-track normalization or stylistic consistency for its own sake. In the config files that I've written, I use the order and style that I personally find most readable, and I'd get pretty cranky if someone told me I had to switch to a style that I considered less readable. Likewise, I'll extend other people that same courtesy for keeping their preferred style. I don't think anything is gained by forcing everyone into the same Procrustean bed. :)
So I guess what I'm saying is that I don't especially want to enforce a canonical order, nor do I see much use for something like `configlet fmt` except as a linter.
I also strongly agree with most of @bdw429s's comments: it does seem like we are (slightly) abusing JSON and generally overthinking the concept of normalization.
It seems like there are a number of issues being discussed in this thread pertaining to configuration file normalization. All of them are valid, but before we start to pick things apart I think it is best to start with a little context behind `configlet fmt`.
`configlet` reads in the raw JSON file (it only cares that it is valid JSON) and spits out a formatted JSON file. This was done to reduce the coupling between `configlet` and the two configuration files, `config.json` and `maintainers.json`. With that said, I'm with @kytrinyx:
I really want to get this right so that we can (a) minimize work for maintainers, and (b) support scripted changes. And make the humans happy.
Having a single source of truth for configuration file normalization is key. Achieving this can be simple or tedious, IMO:

1. Update `configlet` so that it outputs an agreed JSON file format that is easy for maintainers to read. The trade-off is that we would need to update `configlet` each time a new field is added, and possibly add some logic to ensure unknown data is not lost while configlet is being updated.
2. Add options to the `fmt` command (e.g. `configlet fmt . --normalize-topics-only`). Supporting both alphabetical and track-specific ordered configs might not be all that simple 🤔
3. Allow `configlet fmt` to call out to some other track-specific tool for formatting. From our perspective we would just call `configlet fmt .` and rely on the track's tool for normalizing the files as the track maintainers see fit. This is a bit complex, but it has the most flexibility and should make everyone happy 🤞

For the record, I do agree with @tleen and @bdw429s when it comes to the order, but I do understand how this can introduce more work for maintainers working with the config files manually.
Which brings me to @NobbZ's statement about a CRUD-like graphical tool. What if `configlet` had support for modifying config data via some CRUD-like set of sub-commands (i.e. `configlet exercise [add|update|delete]`)? Is that something people would be interested in?
If we had graphical tooling[1], though, guaranteed to work on at least Mac, Windows, and Linux, that took care of this and let me alter entries the way I'm used to with any CRUD tool, I would be fine with any ordering.
Regarding PRs
Targeted PRs make sense to me: "single purpose changes". nextercism is a rolling target, it's to be expected that config.json will have to adapt more than once.
👍
What if configlet had support for modifying config data via some CRUD like set of sub-commands (i.e configlet exercise [add|update|delete])? Is that something people would be interested in?
For what it's worth, I've all but removed the need for track contributors to even need `configlet` in the CFML track. We've already written CFML-based CLI tools that scaffold exercises (which includes the additions to the `config.json` file automatically), generate readme files, and normalize the `config.json` to the same specs that `configlet fmt` uses. Part of this was just to reduce the number of tools needed to a bare minimum for people to contribute, and because I am a man with a CFML hammer and the world is my nail... :hammer: :smile:
@bdw429s good show - I have to take a look at what you've done as I can see value in providing such tools.
@nywilken:
The original intent behind [`configlet fmt`'s] creation was not to enforce a standard order, but to normalize the JSON file in a consistent manner for automated scripting purposes.
Why would that need normalization, unless our scripts are acting directly on the JSON as text? (And of course, that's not how it should be done in general: JSON should be parsed into appropriate data structures by the script, and then all manipulation should happen on these data structures, serializing them back to JSON only after that's happened.)
We needed a way to enforce topic names programmatically across tracks #60
I'll look at that issue...but surely if we need to reinforce that, then plain JSON is the wrong tool, no? (Although I hate to suggest it, perhaps we need a monorepo so everything can refer to the same files?)
What if configlet had support for modifying config data via some CRUD-like set of sub-commands
That might be interesting (like `git config`?), but only if it has a decent UI. Otherwise people will go back to editing the config files directly, and we're right back at square one.
@bdw429s:
I've all but removed the need for track contributors to even need `configlet` in the CFML track. We've already written CFML-based CLI tools
My hat is off to you from a programming point of view, but from the point of view of Exercism maintenance, this worries me a bit. If we have standard tools meant to be used for all tracks, then we should probably not reimplement them, or else they'll surely go out of sync.
I am a man with a CFML hammer and the world is my nail
I'm a former ColdFusion developer myself. Friendly advice from that perspective: get yourself some other hammers as well. :)
@marnen:
I would very much want two separate commits (possibly in one pull request).
Nope, since this would defeat the sole purpose of normalisation: easier-to-read diffs.
serializing them back to JSON only after that's happened.
And this is where anarchy begins if you do not normalize the JSON beforehand… since every tool, every JSON serializer, might behave differently.
@bdw429s:
[being proud of reimplementing configlet in the track's language]
Why should I put time into recreating a tool that is already there and that is cross-platform? This tool is even available precompiled for macOS, Linux 32- and 64-bit, ARMv5 and v6, as well as Windows 32- and 64-bit…
It is well tested by the maintainers of all but one track, known to work as expected, and when the specs change it's the first tool to know, since this tool actually is the spec…
The Not-Invented-Here syndrome you suffer from, though, is something I've often seen ruin projects, because they got too expensive after months of developer time were spent rebuilding something that was available for only 1k USD/year; that's less than what a week of a developer's time costs, not even speaking of their desk, PC, power usage, etc…
But we are not here to discuss NIHS. Feel free to stick with your tool, but please do not complain later when scripted PRs cause a lot of noise in your diffs, or when your track can't get processed or served properly because you didn't pick up a small change in the specs of those files fast enough.
@marnen
Why would that need normalization, unless our scripts are acting directly on the JSON as text?
There was a concern that programmatically modifying config files would result in inconsistencies in indentation, since config files are modified differently across tracks. So the fmt command was built to normalize ("format") JSON files (via JSON deserialization and serialization in Go) to ensure that files followed a standard level of indentation and were in a reproducible structure. The fmt command is idempotent and should always produce the same file structure, with little to no indentation churn, making scripted PRs for config files easier to consume and free of major whitespace changes. At the current moment, `configlet fmt` is being run for the first time on some tracks, so there are a number of whitespace changes in the PRs referencing this issue. But that should not be the case for future PRs.
I hope that answers your first question.
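The idempotency property can be illustrated with a minimal stand-in normalizer (plain parse-then-pretty-print in Ruby, for illustration only; configlet does the equivalent in Go):

```ruby
require "json"

# Minimal stand-in for a normalizer: parse, then pretty-print.
# Whatever the input's whitespace looks like, the output form is
# fixed, so a second pass changes nothing.
def normalize(text)
  JSON.pretty_generate(JSON.parse(text)) + "\n"
end

messy = '{"language":"Erlang","foregone":[],"active":true}'
once  = normalize(messy)
twice = normalize(once)
# once == twice: only the first normalization pass produces a noisy diff.
```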
@nywilken:
Which brings me @NobbZ's statement about a CRUD like graphical tool. What if configlet had support for modifying config data via some CRUD like set of sub-commands (i.e configlet exercise [add|update|delete])? Is that something people would be interested in?
Personally I'm fine with such an approach; I do often prefer a CLI over a GUI. But we have to ask the other maintainers as well. Especially in the more Windows- and Mac-oriented languages, I think there will be demand for something more GUI-like than an interactive CLI tool.
@NobbZ:
Nope, since this would defeat the sole purpose of normalisation: easier-to-read diffs.
I can't make this make sense, which probably means I'm not understanding you correctly. Can you explain what you meant?
And this is where anarchy begins if you do not normalize the JSON before hand…
Quite the contrary. The normalization (if any) happens on serialization, so any normalization done on input is completely irrelevant.
Since every tool, every JSON serializer might behave differently.
I have 2 responses:

There's only one tool in question AFAIK, that being `configlet`. Am I misunderstanding?

Even if there were multiple ones, and even if they for some reason normalized their serialized output differently, I wouldn't care much as long as the semantics were the same.

@marnen:
Nope, since this would defeat the sole purpose of normalisation: easier-to-read diffs.
I can't make this make sense, which probably means I'm not understanding you correctly. Can you explain what you meant?
As long as we have a normalized base and re-normalize after each change, the resulting diffs will be kept clean. There will be no jumping keys because of different, or even random, ordering of keys in an object on serialisation.
And this is where anarchy begins if you do not normalize the JSON before hand…
Quite the contrary. The normalization (if any) happens on serialization, so any normalization done on input is completely irrelevant.
Well, you need to normalize it once to have a clean base, to make a tabula rasa.
Of course you can decide not to accept this first clean-up and do your own thing, but the first scripted change including normalisation will come sooner or later.
There's only one tool in question AFAIK, that being configlet. Am I misunderstanding?
The scripted changes can be done by tools written in any language, using any JSON library. Therefore `configlet fmt` should be run afterwards to retain the normalized format.
Even if there were multiple ones, and even if they for some reason normalized their serialized output differently, I wouldn't care much as long as the semantics were the same.
But I do.
When a scripted change comes in whose purpose is to multiply all difficulties by 10, to scale up to the new range of 0 to 100 instead of 0 to 10, and this change also swaps the keys of the objects around, I am not able to properly review it because the diff gets too noisy.
Therefore we normalize, to be able to keep small, focused diffs.
I do not trust foreign changes, I do not trust foreign scripts, I do not even trust my own; this is why I like to get reviewed, and this is why I really miss a second pair of eyes for the Erlang track…
@NobbZ:
When a scripted change comes in whose purpose is to multiply all difficulties by 10, to scale up to the new range of 0 to 100 instead of 0 to 10, and this change also swaps the keys of the objects around, I am not able to properly review it because the diff gets too noisy.
Right, if formatting and semantic changes are mixed in one commit, then that commit's diff is less clear. This is why, if that happens, I was recommending above that it be done in two commits.
Therefore we normalize, to be able to keep small focusable diffs.
Of course diffs should be small and focused. But I believe that the way to get them that way is to use the VCS to make small focused commits. Changing the source code itself solely in order to make the diffs smaller is putting the cart before the horse and making a fetish out of the tools.
IMHO the source code should take a form that makes it readable, not one that makes the diffs so. Likewise, the diffs should take a form that makes them more readable, not the source code.
@NobbZ (Also just saw your comment about Erlang. I'm not exactly an expert, but I do have a clue about Erlang and I can take a look...)
Right, if formatting and semantic changes are mixed in one commit, then that commit's diff is less clear. This is why, if that happens, I was recommending above that it be done in two commits.
So you suggest starting from a normalized base, then running scripted changes that alter data and, because the serializer emits keys in a random order each time, getting a huge diff. You make a first commit. After that you re-normalize, resulting in another huge diff.
This results in 2 commits having huge diffs in their own but changing only 2 lines net.
No, that's absolutely not what I'm suggesting. I'm suggesting that if normalization needs to happen as part of a script, it should be in its own commit.
This results in 2 commits having huge diffs in their own but changing only 2 lines net.
And if that were the case, then it would be easier to review the branch diff than each commit's diff individually. That's fine. The reverse is fine too. Git works for us, not the other way around.
...oh, and I'm also suggesting that the script doesn't need to start from normalized input (or indeed to create normalized output, though it will probably do so since serializers tend to behave predictably).
...thinking this over some more. I have no particular desire for every script to arbitrarily change the order of JSON properties, but I also don't want to have that order mandated for the config files themselves. Preserve input order on output, perhaps?
Which you can't. Let's assume JavaScript: it will randomize keys on each serialisation, even if no objects have actually changed.
Marnen Laibow-Koser notifications@github.com wrote on Wed, 25 Oct 2017, 01:02:
...thinking this over some more. I have no particular desire for every script to arbitrarily change the order of JSON properties, but I also don't want to have that order mandated for the config files themselves. Preserve input order on output, perhaps?
@NobbZ You actually could (by reading key order when parsing, and using `toJSON` to customize the behavior of `JSON.stringify`), but I'm not sure it would be worth it. :) (I suspect Ruby would do better on this, because hash keys are ordered. No idea about any other language's JSON library, because I rarely care.)
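The hunch about Ruby is easy to check: Ruby hashes preserve insertion order, and the json library serializes keys in that order, so a parse/generate round trip keeps the input's key order rather than imposing one:

```ruby
require "json"

# Ruby hashes keep insertion order, and JSON.generate emits keys in
# that order, so round-tripping preserves the original key order.
input  = '{"language":"Erlang","active":true,"exercises":[]}'
output = JSON.generate(JSON.parse(input))
# output == input
```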
Anyway, I've probably gone on way too long here, and for that I apologize. I wouldn't be absolutely opposed to normalization if we could guarantee a nice human-readable order of parameters as @petertseng suggested above, although I do still sort of think that any approach that cares about the serialization order of an unordered structure is kind of doing it wrong...
from the point of view of Exercism maintenance, this worries me a bit.
@marnen I understand and have thought through the same thing. It's been really a trivial amount of work, and to be honest, it's work for me, not for anyone else, and I'm ok with that.
get yourself some other hammers as well.
@marnen Heh, yes, that was me being a little tongue-in-cheek. Trust me, I have many hammers in my toolbelt. CFML is personal to me, however, and I have a secondary goal in this Exercism track of using it to showcase just how modern the language is. Heck, most people out there don't even realize you can do CLI scripting in CFML these days! This has been in part an experiment in what I can accomplish for the sake of learning. I'm well aware that the onus will be on me to keep any of these CLI scripts up to date, but they've all been very trivial so far, thus the nice seamless tie-in to the existing CLI exercise generation is worth it IMO.
Why should I put time into recreating a tool that is already there
@NobbZ It sounds like you shouldn't. My reasoning for doing so is above :)
since this tool actually is the spec
@NobbZ I would disagree. The spec is the spec. Configlet is the official implementation of that spec. As a track maintainer, I'm ok with taking on the responsibility of creating an end-to-end set of CLI scripts which not only follow the spec, but also do much more than configlet does. I'm scripting complete exercise scaffolding, unit test generation, and JSON manipulation. The fact that I tossed in the readme generation and JSON normalization to the package is just a side note.
The Not-Invented-Here syndrome you suffer from, though, is something I've often seen ruin projects, because they got too expensive after months of developer time were spent rebuilding something that was available
@NobbZ I appreciate your concern, and in most business scenarios I would likely agree. Let's be realistic, though: I spent about 5-10 minutes hammering out a nice CFML-based JSON formatter that's all of 13 lines of code. I'm not running a for-profit here; this is a hobby, and one in which I specifically want to showcase modern CFML that fits nicely into the existing toolchain a CFML developer will have installed.
but please do not complain
@NobbZ I assure you, I won't be complaining I'm well aware of what I'm doing. :smile:
later on there will be the scripted PRs which will cause a lot of noise on your diffs or when your track can't get processed or served properly because you didn't pick up a small change in the specs of those files fast enough.
@NobbZ Actually, no, that won't happen. If the spec (and therefore `configlet fmt`) is changed, there will be another round of pulls to update the format of all the configs (much like this one), at which point I'll simply update my tooling to match. If it ever becomes a burden, I can easily drop my version and simply switch to running configlet.
Back to the discussion, I think most people are more in agreement than not... well, maybe, lol. First and foremost, for this to work at all, I think we all need to agree that whatever formatting/normalization tool we're using needs to be run every time any change is made to the JSON, and in my opinion, as part of the same commit. Assuming that the JSON file is proper to begin with, that means that every diff will only show the actual changes that were made to the JSON, which is what we're going for.
I'm also assuming that any automatic formatting/normalization should never re-order the JSON arbitrarily. That means that if a track owner wants his/her keys in a specific order, that will never be changed.
The only remaining loose end is if a track owner wants to automate a change to their JSON using a language/parser that doesn't preserve order of an object's keys AND that same track owner doesn't want their keys changed. That will be on them to deal with. So long as any sweeping "changes from above" are made via a lang like Go that happens to preserve the order, there shouldn't be any issues as long as the JSON is normalized before and after the change.
So, in short, I'm proposing that the spec for the JSON doesn't include specifics on the order of keys, but only the whitespace like indentation, spacing, and line break characters. Let the track owners decide if they want to order keys, etc and the formatter won't touch them.
A lot of the things I'm for have already been mentioned. Though to be honest, I haven't given the config and maintainer files much thought.
I definitely agree that it needs to be as human readable as possible. So that from a glance you can get the information you want.
like Go that happens to preserve the order
It doesn't. If the serializer is based on a struct, it will emit the keys in the same order as they are specified in the Go source code of the struct. If the serializer is based on a `map[string]interface{}`, though, keys will be randomized. If you have a `[]map[string]interface{}` (an array of objects), the order of keys will even differ between each object.
What if configlet had support for modifying config data via some CRUD-like set of sub-commands (i.e. `configlet exercise [add|update|delete]`)? Is that something people would be interested in?
I don't think I'd be interested; I think I'd just do it via a text editor. I think the reason is just that I don't want to have to learn these new commands, whereas I already know how to use my text editor. Seems like too much work (both for those who would implement those commands and those who would learn them) for too little value.
I also don't want to be in a world where such commands are necessary (if, for example, the config was a binary format that wasn't human-readable at all), but I don't think anyone was suggesting a move to that world.
Something I wonder about.
The body of this issue does a good job of explaining "Why?"
By normalizing the config files, we will reduce the amount of noise in those PRs.
The title of this issue answers the question "What?"
Run configlet fmt on all tracks
I wonder whether the title is directing the discussion too much toward the merits of one specific "What?" and not enough toward "How do we achieve the goal we are trying to drive toward?". Generally I think we are staying on the right track; I hope it stays that way.
Maybe the way to achieve the goal isn't necessarily via using `fmt` on all config files, I don't know. I don't have any other suggestion right now, but I wouldn't want to prevent anyone from thinking about it.
I'm proposing that the spec for the JSON doesn't include specifics on the order of keys, but only the whitespace like indentation, spacing, and line break characters. Let the track owners decide if they want to order keys, etc and the formatter won't touch them.
This proposal is one of two that would currently be acceptable to me (the other possibility being to do the same, but also specify key ordering).
Let's hear it if there are any predictions (other than to add track_id) for what changes will be necessary in the future.
Let me list those that have happened in the past:
- `ignored` directory list on the top level: https://github.com/exercism/meta/pull/7
- `repository` key on the top level, remove a leading x: https://github.com/exercism/meta/issues/1
- `uuid`, `core`, `unlocked_by` added to each exercise, since they are needed by Nextercism: https://github.com/exercism/meta/issues/16
- `slug` and `repository` removed from the top level, as they are no longer necessary: https://github.com/exercism/meta/issues/19

Maybe the way to achieve the goal isn't necessarily via using `fmt` on all config files
I honestly was so tempted to suggest using text transformations (add this key one line above this other key, with the same level of indentation), but there is too much madness here:
Well, it was worth a try.
How about I ask this question instead. I hope this isn't considered a pointless exercise; it is to help understand the problems we're trying to solve.
Explain why it is preferable for one person to personally script these changes and submit them as PRs to all tracks. Compare it to alternatives such as filing an issue against each track and letting the track maintainers do it.
Well, about those PRs that added `uuid`, `core`, `unlocked_by` to each exercise: it would have been a huge duplication of effort to make each track maintainer script that change individually. It makes sense to simply write the code once and do the track maintainers the favour, otherwise it might never get done! Wouldn't want to have tracks that procrastinate too much and hold back Nextercism.
I guess I understand. Now to see how to make that work.
It doesn't.
@NobbZ Ah, crap. Sorry, then I misunderstood. It sounds like it would be necessary to rewrite configlet to not parse the JSON, but simply format its whitespace as a pure string manipulation. This is exactly what the CFML lib I'm using does. It never deserializes the JSON into native data types; it simply reads the string, iterates over each character, and applies the whitespace rules as defined (indentation, spacing, line breaks, etc.). In that way, it will never modify the order of the data, it will only format the whitespace around it.
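To make the string-level idea concrete, here is a minimal sketch in Python (not the actual CFML lib, just an illustration of the same approach): walk the characters, track whether you are inside a string literal, and re-emit whitespace around the structural tokens. Key order can never change because the data is never deserialized.

```python
def format_json(text, indent="  "):
    """Re-indent a JSON document by pure character scanning, never
    building native data structures, so key order is left untouched.
    (Sketch only: e.g. empty objects/arrays are not special-cased.)"""
    out = []
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
            out.append(ch)
        elif ch in "{[":
            depth += 1
            out.append(ch + "\n" + indent * depth)
        elif ch in "}]":
            depth -= 1
            out.append("\n" + indent * depth + ch)
        elif ch == ",":
            out.append(",\n" + indent * depth)
        elif ch == ":":
            out.append(": ")
        elif ch.isspace():
            continue  # drop existing whitespace outside strings
        else:
            out.append(ch)
    return "".join(out)
```

Because the scanner only ever inserts or drops whitespace outside of string literals, running it twice is a no-op, which is exactly the property you want from a normalizer.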
My first preference would be to not mess with the track's formatting at all.
Change to an existing key: Change in-place. Delete an existing key: delete in-place, move everything after up/left. Add a new key: Put it at the end.
@Insti Can you name a tool that does this, preserving all order and indentation and stuff?
@NobbZ @Insti Well, for what it's worth, I just wrote one in less than an hour. Python's json library maintains key order. This script will even auto-detect existing key order for adding new exercises.
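The order-preserving round trip is easy to see in a few lines (a sketch, not the script mentioned above; the track and exercise names here are made up for illustration). Python dicts keep insertion order, and `json.loads`/`json.dumps` respect it, so an edit can follow the policy discussed earlier: leave existing keys where the track put them, and append new entries at the end.

```python
import json

# Load a (fabricated) config; dict key order follows the source text.
config = json.loads("""{
  "slug": "fake-track",
  "exercises": [
    {"slug": "hello-world", "difficulty": 1}
  ]
}""")

# Append a new exercise: existing keys stay exactly where they were,
# and the new entry simply goes at the end of the list.
config["exercises"].append({"slug": "two-fer", "difficulty": 1})

# Re-serialize; without sort_keys=True, dumps preserves dict order.
print(json.dumps(config, indent=2))
```

Note that this does re-flow whitespace to the `indent=2` style, so as discussed above, the file should be normalized before and after the change to keep the diff minimal.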
Can you name a tool that does this, preserving all order and indentation and stuff?
No, but: a) It wouldn't surprise me if several already existed. b) It would be easy to write one. (see above)
We are paying the price for using JSON for configuration files. It seems our options are:
It may be time to start looking towards consensus in this issue. Here is what I am reading:
JSON ain't great
No it is not, and its limitations are well known: you can't comment in it, and JavaScript serializers couldn't care less about key order. However, JSON has more-or-less become the default format for configurations across the board (not just in Exercism), so there is no real push to change `config.json` to `config.?`.
So how to deal with the JSON?
config.json is hand-edited
For the most part, maintainers manually edit the `config.json` file. This file is becoming quite big in some tracks. Track maintainers have ways they like to edit the `config.json` file, and scripted updates can be disruptive to that process.
Due to the hand editing, anything that makes it easier for human readability is a Good Thing.
Property order matters
Due to the point above, the ordering of the properties for the exercises in `config.json` matters to most. For those to whom it does not matter, there is no strong preference beyond it being logically consistent and common across tracks. I don't see anyone advocating for alphabetical ordering or anything else, but when there is a preference it is universally for something that is easy for humans to read and understand in the exercise context, i.e. exercise slug on top.
First attempt at a conclusion: seems like we are overall for defining some ordering/grouping on the properties of the exercises in the config file and have any tooling default to using that ordering. Anybody strongly against this?
Just curious:
JSON has more-or-less become the default format for configurations across the board (not just in Exercism)
Got any corroborating sources for this information? I feel curious about how projects do configuration. I tried to ask my preferred search engine about "most popular configuration file format" but all I got was http://www.terminally-incoherent.com/blog/2013/08/28/what-is-your-favorite-congfig-file-format (a poll which may not be representative) and posts advising not to use JSON (these posts may have been necessary because JSON is the most popular, but that's not explicitly stated).
These are my preferences (and thus, I gave 👍 to the above proposal):
In the coming months we will be submitting scripted pull requests to the tracks with various nextercism-related changes to the config files.
By normalizing the config files, we will reduce the amount of noise in those PRs. See documentation about configlet here: https://github.com/exercism/docs/blob/master/language-tracks/configuration/configlet.md