exercism / meta

Experimenting with a repo to manage the project-wide, meta todos.

Run configlet fmt on all tracks #95

Closed kytrinyx closed 6 years ago

kytrinyx commented 7 years ago

In the coming months we will be submitting scripted pull requests to the tracks with various nextercism-related changes to the config files.

By normalizing the config files, we will reduce the amount of noise in those PRs. See documentation about configlet here: https://github.com/exercism/docs/blob/master/language-tracks/configuration/configlet.md

Insti commented 7 years ago

I have several issues with the normalization format used here.

Insti commented 7 years ago

While these changes, which seem to be the result of sorting keys alphabetically, might be easy enough for computers, these files need to be maintained by humans, so a more human-friendly organisation needs to be used.

A consistent cross-track format can still be achieved while maintaining human-friendly formatting, but this is not it.

See comments on the changes to the Ruby track config.json: https://github.com/exercism/ruby/pull/756/files

NobbZ commented 7 years ago

I do have my issues as well...

  1. It is the second round of normalisation.
  2. The UUID is the last attribute of each object; it should be first, since it identifies each exercise. After that we should have the name/slug of the exercise, since it identifies the exercise for the human. All the other metadata should follow, with the topics list being last, since it might clutter the object if it is anywhere else and has many entries.
  3. I prefer to write the empty topics as [] instead of null; at least under some circumstances, this should be left untouched by the formatter. [] means no topics apply, null means I haven't looked at it yet. At least that's how I read it.
  4. In the maintainers file, the GitHub username and given name should be nearer the top of the object, instead of midway through. Then alumnus and show-on-website, then the other fields, and maybe the bio last, as it is the longest value.

Both files are used, read, and modified by humans, so we should make sure humans can find their way around them.

NobbZ commented 7 years ago

Also @petertseng made some good points in exercism/haskell#610 that I hadn't realized when I glanced over the Erlang PR, because I was too focused on the actual exercises and forgot that there is much more in the file.

stkent commented 7 years ago

I merged PRs for the tracks I help maintain, but I also agree with the suggested improvements to later normalizations above.

N-Parsons commented 7 years ago

I also agree with the above statements. I think it would be best to make the config.json as human-readable as possible to make it easier for contributors to understand.

Out of interest, how much difference would having the keys ordered by importance make compared to having them in the proposed alphabetical order for the planned scripted changes? If it's necessary, could it be done as a temporary change that is reverted to a more human-friendly ordering later?

NobbZ commented 7 years ago

Scripting changes basically means importing the JSON into an internal representation in the script, changing stuff in memory, and writing the representation back to the JSON file.

There is no normalisation necessary to do this, but since some JSON encoders do not care about pretty-printing or key order (because it is semantically irrelevant), it's not a good idea to rely on their output.

Therefore, after any scripted (or manual) change, the formatter is run independently.

This is the only way to enforce a canonical format, which I really would have preferred to be discussed beforehand, instead of creating additional noise after we had a normalisation about 4 months ago to avoid exactly the noise that is now being created.

And in the previous normalisation I do remember discussion about how to pretty-print empty arrays, but there it was inline vs. three-line…

NobbZ commented 7 years ago

I'm realizing right now that empty lines at the end of the file are normalized away. I prefer to keep them: it's easy to miss the last line with cat if the trailing newline is missing.

kytrinyx commented 7 years ago

/cc @exercism/configlet would you please add your opinions here?

kytrinyx commented 7 years ago

I have a few thoughts, but they're still fairly nebulous.

I'm not trying to force you all to do extra work, I'm trying to find a way to make it possible to do the work that I need to do with as little negative effect as possible.

I appreciate the constructive suggestions for how the serializer should work, with the reasoning for why.

Scripting changes basically means importing the JSON into an internal representation in the script, changing stuff in memory, and writing the representation back to the JSON file.

There is no normalisation necessary to do this

@NobbZ It sounds like you're saying that you'd prefer that I submit pull requests that make seemingly arbitrary changes along with the actual changes that I need to make. That doesn't seem like something you would say, so I'm pretty sure I'm misunderstanding. Would you mind clarifying?

It is the second round of normalisation

I expect there to be several more rounds of normalization, as we learn more and make better decisions and improve our tooling.

The first round of normalization was the result of a script that I wrote, and we didn't have a tool at that time. The second is based on a tool that all maintainers have access to.

While I absolutely understand that it would have been ideal to get all the details of normalization right the first time, that hasn't been possible. I didn't have all the information I needed to make a better decision.

Let's get the tool right, and then run normalization across the board. I don't care much about the details of the normalization, but I do care strongly that we have tooling that lets us script changes, because in the next few months there will be many changes as we work to get v2 right.

petertseng commented 7 years ago

I'm responding to a question nobody has asked, not because I think someone will ask it of me (I don't think anyone will; the question is a bit too hostile), but because my response to the hypothetical question explains my behaviour, both in the past and now:

Why didn't you raise all your concerns when we were discussing in https://github.com/exercism/discussions/issues/177 ? If you didn't complain back then, you have no right to complain now.

Given my (everyone's!) limited time, at the time it made sense to only comment on issues that would affect me. I looked at the issue in question and thought "oh, formatting, don't need it, don't plan to run it on my tracks, so don't need to comment at all". I guess that ended when it was proposed to be run on all the tracks.

It's already known that I defend the humans, and I don't think I contribute to the conversation by rehashing why.

Here are a few ways I can see to proceed from this point. Each of them should include a way to achieve the objective of scripting a change to config.json:

  1. We get agreement on an ordering of keys that would be acceptable (on the top level, language first, some other things, exercises last, etc.)
    • The normaliser could emit the keys in that order rather than alphabetised. For example, if you suppose it is written in Go, https://play.golang.org/p/pJ0hFHfVUq.
    • I understand that that necessitates a new release whenever we change the set of expected keys. If that doesn't jibe well with configlet's release model, perhaps consider decoupling.
    • To script a change, add a new key to the struct, and then the marshaller will put it in the right place.
  2. We decide that the normaliser will only normalise indentation and will leave all key orderings intact.
    • Now, since indentation is normalised and I therefore know that all keys on the top-level object will have an indentation prefix of (say) 2 spaces, if I wish to add, say, a track_id to all config files after language, I simply look for the line that has "language": "whatever", and add a line immediately after it that says "track_id": "whatever", at the same level of indentation.
    • But this gets tricky when the resulting line should or should not have a trailing comma, and/or the previous line now needs one added. I understand it's a fragile solution, but it is one that is within reach if we cannot agree on an ordering.
  3. Your suggestion here.
kytrinyx commented 7 years ago

Why didn't you raise all your concerns when we were discussing in exercism/discussions#177 ? If you didn't complain back then, you have no right to complain now.

I disagree with this hypothetical question, for what it's worth.

Also, I defend the humans, I just don't always understand them. I really want to get this right so that we can (a) minimize work for maintainers, and (b) support scripted changes. And make the humans happy.

petertseng commented 7 years ago

This entire comment does not need to be taken into consideration when considering the course of action.

Looks like there will be trouble if the normaliser does not behave in exactly the same way as whatever is doing the scripting. If we suppose that the scripting is done in Ruby:

$ ruby -rjson -e 'puts JSON.pretty_generate(JSON.parse(File.read("config.json")))' | diff -u - config.json
--- -   2017-10-24 03:15:18.000000000 +0000
+++ config.json 2017-10-24 03:13:21.000000000 +0000
@@ -786,10 +786,8 @@
       "uuid": "d5997b60-e54c-4caa-beae-8614f0da3fb3"
     }
   ],
-  "foregone": [
-
-  ],
+  "foregone": [],
   "ignore_pattern": "example",
   "language": "Haskell",
   "solution_pattern": "example.*[.]hs"
-}
+}
\ No newline at end of file

This may affect the choice of technology for the normaliser/future scripts.

The above argument is invalid. Just run the normaliser after running the script, and that way a minimal diff is achieved.

kytrinyx commented 7 years ago

Just run the normaliser after running the script, and that way a minimal diff is achieved

Yep, that's what I've been doing.

bdw429s commented 7 years ago

FWIW, I think this all feels slightly round-peg-in-a-square-hole-ish because we're:

  1. Slightly abusing a source control management system to act as a database management system
  2. Slightly abusing textual diff tools to act as complex data comparison tools
  3. Slightly abusing a data format in which extracurricular whitespace has no meaning to make it have meaning
  4. Slightly abusing the unordered nature of JSON objects by wishing they had order for our convenience

There's no good way around #1 and #2. It might be better if we made each exercise's data an atomic file in source control instead of one big file, but that would really just be pushing technical debt around with diminishing returns.

I sort of feel that #4 is wrong of us. If we wanted order to matter, we should have used an array. If you want a better way to visualize your data, then build a tool to do so, IMO. I'm new to this, but thus far I really haven't cared where the keys are in the CFML repo. I'm not changing or even looking at most of those values on a regular basis, and it's not like I forget which track I'm in and have to scan the JSON file to remind myself.

So #3 is the crux of the issue, IMO. The impedance mismatch is that two JSON strings with different whitespace can contain the same effective data, yet we're using a storage system that cares about whitespace. For the most part, I think we can accomplish what we need to if we:

NobbZ commented 7 years ago

@NobbZ It sounds like you're saying that you'd prefer that I submit pull requests that make seemingly arbitrary changes along with the actual changes that I need to make. That doesn't seem like something you would say, so I'm pretty sure I'm misunderstanding. Would you mind clarifying?

The paragraph you quote was an explanation to @N-Parsons, just meant to describe the workflow with the normalizer and scripted changes.

Of course I do expect the scripted change and the re-normalisation to happen in a single commit.


I expect there to be several more rounds of normalization, as we learn more and make better decisions and improve our tooling.

Ah, okay, I can live with that, but it would have been nice to know that from the very beginning.


Slightly abusing the unordered nature of JSON objects by wishing they had order for our convenience

As a maintainer I have to alter the file by hand when things change, e.g. adding a new topic to an exercise or changing its difficulty. Therefore it's necessary to be able to find those entries quickly. So when I do a text search for accumulate, because that's the exercise I want to change, then after finding it I only want to scan the lines below to find the key I want to alter. Having to scan in both directions makes it harder, especially given the chance that the potentially huge list of topics could sit between the exercise's name and the key I'm actually searching for.

If we had graphical tooling[1], though, guaranteed to work on at least Mac, Windows, and Linux, that takes care of this and lets me alter entries as I'm used to with any CRUD tool, I'd be fine with any ordering.


only one pull can cleanly touch the config.json file at a time

Personally, this is a no-go for me. If scripted changes come in fast and I'm buried under day-job work, I would probably be unable to handle them quickly enough, especially since resolving them involves not only me merging, but also the author of the PR rebasing. Those PRs will probably be made from timezones ranging from UTC-4 to UTC-9, while I live in UTC+1/+2 (depending on DST). This difference of 5 to 12 hours often results in round trips of a complete day.


[1] Some Go program which serves a small web app is totally fine for me.

tleen commented 7 years ago

I prefer to write the empty topics as [] instead of null; at least under some circumstances, this should be left untouched by the formatter. [] means no topics apply, null means I haven't looked at it yet. At least that's how I read it.

I can see the logic behind this thinking, but it does give us two different representations of "no topics": "no topics because there are none" and "no topics because none have been assigned yet". I'm not sure we need that distinction. Although, if there is an exercise with no applicable topics, perhaps we need to expand the topics list or remove the exercise? Even "hello world" has applicable topics.

(I guess technically "I have not defined it yet" would be undefined in JSON while null would be "I am not defining it"? JavaScript! shine on you crazy :large_blue_diamond:)

Sometimes the programmer in me sees a debate like this and thinks "Can we make everyone happy?": add a switch to have fmt output the config as alphabetical or maintainer(ical?) order. But hopefully we can agree that it is probably better to figure out a canonical ordering and stick to it. Otherwise I fear we would just be kicking the can down the road, to deal with later when configs are larger and more complex.

This is a good discussion. Glad we are having it. Nextercism will come out better as a result!

marnen commented 7 years ago

I'm sort of jumping into the middle of this discussion since @kytrinyx requested some input from me, so if I'm saying something silly due to context that I'm not aware of, please tell me.

@NobbZ:

Of course I do expect the scripted change and the re-normalisation to happen in a single commit.

I would very much want two separate commits (possibly in one pull request). Although it's not a hard and fast rule, when code is reformatted without changing content, it's nice for that reformatting to be in a commit of its own, separate from any semantic changes.


I am, in general, extremely skeptical of automatic code formatters (except where they're really a core part of a language ecosystem, as I suppose gofmt is), and I have no particular interest in cross-track normalization or stylistic consistency for its own sake. In the config files that I've written, I use the order and style that I personally find most readable, and I'd get pretty cranky if someone told me I had to switch to a style that I considered less readable. Likewise, I'll extend other people that same courtesy for keeping their preferred style. I don't think anything is gained by forcing everyone into the same Procrustean bed. :)

So I guess what I'm saying is that I don't especially want to enforce a canonical order, nor do I see much use for something like configlet fmt except as a linter.

I also strongly agree with most of @bdw429s's comments: it does seem like we are (slightly) abusing JSON and generally overthinking the concept of normalization.

nywilken commented 7 years ago

It seems like there are a number of issues being discussed in this thread pertaining to configuration file normalization. All of them are valid, but before we start to pick things apart I think it is best to start with a little context behind configlet fmt.

With that said, I'm with @kytrinyx

I really want to get this right so that we can (a) minimize work for maintainers, and (b) support scripted changes. And make the humans happy.

Having a single source of truth for configuration file normalization is key. Achieving this can be simple or tedious, IMO.

For the record, I do agree with @tleen and @bdw429s when it comes to the ordering, but I do understand how it can introduce more work for maintainers working with the config files manually.

Which brings me to @NobbZ's statement about a CRUD like graphical tool. What if configlet had support for modifying config data via some CRUD like set of sub-commands (i.e configlet exercise [add|update|delete])? Is that something people would be interested in?

If we had graphical tooling[1], though, guaranteed to work on at least Mac, Windows, and Linux, that takes care of this and lets me alter entries as I'm used to with any CRUD tool, I'd be fine with any ordering.

Regarding PRs

Targeted PRs make sense to me: "single purpose changes". nextercism is a rolling target, it's to be expected that config.json will have to adapt more than once.

👍

bdw429s commented 7 years ago

What if configlet had support for modifying config data via some CRUD like set of sub-commands (i.e configlet exercise [add|update|delete])? Is that something people would be interested in?

For what it's worth, I've all but removed the need for track contributors to even use configlet in the CFML track. We've already written CFML-based CLI tools that scaffold exercises (including the additions to the config.json file automatically), generate readme files, and normalize the config.json to the same specs that configlet fmt uses. Part of this was to reduce the number of tools people need to contribute to a bare minimum, and because I am a man with a CFML hammer and the world is my nail… :hammer: :smile:

nywilken commented 7 years ago

@bdw429s good show - I have to take a look at what you've done as I can see value in providing such tools.

marnen commented 7 years ago

@nywilken:

The original intent behind [configlet fmt's] creation was not to enforce a standard order, but to normalize the JSON file in a consistent manner for automated scripting purposes.

Why would that need normalization, unless our scripts are acting directly on the JSON as text? (And of course, that's not how it should be done in general: JSON should be parsed into appropriate data structures by the script, and then all manipulation should happen on these data structures, serializing them back to JSON only after that's happened.)

We needed a way to enforce topic names programmatically across tracks #60

I'll look at that issue... but surely if we need to enforce that, then plain JSON is the wrong tool, no? (Although I hate to suggest it, perhaps we need a monorepo so everything can refer to the same files?)

What if configlet had support for modifying config data via some CRUD like set of sub-commands

That might be interesting (like git config?), but only if it has a decent UI. Otherwise people will go back to editing the config files directly, and we're right back at square one.


@bdw429s:

I've all but removed the need for track contributors to even need configlet in the CFML track. We've already written CFML-based CLI tools

My hat is off to you from a programming point of view, but from the point of view of Exercism maintenance, this worries me a bit. If we have standard tools meant to be used for all tracks, then we should probably not reimplement them, or else they'll surely go out of sync.

I am a man with a CFML hammer and the world is my nail

I'm a former ColdFusion developer myself. Friendly advice from that perspective: get yourself some other hammers as well. :)

NobbZ commented 7 years ago

@marnen:

I would very much want two separate commits (possibly in one pull request).

Nope, since this would defeat the sole purpose of normalisation: easier-to-read diffs.

serializing them back to JSON only after that's happened.

And this is where anarchy begins if you do not normalize the JSON beforehand… Since every tool, every JSON serializer might behave differently.

NobbZ commented 7 years ago

@bdw429s:

[being proud of reimplementing configlet in the track's language]

Why should I put time into recreating a tool that is already there and cross-platform? This tool is even available precompiled for macOS, Linux 32- and 64-bit, ARMv5 and v6, and Windows 32- and 64-bit…

It is well tested by the maintainers of all but one of the tracks, known to work as expected, and when specs change it's the first tool to know, since this tool actually is the spec…

The Not-Invented-Here syndrome you suffer from, though, is something I've often seen ruin projects, because they got too expensive after months of developer time were spent rebuilding something that was available for only 1k USD/year; that's less than a week of a developer's time costs, not even speaking of their desk, PC, power usage, etc…

But we are not here to discuss NIHS. Feel free to stick with your tool, but please do not complain later on when the scripted PRs cause a lot of noise in your diffs, or when your track can't get processed or served properly because you didn't pick up a small change in the specs of those files fast enough.

nywilken commented 7 years ago

@marnen

Why would that need normalization, unless our scripts are acting directly on the JSON as text?

There was a concern that programmatically modifying config files would result in inconsistencies when it came to indentation, since config files are modified differently across tracks. So the fmt command was built to normalize ("format") JSON files (via JSON deserialization and serialization in Go) to ensure that files follow a standard level of indentation and are in a reproducible structure. The fmt command is idempotent, and should always result in the same file structure with little to no indentation-specific changes, making scripted PRs for config files easier to consume and free of major whitespace changes. At the moment, configlet fmt is being run for the first time on some tracks, so there are a number of whitespace changes in the PRs referencing this issue. But that should not be the case for future PRs.

I hope that answers your first question.

NobbZ commented 7 years ago

@nywilken:

Which brings me @NobbZ's statement about a CRUD like graphical tool. What if configlet had support for modifying config data via some CRUD like set of sub-commands (i.e configlet exercise [add|update|delete])? Is that something people would be interested in?

Personally I'm fine with such an approach; I often prefer CLI over GUI. But we have to ask the other maintainers as well. Especially in the more Windows- and Mac-oriented languages, I think there will be demand for something more GUI-like than an interactive CLI tool.

marnen commented 7 years ago

@NobbZ:

Nope, since this would defeat the sole purpose of normalisation: easier-to-read diffs.

I can't make this make sense, which probably means I'm not understanding you correctly. Can you explain what you meant?

And this is where anarchy begins if you do not normalize the JSON beforehand…

Quite the contrary. The normalization (if any) happens on serialization, so any normalization done on input is completely irrelevant.

Since every tool, every JSON serializer might behave differently.

I have 2 responses:

  1. There's only one tool in question AFAIK, that being configlet. Am I misunderstanding?
  2. Even if there were multiple ones, and even if they for some reason normalized their serialized output differently, I wouldn't care much as long as the semantics were the same.
NobbZ commented 7 years ago

@marnen:

Nope, since this would defeat the sole purpose of normalisation: easier-to-read diffs.

I can't make this make sense, which probably means I'm not understanding you correctly. Can you explain what you meant?

As long as we have a normalized base and re-normalize after each change, the resulting diffs will stay clean. There will be no jumping keys caused by different or even random ordering of keys within an object on serialisation.

And this is where anarchy begins if you do not normalize the JSON beforehand…

Quite the contrary. The normalization (if any) happens on serialization, so any normalization done on input is completely irrelevant.

Well, you need to normalize it once to have a clean base, a tabula rasa.

Of course you can decide not to accept this first clean-up and do your own thing, but the first scripted change including normalisation will come sooner or later.

There's only one tool in question AFAIK, that being configlet. Am I misunderstanding?

The scripted changes can be done by tools written in any language by using any JSON library. Therefore configlet fmt should be run afterwards to retain normalized format.

Even if there were multiple ones, and even if they for some reason normalized their serialized output differently, I wouldn't care much as long as the semantics were the same.

But I do.

When a scripted change comes in whose purpose is to multiply all difficulties by 10, to scale up to the new range from 0 to 100 instead of 0 to 10, and this change also swaps the keys of the object around, I am not able to properly review it because the diff gets too noisy.

Therefore we normalize, to be able to keep small, focused diffs.

I do not trust foreign changes, I do not trust foreign scripts, I do not even trust my own; this is why I like to get reviewed, and this is why I really miss a second pair of eyes for the Erlang track…

marnen commented 7 years ago

@NobbZ:

When a scripted change comes in whose purpose is to multiply all difficulties by 10, to scale up to the new range from 0 to 100 instead of 0 to 10, and this change also swaps the keys of the object around, I am not able to properly review it because the diff gets too noisy.

Right, if formatting and semantic changes are mixed in one commit, then that commit's diff is less clear. This is why, if that happens, I was recommending above that it be done in two commits.

Therefore we normalize, to be able to keep small, focused diffs.

Of course diffs should be small and focused. But I believe that the way to get them that way is to use the VCS to make small focused commits. Changing the source code itself solely in order to make the diffs smaller is putting the cart before the horse and making a fetish out of the tools.

IMHO the source code should take a form that makes it readable, not one that makes the diffs so. Likewise, the diffs should take a form that makes them more readable, not the source code.

marnen commented 7 years ago

@NobbZ (Also just saw your comment about Erlang. I'm not exactly an expert, but I do have a clue about Erlang and I can take a look...)

NobbZ commented 7 years ago

Right, if formatting and semantic changes are mixed in one commit, then that commit's diff is less clear. This is why, if that happens, I was recommending above that it be done in two commits.

So you suggest starting from a normalized base. Run scripted changes which change data, and because the serializer randomizes the order of keys each time, you get a huge diff. You make a first commit. After that you re-normalize, resulting in another huge diff.

This results in 2 commits, each with a huge diff of its own, but changing only 2 lines net.

marnen commented 7 years ago

No, that's absolutely not what I'm suggesting. I'm suggesting that if normalization needs to happen as part of a script, it should be in its own commit.

This results in 2 commits, each with a huge diff of its own, but changing only 2 lines net.

And if that were the case, then it would be easier to review the branch diff than each commit's diff individually. That's fine. The reverse is fine too. Git works for us, not the other way around.

marnen commented 7 years ago

...oh, and I'm also suggesting that the script doesn't need to start from normalized input (or indeed to create normalized output, though it will probably do so since serializers tend to behave predictably).

marnen commented 7 years ago

...thinking this over some more. I have no particular desire for every script to arbitrarily change the order of JSON properties, but I also don't want to have that order mandated for the config files themselves. Preserve input order on output, perhaps?

NobbZ commented 7 years ago

Which you can't. Let's assume JavaScript: it will randomize keys on each serialisation even if no objects have actually changed.

marnen commented 7 years ago

@NobbZ You actually could (by reading key order when parsing, and using toJSON to customize the behavior of JSON.stringify), but I'm not sure it would be worth it. :) (I suspect Ruby would do better on this, because hash keys are ordered. No idea about any other language's JSON library, because I rarely care.)

marnen commented 7 years ago

Anyway, I've probably gone on way too long here, and for that I apologize. I wouldn't be absolutely opposed to normalization if we could guarantee a nice human-readable order of parameters as @petertseng suggested above, although I do still sort of think that any approach that cares about the serialization order of an unordered structure is kind of doing it wrong...

bdw429s commented 7 years ago

from the point of view of Exercism maintenance, this worries me a bit.

@marnen I understand and have thought through the same thing. It's been really a trivial amount of work, and to be honest, it's work for me, not for anyone else, and I'm ok with that.

get yourself some other hammers as well.

@marnen Heh, yes, that was me being a little tongue-in-cheek. Trust me, I have many hammers in my toolbelt. CFML is personal to me, however, and I have a secondary goal in this Exercism track of using it to showcase just how modern the language is. Heck, most people out there don't even realize you can do CLI scripting in CFML these days! This has been in part an experiment in what I can accomplish for the sake of learning. I'm well aware that the onus will be on me to keep these CLI scripts up to date, but they've all been very trivial so far, so the nice seamless tie-in to the existing CLI exercise generation is worth it, IMO.

Why should I put time into recreating a tool that is already there

@NobbZ It sounds like you shouldn't. My reasoning for doing so is above :)

since this tool actually is the spec

@NobbZ I would disagree. The spec is the spec. Configlet is the official implementation of that spec. As a track maintainer, I'm ok with taking on the responsibility of creating an end-to-end set of CLI scripts which not only follow the spec, but also do much more than configlet does. I'm scripting complete exercise scaffolding, unit test generation, and JSON manipulation. The fact that I tossed in the readme generation and JSON normalization to the package is just a side note.

The Not-Invented-Here syndrome you suffer from, though, is what I've often seen ruin projects, because they became too expensive after months of developer time were spent rebuilding something that was already available

@NobbZ I appreciate your concern, and in most business scenarios I would likely agree. Let's be realistic, though: I spent about 5-10 minutes hammering out a nice CFML-based JSON formatter that's all of 13 lines of code. I'm not running a for-profit here; this is a hobby, and one in which I specifically want to showcase modern CFML that fits nicely into the existing toolchain a CFML developer will already have installed.

but please do not complain

@NobbZ I assure you, I won't be complaining; I'm well aware of what I'm doing. :smile:

later on there will be the scripted PRs which will cause a lot of noise on your diffs or when your track can't get processed or served properly because you didn't pick up a small change in the specs of those files fast enough.

@NobbZ Actually, no, that won't happen. If the spec (and therefore configlet fmt) is changed, there will be another round of pulls to update the format of all the configs (much like this one), at which point I'll simply update my tooling to match. If it ever becomes a burden, I can easily drop my version and simply switch to running configlet.


Back to the discussion: I think most people are more in agreement than not... well, maybe, lol. First and foremost, for this to work at all, we all need to agree that whatever formatting/normalization tool we're using must be run every time any change is made to the JSON, and, in my opinion, as part of the same commit. Assuming that the JSON file is proper to begin with, that means every diff will only show the actual changes that were made to the JSON, which is what we're going for.

I'm also assuming that any automatic formatting/normalization should never re-order the JSON arbitrarily. That means that if a track owner wants his/her keys in a specific order, that will never be changed.

The only remaining loose end is if a track owner wants to automate a change to their JSON using a language/parser that doesn't preserve order of an object's keys AND that same track owner doesn't want their keys changed. That will be on them to deal with. So long as any sweeping "changes from above" are made via a lang like Go that happens to preserve the order, there shouldn't be any issues as long as the JSON is normalized before and after the change.

So, in short, I'm proposing that the spec for the JSON doesn't include specifics on the order of keys, but only the whitespace like indentation, spacing, and line break characters. Let the track owners decide if they want to order keys, etc and the formatter won't touch them.
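A minimal sketch of such a whitespace-only normalizer, in Python (an illustration of the idea, not configlet's actual implementation): dicts are insertion-ordered in Python 3.7+, so a plain round-trip standardizes indentation without reordering anyone's keys.

```python
import json

def normalize(text):
    # json.loads keeps key order (dicts are insertion-ordered in
    # Python 3.7+); json.dumps only standardizes the whitespace.
    return json.dumps(json.loads(text), indent=2) + "\n"

print(normalize('{"slug": "two-fer", "core": false, "difficulty": 1}'))
# {
#   "slug": "two-fer",
#   "core": false,
#   "difficulty": 1
# }
```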

Stargator commented 7 years ago

A lot of the things I'm in favor of have already been mentioned. Though to be honest, I haven't given the config and maintainer files much thought.

I definitely agree that it needs to be as human-readable as possible, so that at a glance you can get the information you want.

NobbZ commented 7 years ago

like Go that happens to preserve the order

It doesn't. If the serializer is based on a struct, it will emit the keys in the same order as they are specified in the Go source code of the struct; if, however, the serializer is based on map[string]interface{}, the keys will be randomized. If you have a []map[string]interface{} (an array of objects), the order of keys will even differ between each object.

petertseng commented 7 years ago

What if configlet had support for modifying config data via some CRUD like set of sub-commands (i.e configlet exercise [add|update|delete])? Is that something people would be interested in?

I don't think I'd be interested; I think I'd just do it via a text editor. I think the reason is just that I don't want to have to learn these new commands, whereas I already know how to use my text editor. Seems like too much work (both for those who would implement those commands and those who would learn them) for too little value.

I also don't want to be in a world where such commands are necessary (if, for example, config was a binary format that wasn't human-readable at all), but I don't think anyone was suggesting a move to that world.

petertseng commented 7 years ago

Something I wonder about.

The body of this issue does a good job of explaining "Why?"

By normalizing the config files, we will reduce the amount of noise in those PRs.

The title of this issue answers the question "What?"

Run configlet fmt on all tracks

I wonder whether the title is directing the discussion too much toward the merits of one specific "What?" and not enough toward "How do we achieve the goal we are trying to drive toward?". Generally I think we are staying on the right track; I hope it stays that way.

Maybe the way to achieve the goal isn't necessarily via using fmt on all config files, I don't know. I don't have any other suggestion right now, but I wouldn't want to prevent anyone from thinking about it.

I'm proposing that the spec for the JSON doesn't include specifics on the order of keys, but only the whitespace like indentation, spacing, and line break characters. Let the track owners decide if they want to order keys, etc and the formatter won't touch them.

This proposal is one of two that would currently be acceptable to me (the other possibility being to do the same, but also specify key ordering).

petertseng commented 7 years ago

Let's hear it if there are any predictions (other than to add track_id) for what changes will be necessary in the future.

Let me list those that have happened in the past:

Maybe the way to achieve the goal isn't necessarily via using fmt on all config files

I honestly was so tempted to suggest using text transformations (add this key one line above this other key, with the same level of indentation), but there is too much madness here:

Well, it was worth a try.

How about I ask this question instead. I hope this isn't considered a pointless exercise; it is to help understand the problems we're trying to solve.

Explain why it is preferable for one person to personally script these changes and submit them as PRs to all tracks. Compare it to alternatives such as filing an issue against each track and letting the track maintainers do it.

Well, about those PRs that added uuid, core, unlocked_by to each exercise, it would have been a huge duplication of effort to make each track maintainer script that change individually. It makes sense to simply write the code once and do the track maintainers the favour, otherwise it might never get done! Wouldn't want to have tracks that procrastinate too much and hold back Nextercism.

I guess I understand. Now to see how to make that work.

bdw429s commented 7 years ago

It doesn't.

@NobbZ Ah, crap. Sorry, then I misunderstood. It sounds like it would be necessary to rewrite configlet to not parse the JSON, but simply format its whitespace as a pure string manipulation. This is exactly what the CFML lib I'm using does. It never deserializes the JSON into native data types; it simply reads the string, iterates over each character, and applies the whitespace rules as defined (indentation, spacing, line breaks, etc.). That way, it will never modify the order of the data, only the whitespace around it.
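A string-level reformatter along these lines is straightforward to sketch (here in Python rather than CFML; a rough illustration, not the actual lib): track string and escape state, and rewrite only the whitespace between tokens.

```python
def reformat_json(text, indent="  "):
    """Re-indent a JSON document without parsing it into native data
    types, so key order and all values are left untouched. Rough
    sketch: empty objects/arrays and other corner cases are ignored."""
    out = []
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)  # copy string contents verbatim
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch in "{[":
            depth += 1
            out.append(ch + "\n" + indent * depth)
        elif ch in "}]":
            depth -= 1
            out.append("\n" + indent * depth + ch)
        elif ch == ",":
            out.append(",\n" + indent * depth)
        elif ch == ":":
            out.append(": ")
        elif ch not in " \t\n\r":  # drop existing whitespace only
            out.append(ch)
    return "".join(out)
```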

Insti commented 7 years ago

My first preference would be to not mess with the track's formatting at all.

Change to an existing key: Change in-place. Delete an existing key: delete in-place, move everything after up/left. Add a new key: Put it at the end.
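As it happens, Python's insertion-ordered dicts (3.7+) follow exactly these rules, which makes such a policy cheap to implement; a tiny illustration (keys hypothetical):

```python
exercise = {"slug": "two-fer", "difficulty": 1, "deprecated": False}

exercise["difficulty"] = 3   # change an existing key: stays in place
del exercise["deprecated"]   # delete: everything after moves up
exercise["topics"] = []      # add a new key: goes at the end

print(list(exercise))  # ['slug', 'difficulty', 'topics']
```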

NobbZ commented 7 years ago

@Insti Can you name a tool that does this, preserving all order and indentation and stuff?

cmccandless commented 7 years ago

@NobbZ @Insti Well, for what it's worth, I just wrote one in less than an hour. Python's json library maintains key order. This script will even auto-detect existing key order for adding new exercises.
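(The linked script isn't reproduced here, but the core trick is easy to sketch: since Python's `json` module preserves key order, a new key can be slotted in right after a related one by rebuilding the dict. The names below are hypothetical, not taken from the actual script.)

```python
import json

def add_key_after(obj, anchor, new_key, new_value):
    """Return a copy of obj with new_key placed right after anchor,
    relying on Python 3.7+ dicts preserving insertion order."""
    result = {}
    for key, value in obj.items():
        result[key] = value
        if key == anchor:
            result[new_key] = new_value
    return result

exercise = json.loads('{"slug": "leap", "difficulty": 1}')
exercise = add_key_after(exercise, "slug", "uuid", "placeholder")
print(json.dumps(exercise))
# {"slug": "leap", "uuid": "placeholder", "difficulty": 1}
```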

Insti commented 7 years ago

Can you name a tool that does this, preserving all order and indentation and stuff?

No, but: a) It wouldn't surprise me if several already existed. b) It would be easy to write one. (see above)

We are paying the price for using JSON for configuration files. It seems our options are:

tleen commented 7 years ago

It may be time to start looking towards consensus in this issue. Here is what I am reading:

  1. JSON ain't great: No, it is not, and its limitations are well known: you can't comment in it, and JavaScript serializers couldn't care less about key order. However, JSON has more-or-less become the default format for configuration across the board (not just in Exercism), so there is no real push to change config.json to config.?. So how to deal with the JSON?

  2. config.json is hand-edited: For the most part, maintainers manually edit the config.json file, which is becoming quite big on some tracks. Track maintainers have ways they like to edit config.json, and scripted updates can be disruptive to that process. Because of the hand editing, anything that makes it easier for humans to read is a Good Thing.

  3. Property order matters: Because of the point above, the ordering of the properties for the exercises in config.json matters to most maintainers. For those to whom it does not, there is no strong preference beyond it being logically consistent and common across tracks. I don't see anyone advocating for alphabetical ordering or anything else, but where there is a preference, it is universally for something that is easy for humans to read and understand in the exercise context, i.e. exercise slug on top.

First attempt at a conclusion: it seems we are overall in favor of defining some ordering/grouping for the properties of the exercises in the config file and having any tooling default to that ordering. Anybody strongly against this?

petertseng commented 7 years ago

Just curious:

JSON has more-or-less become the default format for configurations across the board (not just in Exercism)

Got any corroborating sources for this information? I feel curious about how projects do configuration. I tried to ask my preferred search engine about "most popular configuration file format" but all I got was http://www.terminally-incoherent.com/blog/2013/08/28/what-is-your-favorite-congfig-file-format (a poll which may not be representative) and posts advising not to use JSON (these posts may have been necessary because JSON is the most popular, but that's not explicitly stated).


These are my preferences (and thus, I gave 👍 to the above proposal):

  1. Most preferred: Normaliser changes whitespace only and leaves ordering intact. To script the addition of new keys, place them immediately after or immediately before a key that is related to it.
  2. Normaliser changes whitespace and ordering, using an ordering that we discuss and agree on.
  3. No normalisation and just open issues against all tracks if we desire a config.json change, and let track maintainers do it on their own time. (How to deal with tracks that miss a hard deadline to complete the change?).
  4. Further discussion to find more alternatives. (I'd consider a different file format if y'all cared, but let's see the alternatives and benefits)
  5. Normaliser changes whitespace and alphabetises ordering.
  6. No normalisation and some poor victim has to attempt to do text processing on the JSON in order to add new keys (I honestly suspect that this will work for the majority of JSON files for simple changes, but it's just too fragile for me to recommend, and hard to make it work on more involved changes)
  7. Least preferred: No normalisation and some poor victim has to manually make PRs to each track by hand if we desire a config.json change that should go out to all tracks.