exercism / v2-configlet

Tool to assist in managing Exercism language tracks.
MIT License
16 stars 23 forks source link

configlet lint: Should check whether UUIDs are valid. #99

Open petertseng opened 6 years ago

petertseng commented 6 years ago

Decision: configlet lint should check whether UUIDs are valid.

Support:

Opposition: None.

Task: Make configlet lint check whether UUIDs are valid.


Original issue text/question:

What does valid mean? Is it the standard xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx where each x is a hexadecimal digit?

Whatever "valid" means in the Exercism context, would it be good for configlet to check for invalid UUIDs? If whatever is using those UUIDs is expecting them all to be valid, it might be needed.

[gems/trackler-2.2.1.45]$ for c in tracks/*/config.json; do echo $c; ruby -rjson -e 'h = /[0-9a-fA-F]/; j = JSON.parse(ARGF.read); j["exercises"].each { |e| u = e["uuid"]; puts "%24s has %s" % [e["slug"], u] if u !~ /^#{h}{8}-#{h}{4}-#{h}{4}-#{h}{4}-#{h}{12}$/ }' $c; done 
tracks/bash/config.json
tracks/c/config.json
tracks/ceylon/config.json
tracks/cfml/config.json
tracks/clojure/config.json
tracks/coffeescript/config.json
tracks/common-lisp/config.json
tracks/coq/config.json
tracks/cpp/config.json
           atbash-cipher has ba4e0186-0389-af80-d3e4-695dfc4e63bff1fa503
            bracket-push has ce9fd8ac-0beb-1a80-4ff6-be19ce1540d905c31b7
                 pangram has ea8d6224-0440-6d80-1569-a04127908a200d2ecf0
tracks/crystal/config.json
tracks/csharp/config.json
tracks/dart/config.json
               raindrops has d6657d4a-0257-0c80-6667-fddd127a571f0ad1fd4
tracks/d/config.json
tracks/delphi/config.json
             hello-world has 765b5bcf-00e0-d180-ea7f-92dae26e6da0b30151c
                 two-fer has b38f504c-098e-fe80-896f-60e6c45751033d26161
                     bob has cd80ef91-0451-c380-3d58-01a2ddff3029f430eea
                    leap has 9234d900-050c-5980-561b-e1f772ae2b9c7b3fd08
       rna-transcription has e8849e37-0b3e-ec80-c41c-c53f2a871f6ee59e2d6
               raindrops has 3b1391da-04c2-e580-30a5-6063e888da196dfba6a
                 hamming has 68c5b50b-03c0-4880-af0f-9cfacb1ba507e837d12
            bank-account has bb32f666-0f50-e480-9738-e4e3e8f7ece13c9b460
        nucleotide-count has d5cae090-0626-c880-3b62-58ed5d301931de3171d
                     etl has 4e538a24-0e30-2b80-2b3f-82ca73b593a81a8cc8d
                  grains has acf5b5ab-03d7-0e80-f419-255df25d35a2466a421
      collatz-conjecture has d8093412-0665-0280-b4ae-edd0a7b483a316f236b
               beer-song has 311b59d8-01ed-0780-14a9-fa94a2937508a4fbf40
            phone-number has ce2587db-02b3-3880-e5aa-0765496de558966abf5
         perfect-numbers has ea99e05d-0f87-4280-aec3-907d40556314f89a033
           binary-search has 5a865bce-0f07-e980-5c71-c5307dc82e30076faeb
                   clock has 4e38035b-040b-0880-b592-f750e82b574e0e58c41
                triangle has ad99318a-0397-e080-5d92-0043fa439ccc604f7ed
               allergies has aedbd675-0c9f-3680-126d-38e9c675c8a00297062
           saddle-points has dbfbea24-014a-0080-5b0e-fe38d498443be3df149
         circular-buffer has a1cb2ebe-05f0-4380-c2c4-7efe9eba69d05f6de75
             minesweeper has b36d2d0d-0ef6-0780-cb76-de828d83c7ce7f5b62b
          roman-numerals has 90d241e1-07d4-ff80-de15-77f2042d2d249233b28
                 bowling has 0bfab0d4-04ca-2080-f5ae-3199d09960b80633e82
               pig-latin has 40c44251-0dff-3380-6a14-ef8e02d1e9242dca58f
              book-store has 234df60b-0b1a-1e80-67c5-f8bcefb32e61245df7d
                   wordy has 426eaa07-02bf-cc80-d4b6-d444f233c0b0eab3b8e
                   poker has 14675cf2-0504-1480-8495-d9aa32915b3a76f2383
tracks/ecmascript/config.json
             twelve-days has 0e43944b-0a68-5680-ef11-70999d2df897c894476
tracks/elisp/config.json
tracks/elixir/config.json
tracks/elm/config.json
      collatz-conjecture has 1f52ab74-02d5-db80-8f95-521aba04d183e704422
           all-your-base has 7890045d-0cf5-1980-21ae-7bd0228ad2a2993ae82
        pascals-triangle has a730ca85-057d-db80-e44e-25b8028acede6337cca
                 isogram has 5f540090-061e-2f80-40a8-d9782700ed2efdf8965
tracks/erlang/config.json
                 two-fer has 1a8abaa2-04f5-0e80-7986-5fb33995971caeb1496
tracks/factor/config.json
tracks/fortran/config.json
tracks/fsharp/config.json
tracks/gnu-apl/config.json
                    leap has 9a89822e-0d61-f980-49fa-c273c0a0e0869b9fc7f
tracks/go/config.json
                 two-fer has dc4ad67d-0fe4-0880-5a13-3aebb8759bcc3095de1
tracks/groovy/config.json
                 two-fer has 3227f4a2-0ffc-1480-5018-4ef3479dbcffdcf3f2b
          scrabble-score has 1d098fb3-0a27-5380-3849-73f2ca77ff2f84ca698
tracks/haskell/config.json
tracks/haxe/config.json
                    leap has d30952cf-0b7d-a980-9351-caedebfed16362c6a7a
tracks/idris/config.json
              accumulate has 612f5f98-09a6-9d80-5820-edeec5ce1e1544073ed
tracks/java/config.json
              tournament has 99191d53-0fed-f680-de7a-55cfb872de17bc58629
tracks/javascript/config.json
tracks/julia/config.json
      collatz-conjecture has 43f1dc63-0412-9a80-49ba-b620f4bbfccac2731e4
         robot-simulator has deb5a654-056c-2180-2dd8-efaa14dc5225f5af86c
                  grains has 2ed6e171-04e1-b080-f383-a801b2564b3e0f4f1ee
tracks/kotlin/config.json
tracks/lfe/config.json
tracks/lua/config.json
tracks/mips/config.json
tracks/nim/config.json
tracks/objective-c/config.json
tracks/ocaml/config.json
tracks/perl5/config.json
tracks/perl6/config.json
tracks/php/config.json
tracks/plsql/config.json
             hello-world has 53156fc9-0945-f080-08a9-ce269bfc9121a5d2c5c
tracks/pony/config.json
tracks/powershell/config.json
tracks/prolog/config.json
tracks/purescript/config.json
tracks/python/config.json
                 two-fer has bacb2184-025f-0980-38d6-34fb419e03a6d3d8b44
              two-bucket has 8c89f739-05fb-7b80-b5f9-6ad079c750ba8302be8
tracks/racket/config.json
tracks/r/config.json
tracks/ruby/config.json
tracks/rust/config.json
tracks/scala/config.json
tracks/scheme/config.json
tracks/sml/config.json
             hello-world has 4450455d-0786-6f80-7672-48368a115df16e0aa7e
                     bob has 9d8e76f1-0e85-a280-b779-16335f29d1a96ce3a72
   difference-of-squares has b6713a74-08d5-4480-6524-f9dbf5cf2563d2b2a91
           atbash-cipher has 2bfc685e-0d13-0280-de5e-53bc093e3a8c5ac7c73
         perfect-numbers has b83ce171-06e6-dd80-080e-9dd189bf6df83833e92
                 pangram has 42adb6ed-09f0-6980-2d4f-75ac90cc2bf1f59fb3a
           prime-factors has 6c191163-0a98-2380-debc-75144c1787f3853e868
        sum-of-multiples has b9377be8-06c1-5280-33b6-a2071298947c9ddf65e
tracks/swift/config.json
tracks/typescript/config.json
      binary-search-tree has b2f90523-0cb1-f080-a03c-5aee5f919abf93b491b
                triangle has cf2d545c-036e-0980-3ac2-64ad54c3867e37f4cfa
tracks/vbnet/config.json
tracks/vimscript/config.json
ferhatelmas commented 6 years ago

@petertseng I am happy it's brought up because I had seen it while doing https://github.com/exercism/go/pull/894 but waited for input from others since I was away for sometime and I might be missing something.

Also, created this issue in UUID dependency: https://github.com/mattetti/uuid/issues/2

nywilken commented 6 years ago

@petertseng @ferhatelmas the concept of a UUID is fairly new, at least the support of it within Configlet. In fact, the lint command only checks for a non-empty UUID and uniqueness across exercises and tracks. Seeing as configlet is the first line of defense for any track changes, I think validating UUIDs should be part of Configlet. But I will defer to anyone on the team who may have more insight on UUID usage and validation across Nextercism.

NobbZ commented 6 years ago

I think formal validity of an UUID should be checked before even asking upstream for duplicates… Of course the API server should check again ;) We know this already from web forms ;)

Since I already copied from wikipedia, a the canonical representation is written with 32 hexadecimal digits, grouped with a rythm of 8-4-4-4-12, therefore a quick and dirty check in configlet could be to match against a regex ~r/[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}/.

This should catch at least obviously invalid UUIDs. If though the package that is used to generate, is also able to parse them, I'd prefer full offline validation, depending on which version we use.

nywilken commented 6 years ago

I think formal validity of an UUID should be checked before even asking upstream for duplicates…

Agreed.

ferhatelmas commented 6 years ago

If though the package that is used to generate, is also able to parse them ...

https://godoc.org/github.com/satori/go.uuid might be a good option.

NobbZ commented 6 years ago

At least the documentation looks good.

I've looked at the currently used UUID generator and its in fact only capable of generating UUIDv4 (randomly generated). And has no option to read or verify UUIDs.

As a matter of fact, I'd suggest not to use plain UUIDv4. I'd generate a single UUID using v1 or v4 by @kytrinyx as a base. We could use this to namespace UUIDv3/v5 per track and then use the tracks UUID as namespace for the exercises.

This way, we will still have low chances of collision, while the UUID per track/exercise combination is deterministic.

I'm not sure though, how we would handle clashes then. With v4 we could simply regenerate, With v3/v5 we could switch version once, and after that do a v4, but to be honest, if we clash 2 times in a row, then we have another kind of trouble :)

tleen commented 6 years ago

Are we seeing UUID clashing as an issue here? Lint will check for a clash although if using the uuid command to generate the chances of having a clash are remote. Most likely it would come in from a copy/paste type action. At which point a new one can be generated. Because @NobbZ is right, if we get two in a row we have bigger problems :smile:

I'm not sure what we use the UUID for in nexterism so understanding that may help. Is it to represent the exercise in the DB or just give it some sort of unique identifier in general?

tleen commented 6 years ago

The uuid generating library is just plain broken it seems (and perhaps abandoned). If we are going by the strict definition of UUID the final group is too long and the library's own tests do not catch it. We should move to a more robust option like the one @ferhatelmas suggested or google/uuid?

kytrinyx commented 6 years ago

The UUID is used to represent the exercise, and it needs to be unique across all exercises on Exercism.

I'd suggest that we use google/uuid.

We have until v2 goes live to get the uuids right in some final/usable form. I would suggest that we switch out the dependent library, and submit PRs to fix all the tracks to fix the UUIDs.

Stargator commented 6 years ago

Is there a summary of what was ultimately the decision?

kytrinyx commented 6 years ago

We decided to:

This doesn't impact the v1 exercism site, but will require a full database refresh of v2 (which was already planned, so we're not losing any data that we thought we would keep around).

kytrinyx commented 6 years ago

Going back to the original title of this issue:

Should we detect invalid UUIDs on CI? (I think it would be worth doing).

tleen commented 6 years ago

@kytrinyx we should during the lint? We can message as to the nature of the failure and how to correct it.

nywilken commented 6 years ago

I agree that checking on CI is crucial, even at lint. But my reservation, and I admit that I have not thought much into the legitimacy of this problem, is that if we run this at lint we will need to change all test data to use valid UUIDs unless there is a disable flag.

Stargator commented 6 years ago

@nywilken Are you saying some tests for exercises may trigger a failure if this is run in lint?

nywilken commented 6 years ago

@Stargator I was specifically speaking about the unit tests for configlet. We currently use UUIDs like “aaa” in configlet fixtures. UUID validation in configlet should not affect tests within exercises.

petertseng commented 6 years ago

Rephrased from question to task.