Open bgamari opened 5 years ago
I am actually starting to think that maybe we do want to fix this. This mistake is made for numerous abbreviations including:
Even if we fixed these, I'm generally concerned that the change in capitalization will result in broken links. We should at the very least produce a name mapping that can be used to generate an nginx redirection table.
I can see where this goes wrong. We use the casing library to convert these names, and specifically, we use fromHumps to parse the original name into "words". That parser expects camel case or pascal case, and specifically, the .NET-flavor, where abbreviations longer than 2 characters are written like words (e.g. XmlParser, not XMLParser).
I'll see if I can come up with a better approach to parsing these.
It seems that applying the possible parsers in the right order does the trick: instead of fromHumps, we'll use fromKebab >=> fromSnake >=> fromHumps. In this particular case, fromKebab will split things into "GHC", "7.10.3", and then fromHumps will operate on "GHC", which, due to no non-uppercase letter following any uppercase letter, will not split it any further.
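To make the difference concrete, here is a minimal, self-contained sketch of the two parsing strategies. It does not use the casing library; splitHumps, splitOnChar, splitName and toKebabName are hypothetical stand-ins written only for illustration, and the splitting rule (start a new word before an uppercase letter that is followed by a non-uppercase character) is an assumption inferred from the behaviour described above, not a copy of the library's code.

```haskell
import Control.Monad ((>=>))
import Data.Char (isUpper, toLower)
import Data.List (intercalate)

-- Hypothetical stand-in for a hump parser: a new word starts before an
-- uppercase letter that is followed by a non-uppercase character.
splitHumps :: String -> [String]
splitHumps = filter (not . null) . go
  where
    go [] = [[]]
    go (c : rest)
      | startsWord c rest = [] : prepend c (go rest)
      | otherwise = prepend c (go rest)
    startsWord c (n : _) = isUpper c && not (isUpper n)
    startsWord _ [] = False
    prepend c (w : ws) = (c : w) : ws
    prepend c [] = [[c]]

-- Split a string on a single separator character.
splitOnChar :: Char -> String -> [String]
splitOnChar sep s = case break (== sep) s of
  (w, []) -> [w]
  (w, _ : rest) -> w : splitOnChar sep rest

-- Kebab first, then snake, then humps, chained in the list monad so that
-- each later parser only sees the words produced by the earlier ones.
splitName :: String -> [String]
splitName = splitOnChar '-' >=> splitOnChar '_' >=> splitHumps

-- Lowercase every word and join the words with dashes.
toKebabName :: [String] -> String
toKebabName = intercalate "-" . map (map toLower)
```

With these definitions, toKebabName (splitHumps "GHC-7.10.3") yields "gh-c-7.10.3", because the C is followed by a dash and therefore starts a new word, while toKebabName (splitName "GHC-7.10.3") yields "ghc-7.10.3", because the hump parser only ever sees "GHC" and "7.10.3" separately.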
I'll push the fix once I'm done testing.
The capitalization change, btw., is inevitable, as Gitlab will automatically lowercase everything; we inject dashes to keep it readable and comply with Gitlab's naming conventions.
If we want external links to remain functional, we will need a translation table though, or a very clever way of replicating the name mangling on the fly.
9e5c2ba6fe023373dbb0f29fb5f356206e75fc17 fixes the name mangling so that GHC-7.10.3 becomes ghc-7.10.3 rather than gh-c-7.10.3.
We still need to emit a mapping though.
Oh, and we need nginx rewrites anyway, because the old links will point to /trac/wikis/ghc/..., whereas the new ones need to go to /ghc/ghc/wikis.
a467e24dd989efcfaaf7b51928821b8c73c33583 adds an nginx rewrite rule generator. Generated rules are appended to rewrite.nginx in the CWD; for a clean set of rules, one should clear out this file prior to running an import, post-process it with something like sort -u to remove duplicates, and then paste it into the nginx configuration in the appropriate location. (Actual usage of said rules is untested as of yet.)
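For illustration, a rule generator along these lines could look roughly like the sketch below. This is not the code from the commit: the rule format, the function names (escapeRegex, rewriteRule, rewriteRules) and the exact prefixes are assumptions, with the prefixes taken from the paths mentioned earlier in this thread. It relies only on nginx's standard rewrite directive with the permanent flag, and it does not handle page names containing spaces, braces, or semicolons, which would need quoting in an nginx configuration.

```haskell
import Data.List (nub, sort)

-- Escape characters that have a special meaning in a regular expression.
escapeRegex :: String -> String
escapeRegex = concatMap esc
  where
    esc c
      | c `elem` "\\.^$*+?()[]{}|" = ['\\', c]
      | otherwise = [c]

-- Assumed URL prefixes, based on the paths mentioned in this thread.
oldPrefix, newPrefix :: String
oldPrefix = "/trac/wikis/ghc/"
newPrefix = "/ghc/ghc/wikis/"

-- One rewrite rule mapping an old page name to its mangled new name.
rewriteRule :: String -> String -> String
rewriteRule oldName newName =
  "rewrite ^" ++ escapeRegex (oldPrefix ++ oldName) ++ "$ "
    ++ newPrefix ++ newName ++ " permanent;"

-- Sorted, duplicate-free rule set, similar in effect to sort -u.
rewriteRules :: [(String, String)] -> [String]
rewriteRules = nub . sort . map (uncurry rewriteRule)
```

For example, rewriteRule "Status/GHC-7.10.3" "status/ghc-7.10.3" produces a single line that redirects the old Trac-style path to the new GitLab-style path, with the dots in the old name escaped so they match literally.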
> The capitalization change, btw., is inevitable, as Gitlab will automatically lowercase everything; we inject dashes to keep it readable and comply with Gitlab's naming conventions.
Is this really true? Looking at https://gitlab.staging.haskell.org/ghc/ghc/wikis/Trac-Ticket-Import I'm a bit doubtful. In fact, I think things would be significantly more readable if we did preserve capitalization. The only thing we need to change is whitespace.
A significant fraction of the pages in the Status/ wiki namespace are mis-named. Specifically, the string GHC is transliterated as gh-c. For instance, Status/GHC-7.10.3 is translated to status/gh-c-7.10.3.

I suspect this isn't worth fixing as it's quite easy to fix-up post-facto but I thought I should at least record the infelicity.