Closed anarcat closed 7 years ago
Yeah, these improvements make sense to me.
Originally, I was a bit torn, because when users (or tools, in this case) specify some qualifier (e.g. the suite), we should respect that. But then I thought about it from the other side: users following a now outdated (or malformed to begin with) link will most likely be happier seeing a working page than seeing an error message.
If it turns out that people have legitimate use-cases for exact matching (i.e. without a fallback to the next best thing), we can still add an additional endpoint for that.
Let me know if you need any pointers with regards to implementing this change, and thanks in advance!
pointers would sure be useful. i looked into fixing language redirection very quickly, and it looks like it should already falling back to the default locale ("en"?):
https://github.com/Debian/debiman/blob/master/internal/redirect/redirect.go#L41
there's a comment there that says:
// ensure that en comes first, so that language.Matcher treats it as default
or maybe there should be a change in the Narrow
function? there a bunch of comparisons there, which seem to be doing what we need...
Note that bestLanguageMatch
is only called if t.Language == ""
:)
Yes, your changes definitely will need to touch the Narrow
function.
would that make sense as a (failing) unit test?
diff --git a/internal/redirect/redirect_test.go b/internal/redirect/redirect_test.go
index 88137e4..ce4d175 100644
--- a/internal/redirect/redirect_test.go
+++ b/internal/redirect/redirect_test.go
@@ -350,6 +350,9 @@ func TestUnderspecified(t *testing.T) {
{Case: 16, URL: "jessie/i3-wm/i3.1.fr", want: "jessie/i3-wm/i3.1.fr.html"}, // default suite
{Case: 16, URL: "testing/i3-wm/i3.1.fr", want: "testing/i3-wm/i3.1.fr.html"}, // non-default suite
{Case: 16, URL: "jessie/libedit-dev/editline.3.en", want: "jessie/libedit-dev/editline.3edit.en.html"}, // section with subsection
+
+ {Case: 17, URL: "jessie/i3.1.dne", want: "jessie/i3-wm/i3.1.en.html"}, // non-existent locale
+ {Case: 17, URL: "dne/i3.1.en", want: "jessie/i3-wm/i3.1.en.html"}, // non-existent suite
}
for _, entry := range table {
entry := entry // capture
okay well, I tried to figure out the "narrow" function, and frankly, I can't. It seems to me it would require a massive refactoring to do what we need here.
i only managed to generate a failling test suite so far:
diff --git a/internal/redirect/redirect_test.go b/internal/redirect/redirect_test.go
index 88137e4..1486db5 100644
--- a/internal/redirect/redirect_test.go
+++ b/internal/redirect/redirect_test.go
@@ -10,6 +10,7 @@ var testIdx = Index{
Langs: map[string]bool{
"en": true,
"fr": true,
+ "es": true,
},
Sections: map[string]bool{
@@ -350,6 +351,9 @@ func TestUnderspecified(t *testing.T) {
{Case: 16, URL: "jessie/i3-wm/i3.1.fr", want: "jessie/i3-wm/i3.1.fr.html"}, // default suite
{Case: 16, URL: "testing/i3-wm/i3.1.fr", want: "testing/i3-wm/i3.1.fr.html"}, // non-default suite
{Case: 16, URL: "jessie/libedit-dev/editline.3.en", want: "jessie/libedit-dev/editline.3edit.en.html"}, // section with subsection
+
+ {Case: 17, URL: "jessie/i3.1.es", want: "jessie/i3-wm/i3.1.en.html"}, // non-existent locale
+ {Case: 17, URL: "potato/i3.1.en", want: "jessie/i3-wm/i3.1.en.html"}, // non-existent suite
}
for _, entry := range table {
entry := entry // capture
one of the problems I'm having with the narrow function is that there's a huge disconnect between the "filter" and "qualified" functions: the "qualified" function may find the perfect fit in the filtered list, but then when the filtered list is returned, only the first element is taken. this means we rely a lot on sorting the elements, as opposed to just picking the right one from the start, which makes it really hard to introduce multiple ambiguity criterion.
Right now, it seems to be that only the "section" has such sort-based fallback, and I couldn't figure out how to introduce other fallbacks without breaking that one.
As discussed via IRC: the tests you posted look fine, but one detail needed to be changed: potato/i3.1.en
was parsed as binarypkg=potato.
i have pushed changes to dman to make use of the new changes, which optimize the "no such manpage" use case by reducing the 4 HTTP calls to a single one. :)
thanks!
the latest dman code uses this loop to fetch raw manpages:
ie. it will do 4 hits on the webserver for a missing page. This means that "dman dman" takes 5 seconds to execute here:
switching the base_url to dyn.manpages.org brings this down to about 4 seconds.
but it seems to me this logic could be more efficiently implemented in the dyn.m.o redirector. the logic would be:
would those be welcome improvements?