atlas-engineer / nyxt

Nyxt - the hacker's browser.
https://nyxt-browser.com/
9.64k stars 404 forks source link

Fix SET-URL #3385

Closed shamazmazum closed 2 months ago

shamazmazum commented 2 months ago

As discussed in #3383, I open another PR which fixes navigation to URLs like "github.com/atlas-engineer".

It brings two changes: 1) Uses valid-url-p to check if "https://" + query is a valid URL. 2) In valid-url-p move the check of TLD validity out of (or (not check-dns-p) ...) block. Maybe it is my fault (I'm too lazy to do git blame), but this has nothing to do with DNS lookup.

Now you can set URL to github.com, github.com/foobar, wiki foobar or foobar and get what you want in all those cases.

shamazmazum commented 2 months ago

Oops, one test fails, investigating

shamazmazum commented 2 months ago

Done! As I said, I would like to remove DNS lookups completely in later commits, so URLs like http://foobar would be valid URLs. DNS is not used to check if an URL is valid since 377e9ab, so as for now I think this is a good idea to check if a query is an URL without a scheme or a search: if a query contains a dot and (valid-url-p (str:concat "https://" (query query)) :check-dns-p nil) is non-NIL, then it is not a search.

@aadcg Please review

aadcg commented 2 months ago

@shamazmazum I'll review soon, thanks.

aadcg commented 2 months ago

@shamazmazum I think there are two issues in this PR.

  1. As we have agreed, we don't want to change type signatures in this PR. See the change in initialize-instance :after.
  2. The issue we're solving is related to the call (and check-dns-p (valid-tld-p (query query))). Note that (cl-tld:get-tld "github.com") => "com", while (cl-tld:get-tld "github.com/") => NIL. For this reason, the later case is interpreted as a search engine query. The solution you propose seems too fragile.
shamazmazum commented 2 months ago
  1. initialize-instance still can accept check-dns-p, obviously, but OK, I bring it back.
  2. The solution I propose it simple: if it looks like an URL, check if it is a valid URL with quri. cl-tld just does not work here. Moreover, quri was also used before 377e9ab (the commit which introduced this bug). If you think that this is not OK, I would like to see any examples where this approach does not work.

UPD: I may be wrong, but firefox does the same: if a query contains a dot, it tries to interpret it as an URL. If your URL does not have a dot, you have to specify the scheme. Surely, firefox never did any DNS lookups to determine if a query is an URL or not.

Before 377e9ab the conditional was:

((and check-dns-p
      (valid-url-p (str:concat "https://" (query query))
                   :check-dns-p check-dns-p))
 (setf (query query) (str:concat "https://" (query query))))

I just replace check-dns-p with a test for a dot and always pass nil to :check-dns-p (as you already do).

UPD2: Check this code from initialize-instance on master:

(cond
    ((valid-url-p (query query)
                  :check-dns-p nil)
     ;; Valid URLs should be passed forward.
     nil)
    ((and check-dns-p
          (valid-tld-p (query query)))
     (setf (query query) (str:concat "https://" (query query))))

What is the logic of using valid-url-p in the first conditional and a weaker check valid-tld-p in the second?

aadcg commented 2 months ago

@shamazmazum enter about.this, github.com and 192.168.0.1 on Firefox and note that the first one is interpreted as a search query. This seems to be the counter-example you've requested.

shamazmazum commented 2 months ago

@aadcg I agree with the first example. I still suggest using valid-url-p because it parses an URL and then you can pass the host part to valid-tld-p. We can do valid-url-p which checks TLD and a weaker predicate which omits this test (e.g. when the scheme was explicitly supplied). Obviously, you will have to change the type of valid-url-p to something like (-> valid-url-p (string &key (:check-tld-p boolean)) (values boolean &optional)).

Otherwise, I don't know. Cut everything after / in the query? But github.com/foo bar baz is obviously not an URL. Reinvent URL parsing in valid-tld-p? What for if we have quri.

I am out of ideas :) I'll just let it be if you don't like the idea with TLD check in valid-url-p.

shamazmazum commented 2 months ago

BTW, there is a TLD check in valid-url-p which never runs.

aadcg commented 2 months ago

@aadcg I agree with the first example. I still suggest using valid-url-p because it parses an URL and then you can pass the host part to valid-tld-p. We can do valid-url-p which checks TLD and a weaker predicate which omits this test (e.g. when the scheme was explicitly supplied). Obviously, you will have to change the type of valid-url-p to something like (-> valid-url-p (string &key (:check-tld-p boolean)) (values boolean &optional)).

Overall, I agree. If we follow this approach, then we don't need to check for dots in the query or am I missing something?


To put things into perspective, I've noticed the mess while fixing #2134. I took the approach of changing the minimum. I did mistakes nonetheless (8b00a2f8c and the bug you mention when querying "github.com/user/repo"), but the goal was to keep all of the knobs in place to easily revert to a previous state. It is now clear that it makes little sense to try to fix what is fundamentally broken. Therefore, I take my word back and suggest that we iterate on what the ideal solution would look like without any constraints. Sorry, I had the hope that this wouldn't spiral so deep.


Some things to take into account:

  1. The only "DNS check" we care for is the one done by cl-tld. I'm not sure whether, linguistically and technically, we should call it a DNS check or TLD check.
  2. Note that the DNS/TLD-check (in the above sense) is a needed parameter, see prompter:filter-preprocessor and prompter:filter-postprocessor from new-url-or-search-source.
  3. Ensure that issues #2134 and #3349 remain fixed.
  4. Bonus points for fixing this issue.

Let me see the full solution you suggest and then we'll go from there. Thanks!

shamazmazum commented 2 months ago

@aadcg I hope, I did it :) I split it into two commits: the first fixes the issue with minimal changes and the second does some clean-up which removes :check-dns-p parameter and also lookup-hostname function, because it is not used anymore. If you now allow me to change signatures (as I think, you do), I suggest accepting all those commits.

2134 remains fixed (at least I see not delays on my computer), #3349 fixed, IP addresses work (127.0.0.1 goes directly to https://127.0.0.1). Additional test cases added.

Please, re-check if I missed something (I hope not).

shamazmazum commented 2 months ago

OK. Thanks. I'll do it all later, but firstly I would like to know why use with-slots instead of with-accessors. These two mean different things. If you are sure that you want the first, then OK, sure :) But do you really want it?

shamazmazum commented 2 months ago

@aadcg I did what you said. You still have to type http://localhost if you want to visit localhost, I hope it's OK more or less. BTW check something like Images.jl (a Julia package) in Nyxt and Firefox :)

aadcg commented 2 months ago

@shamazmazum I've pushed your work in commits 14dac1e88, 424eff657, 1f583161c and ec082a380. Thank you for your patience!