KnowledgeCanvas / knowledge

Knowledge is a tool for saving, searching, accessing, exploring and chatting with all of your favorite websites, documents and files.
Apache License 2.0
1.32k stars, 92 forks

[Bug] URLs with third level domains are considered invalid #87

Closed: NxDs closed this issue 1 year ago

NxDs commented 1 year ago

Attempting to add links with third-level domains, such as https://test.example.com, throws an invalid URL error. This does not happen, however, with URLs using www, such as https://www.example.com.


Version: 0.7.0
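For comparison, a purely syntactic URL check has no reason to reject extra subdomain labels. The sketch below assumes validation is done with java.net.URI; the thread does not show Knowledge's actual validator, and UrlCheck and looksLikeWebUrl are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UrlCheck {
    // Hypothetical helper: accepts http(s) URLs that have a host, regardless of how many
    // dot-separated labels the host has (example.com, www.example.com, test.example.com).
    static boolean looksLikeWebUrl(String input) {
        try {
            URI uri = new URI(input.trim());
            String scheme = uri.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            // getHost() is null when the authority is missing or is not a valid host name.
            return web && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // e.g. spaces or other illegal characters
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"https://www.example.com", "https://test.example.com", "https://play.google.com/", "not a url"};
        for (String s : inputs) {
            System.out.println(s + " -> " + looksLikeWebUrl(s));
        }
    }
}
```

If a check like this accepts test.example.com, an "invalid URL" error for it most likely comes from a later step (such as the network request) rather than from the URL's syntax.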

ByteSyze commented 1 year ago

Knowledge actually handles N-level domains without issue, as long as the domain is reachable. The error message could be changed to something more precise, e.g. "Website unreachable". Even better, the HTTP response code could be displayed in a debug-level notification.
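A minimal sketch of the more precise messages suggested above, assuming the importer has the HTTP status of its metadata request available. ImportErrors, messageFor, the message wording, and the convention that a status of 0 or less means "no response at all" are all hypothetical, not part of Knowledge:

```java
final class ImportErrors {
    // Hypothetical convention: status <= 0 means no HTTP response was received at all.
    static String messageFor(int status) {
        if (status <= 0) {
            return "Website unreachable";
        }
        if (status >= 200 && status < 300) {
            return "OK";
        }
        String reason;
        switch (status) {
            case 401:
            case 403:
                reason = "The website refused the request";
                break;
            case 404:
                reason = "Page not found";
                break;
            case 429:
                reason = "Too many requests";
                break;
            default:
                reason = status >= 500 ? "The website reported a server error" : "Request failed";
        }
        // Append the raw code so the same string can also be logged at debug level.
        return reason + " (HTTP " + status + ")";
    }

    public static void main(String[] args) {
        System.out.println(messageFor(403)); // The website refused the request (HTTP 403)
        System.out.println(messageFor(-1));  // Website unreachable
    }
}
```

With a mapping like this, the Google Play case discussed below would surface as a refused request rather than as an invalid URL.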

NxDs commented 1 year ago

It is not, though. This is the original URL I tried, https://play.google.com/, which is clearly real and reachable, yet it still fails.


NxDs commented 1 year ago

After some debugging, it appears that Knowledge does handle N-level domains; the actual problem is a 403 response. Perhaps the request isn't setting a proper user agent? I can't see why else the Google Play website specifically wouldn't work. I tried a couple of other websites with third-level domains, and they do work.
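One way to test the User-Agent theory in isolation is to send the same GET with different User-Agent values and compare the status codes. This is a sketch using the JDK's java.net.http client (Java 11+); UserAgentProbe is a hypothetical name, the plain Chrome string is only an illustrative value, and the results depend entirely on how the server responds at the time:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class UserAgentProbe {
    // Builds a GET request that carries an explicit User-Agent header.
    static HttpRequest buildRequest(URI uri, String userAgent) {
        return HttpRequest.newBuilder(uri)
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        URI uri = URI.create("https://play.google.com/");
        // Knowledge's Electron User-Agent (as reported in this thread) vs. an illustrative plain Chrome one.
        String[] agents = {
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Knowledge/0.7.0 Chrome/106.0.5249.165 Electron/21.2.1 Safari/537.36",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
        };
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        for (String agent : agents) {
            // Only the status code matters here, so the body is discarded.
            HttpResponse<Void> response = client.send(buildRequest(uri, agent), HttpResponse.BodyHandlers.discarding());
            System.out.println(response.statusCode() + "  " + agent);
        }
    }
}
```

Note that the Java test in the next comment got a 200 while sending Knowledge's own Electron User-Agent, so the User-Agent alone may not be the deciding factor; other headers, cookies, or the way the request is issued could matter.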

NxDs commented 1 year ago

Replicating the same request in Java, with the exact headers copied from the Network tab in DevTools, seems to work: a 200 is returned along with the page content.

import java.net.HttpURLConnection;
import java.net.URL;

public class PlayStoreRequest {
    public static void main(String[] args) throws Exception {
        HttpURLConnection uc = (HttpURLConnection) new URL("https://play.google.com/").openConnection();
        // Headers copied from the Network tab in DevTools while running Knowledge 0.7.0
        uc.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Knowledge/0.7.0 Chrome/106.0.5249.165 Electron/21.2.1 Safari/537.36");
        uc.addRequestProperty("cookie", "NID=511=IV2Uq0rfCoFIKKGF5DeICTV76wCw9daPy6U1p8UEzXFvZLPTR9Hu1P2IvnZ8MlZo5qB7T4UwkcXIY5dYNYK3C1CNPjXmAxBPQh5eBB1NiHwu2iwxMsOlx3Zw8BqI5Cqa0lg9SvNClryP8i6BZ4su70mof_r1qusby8wuJTaoqnY");
        uc.addRequestProperty("accept-encoding", "gzip, deflate, br");
        uc.addRequestProperty("accept-language", "en-US");
        uc.addRequestProperty("sec-ch-ua", "\"Not;A=Brand\";v=\"99\", \"Chromium\";v=\"106\"");
        uc.addRequestProperty("sec-ch-ua-mobile", "?0");
        uc.addRequestProperty("sec-ch-ua-platform", "Windows");
        uc.addRequestProperty("sec-fetch-dest", "empty");
        uc.addRequestProperty("sec-fetch-mode", "cors");
        uc.addRequestProperty("sec-fetch-site", "cross-site");
        uc.connect();
        // Only the status code is read, so the compressed body is never decoded
        System.out.println(uc.getResponseCode());
        uc.disconnect();
    }
}

That is as far as my testing goes; I'm not familiar enough with Electron to pinpoint the issue.

BigBoyBarney commented 1 year ago

I'll add to this. Some links (specifically YouTube channels) cannot be added as a source for some reason. I don't even get an error toast; nothing happens at all.

Attempting to add https://www.youtube.com/@ccmusicc (or any other channel link) results in nothing happening. YouTube videos, on the other hand, can be added without an issue. Since Knowledge doesn't let us manually edit the link target, it's impossible to add links to channels.
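Since channel links fail silently while video links work, one thing worth ruling out is URL parsing. With java.net.URI, a handle URL such as https://www.youtube.com/@ccmusicc parses as an ordinary URL (the @ is legal in a path), so the difference likely arises in how the link is handled afterwards. The classifier below is a hypothetical sketch: YouTubeLinks, Kind and classify are illustrative names, the patterns are common YouTube URL shapes rather than Knowledge's logic, and its explicit OTHER result is meant to be reported to the user instead of being dropped silently:

```java
import java.net.URI;

final class YouTubeLinks {
    enum Kind { VIDEO, CHANNEL, OTHER }

    // Hypothetical classifier: common YouTube URL shapes, not an exhaustive list.
    static Kind classify(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost() == null ? "" : uri.getHost();
        String path = uri.getPath() == null ? "" : uri.getPath();
        String query = uri.getQuery() == null ? "" : uri.getQuery();
        if (!host.equals("youtube.com") && !host.endsWith(".youtube.com")) {
            return Kind.OTHER;
        }
        if (path.equals("/watch") && (query.startsWith("v=") || query.contains("&v="))) {
            return Kind.VIDEO;
        }
        if (path.startsWith("/@") || path.startsWith("/channel/") || path.startsWith("/c/")) {
            return Kind.CHANNEL;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        String[] links = {
            "https://www.youtube.com/@ccmusicc",
            "https://www.youtube.com/watch?v=abc123", // placeholder video ID
            "https://www.youtube.com/feed/trending"
        };
        for (String link : links) {
            Kind kind = classify(link);
            // An unsupported link yields a visible message instead of no feedback at all.
            String note = kind == Kind.OTHER ? "unsupported link, tell the user" : "ok";
            System.out.println(kind + "  " + link + "  " + note);
        }
    }
}
```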

RobRoyce commented 1 year ago

> Attempting to add https://www.youtube.com/@ccmusicc (or any other channel link) results in nothing happening.

Interesting, this actually worked for me.

RobRoyce commented 1 year ago

Thank you all for the analysis. I am aware of this issue but haven't had time to deep dive.

One quick solution is to alert the user when an import fails and let them choose "Import Anyway". This will likely result in a Source without icons, thumbnails, metadata, etc., but at least the link will be imported.

Another solution is to load the page in a separate window/view, which will behave differently than a plain GET request, and then show a Confirm Import button.

I might implement the quick solution in the next release and think about a longer-term solution later.
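The "Import Anyway" fallback could look roughly like this. It is a sketch under stated assumptions: Source, MetadataFetcher and importWithFallback are hypothetical stand-ins, not Knowledge's actual types, and the bare Source simply uses the host name as its title because no metadata is available:

```java
import java.io.IOException;
import java.net.URI;

final class ImportFallback {
    // Hypothetical stand-in for a Knowledge Source; field names are illustrative only.
    static final class Source {
        final String url;
        final String title;
        final String iconUrl; // null when no metadata could be fetched

        Source(String url, String title, String iconUrl) {
            this.url = url;
            this.title = title;
            this.iconUrl = iconUrl;
        }
    }

    // Hypothetical metadata fetcher; throws when the site refuses or cannot be reached.
    interface MetadataFetcher {
        Source fetch(String url) throws Exception;
    }

    // Tries a full import first. If that fails and the user chose "Import Anyway",
    // falls back to a bare Source that uses the host name as its title.
    static Source importWithFallback(String url, MetadataFetcher fetcher, boolean importAnyway) throws Exception {
        try {
            return fetcher.fetch(url);
        } catch (Exception e) {
            if (!importAnyway) {
                throw e; // let the caller show an error notification instead of failing silently
            }
            String host = URI.create(url).getHost();
            return new Source(url, host != null ? host : url, null);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulates the 403 seen with play.google.com.
        MetadataFetcher refused = u -> { throw new IOException("HTTP 403"); };
        Source s = importWithFallback("https://play.google.com/", refused, true);
        System.out.println(s.title + " " + s.url + " icon=" + s.iconUrl);
    }
}
```

Keeping the rethrow path separate means the same function covers both the error toast and the explicit opt-in to a metadata-less import.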