eafer / rdrview

Firefox Reader View as a command line tool
Apache License 2.0

Non-English links and images are broken #23

Closed asakura42 closed 2 years ago

asakura42 commented 2 years ago

Try it yourself:

rdrview "https://en.wikipedia.org/wiki/Wikipedia" and rdrview "https://ja.wikipedia.org/wiki/ウィキペディア". In the first case, you can see images and follow links. In the second case, all links become "file:///...", so you can't open them or view images.

I don't know exactly what the problem is, so sorry if the title is inaccurate.

eafer commented 2 years ago

The problem here is that

https://ja.wikipedia.org/wiki/ウィキペディア

is not actually a valid URL, because non-ASCII characters must be percent-encoded, and libxml2 rejects it. If you copy and paste the Wikipedia URL from the address bar, or from an actual link inside the browser, you can see the real URL, which is

https://ja.wikipedia.org/wiki/%E3%82%A6%E3%82%A3%E3%82%AD%E3%83%9A%E3%83%87%E3%82%A3%E3%82%A2

I just pushed a somewhat hacky solution to get this to work. Let me know if it helps.
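
For anyone on a build without that change, one workaround is to percent-encode the URL before passing it to rdrview. Below is a minimal Python sketch of that conversion. It is not the fix that was pushed to rdrview, and `percent_encode_url` is a hypothetical helper name used only for illustration. It assumes the host name is already ASCII (internationalized domain names would need separate handling) and that any `%` in the input is part of an existing escape.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Reserved characters that should stay as-is; "%" is kept so that
# already-encoded URLs are not escaped a second time (an assumption).
_SAFE = "/%:@!$&'()*+,;=~"


def percent_encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path, query and fragment.

    Hypothetical helper for illustration; the host part is left untouched.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE + "?")
    fragment = quote(parts.fragment, safe=_SAFE + "?")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


if __name__ == "__main__":
    print(percent_encode_url("https://ja.wikipedia.org/wiki/ウィキペディア"))
```

The printed result can then be passed to rdrview in place of the original URL. `quote` encodes text as UTF-8 by default, which matches the form the browser shows for the Wikipedia link above.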

asakura42 commented 2 years ago

Nice! It works!