Pitmairen / selection-search

Search extension for the Chrome web browser
GNU General Public License v3.0

Character encoding options #89

Open artifoxel opened 4 years ago

artifoxel commented 4 years ago

Many online dictionaries use different character encoding formats for the search query. For example:

It would be very useful if there was an option to change the character encoding to a different format for a particular search engine.
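To illustrate why the encoding matters, here is a minimal Python sketch (not part of the extension) showing that the same selected text produces different percent-encoded query strings depending on the character encoding. The helper name `encode_query` and the sample word are assumptions made for illustration only.

```python
from urllib.parse import quote, unquote


def encode_query(text: str, encoding: str) -> str:
    """Convert text to bytes in the given encoding, then percent-encode the bytes."""
    return quote(text, safe="", encoding=encoding)


word = "中文"  # sample text, chosen only for illustration

utf8_query = encode_query(word, "utf-8")
big5_query = encode_query(word, "big5")

print(utf8_query)  # %E4%B8%AD%E6%96%87
print(big5_query)  # a different byte sequence from the UTF-8 one

# A server that decodes with the wrong encoding cannot recover the text;
# decoding with the matching encoding can.
print(unquote(big5_query, encoding="big5") == word)
```

A site that expects Big5 bytes will typically fail to match a UTF-8 encoded query, which is the situation described above.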

For reference, a very similar context search addon for Firefox implemented this feature using iconv-lite and browserify:

https://github.com/CanisLupus/swift-selection-search/issues/118#event-3024186340

Thanks for all your hard work on this very customizable extension! 🦊

Pitmairen commented 4 years ago

Not too long ago I added a feature that tries to let the browser fix these encoding issues. It is a little bit hidden, but it works by just appending the literal string {SPECIALENCODING} to the end of the search engine URL.

This will trigger a slightly different behavior when doing a search. The search will go via an internal page that constructs an HTML form that submits the search. This should make the browser use the correct encoding automatically.

If it doesn't work, you can try specifying the encoding explicitly after the token, e.g. {SPECIALENCODING}EUC-KR

I have done some limited testing and it seems to work for me at least.

The only issue is that you get a small extra delay, and the internal page is displayed for a short while before the search is submitted. But the advantage is that I don't have to include code to do the encoding in the extension.
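The token convention described above could be handled roughly as in the following Python sketch. This is not the extension's actual code, which is not shown in this thread; the names `TOKEN`, `EngineUrl`, and `parse_engine_url` are hypothetical, and the sketch assumes that anything following the token is the encoding name.

```python
from typing import NamedTuple, Optional

TOKEN = "{SPECIALENCODING}"  # literal marker appended to the search engine URL


class EngineUrl(NamedTuple):
    url: str                 # search URL with the token and suffix removed
    via_form: bool           # True if the token was present
    encoding: Optional[str]  # explicit encoding after the token, if any


def parse_engine_url(raw: str) -> EngineUrl:
    """Split a configured engine URL into its base URL and optional encoding."""
    base, sep, suffix = raw.partition(TOKEN)
    if not sep:
        # No token: a normal search without the intermediate form page.
        return EngineUrl(raw, False, None)
    return EngineUrl(base, True, suffix.strip() or None)


print(parse_engine_url("http://example.com/search?q=%s{SPECIALENCODING}EUC-KR"))
```

With only the bare token, `encoding` is `None` and the choice of encoding is left to the browser.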

artifoxel commented 4 years ago

Cool. With the examples, the {SPECIALENCODING} token worked for the 2nd engine:

http://zonmal.com/hanja_sen.asp?se=%s{SPECIALENCODING}

the 1st engine required explicitly specifying the encoding:

http://www.zhongwen.com/cgi-bin/zis.cgi?ju=%s{SPECIALENCODING}BIG5

Does the intermediate html form check the encoding of the target search page? Curious how the encoding is done when the form is sent too.

Thanks a lot! 🦊

Pitmairen commented 4 years ago

The intermediate page does not check the target page encoding, but the browser does.

The form basically looks like this:

<form action="[search-engine-url]" method="get">
<input type="hidden" name="[query-parameter]" value="[selected-text]">
</form>

And then the browser usually seems to figure out the correct encoding.

If the encoding is specified in the search URL, the following attribute is added to the form: accept-charset="[encoding]" (see https://developer.mozilla.org/en-US/docs/Web/HTML/Element/form#attr-accept-charset).
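As a rough illustration of the form described above, the following Python sketch builds the same markup as a string, adding accept-charset only when an encoding is given. It is a sketch under assumptions, not the extension's implementation: the function name `build_search_form` and its parameters are hypothetical, and the example URL and parameter name are placeholders.

```python
from html import escape
from typing import Optional


def build_search_form(action: str, param: str, text: str,
                      encoding: Optional[str] = None) -> str:
    """Return HTML for a GET form with one hidden field carrying the query."""
    # accept-charset is only emitted when an encoding was given explicitly.
    charset = f' accept-charset="{escape(encoding)}"' if encoding else ""
    return (
        f'<form action="{escape(action)}" method="get"{charset}>\n'
        f'<input type="hidden" name="{escape(param)}" value="{escape(text)}">\n'
        f'</form>'
    )


print(build_search_form("http://example.com/search", "q", "中文", "BIG5"))
```

The selected text is placed in the form unencoded; converting it to bytes in the chosen encoding happens in the browser when the form is submitted, which is why the extension does not need its own encoding code.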

artifoxel commented 4 years ago

Ah, very neat. I think all of the functionality is there then.

The only other suggestion would be to have a list of supported character encodings (like this). I tried to look for a list on Mozilla's documentation, but couldn't find any.

Do you know of any such list? Thanks again. 🦊

Pitmairen commented 4 years ago

I think all encodings supported by the browser should work. I found these pages that probably list all the supported encodings:
https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/encoding
https://github.com/chromium/chromium/blob/2ca8c5037021c9d2ecc00b787d58a31ed8fc8bcb/base/i18n/character_encoding.cc#L14

artifoxel commented 4 years ago

Sweet. I think this issue is solved (or rather was already solved 😸).

Thanks, really appreciate all your help! 🦊