Open qurbat opened 2 years ago
@deepseagirl could you merge this after review?
@deepseagirl hi, just sending a ping on this. thanks!
@deepseagirl Can we close this?
hi, thanks. this is a good improvement :) i moved the unescape to only occur on the result descriptions directly with a flag to toggle the behavior on/off
new default will be to decode character references:
$ python3 degoogle.py "intitle:⟿ inurl:⟿"
-- 9 results --
TranslingualEdit - Wiktionary
https://en.wiktionary.org/wiki/%E2%9F%BF
Talk:⟿ - Wiktionary
https://en.wiktionary.org/wiki/Talk:%E2%9F%BF
flag to turn decoding off:
$ python3 degoogle.py -d "intitle:⟿ inurl:⟿"
-- 9 results --
TranslingualEdit - Wiktionary
https://en.wiktionary.org/wiki/%E2%9F%BF
Talk:⟿ - Wiktionary
https://en.wiktionary.org/wiki/Talk:%E2%9F%BF
the html.unescape python doc links to this list of named character references which seemed handy. i didn't realize char references were such an in depth thing until now. if you're interested here is that link https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references
i'll finalize this when i have a few more mins. should be soon now that it's this far along. thanks again
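A minimal sketch of how a decode toggle like the one described above could look, assuming the descriptions arrive with character references still in them. Only the short -d flag comes from this thread; build_parser, format_description, the no_decode name, and the raw sample string are hypothetical stand-ins, not the actual degoogle.py code.

```python
import argparse
import html


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the short -d flag appears in the thread;
    # the dest name and help text here are illustrative.
    parser = argparse.ArgumentParser(description="decode toggle sketch")
    parser.add_argument("query")
    parser.add_argument("-d", dest="no_decode", action="store_true",
                        help="leave character references undecoded")
    return parser


def format_description(desc: str, decode: bool = True) -> str:
    # Decode named and numeric character references unless turned off.
    return html.unescape(desc) if decode else desc


raw = "Talk:&#10239; - Wiktionary"  # assumed raw description text

# Default: decoding is on.
args = build_parser().parse_args(["intitle:⟿ inurl:⟿"])
print(format_description(raw, decode=not args.no_decode))

# With -d: the description is printed as received.
args = build_parser().parse_args(["-d", "intitle:⟿ inurl:⟿"])
print(format_description(raw, decode=not args.no_decode))
```

Keeping the decode step in one small function means the flag only has to flip a single boolean, and the URLs are never passed through it.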
@deepseagirl no worries, and I realize you were not able to access a computer earlier, so it is no problem. the new changes look great! thank you & tc =)
@deepseagirl can we close?
This change introduces support for search results containing non-Latin characters as part of the URL or description.
This is done by passing the final_string variable to the html.unescape() function (instead of printing it directly) at the last print call.
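A rough illustration of the change described above, assuming final_string holds the text that would otherwise be printed as-is. The sample value below is made up for the example and is not real degoogle output.

```python
import html

# Hypothetical stand-in for the string built up before the last print call.
final_string = (
    "Talk:&#10239; - Wiktionary\n"
    "https://en.wiktionary.org/wiki/Talk:%E2%9F%BF"
)

# html.unescape() converts character references such as &#10239; or &amp;
# into the characters they stand for. Percent-encoding in URLs is untouched.
decoded = html.unescape(final_string)
print(decoded)
```

Note that html.unescape() only handles HTML character references; the %E2%9F%BF in the URL stays percent-encoded, which matches the URLs shown in the output earlier in this thread.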