Closed: dhowe closed this issue 7 years ago.
List candidate: https://github.com/jasonqng/chinese-keywords (more about the list: https://citizenlab.org/2014/12/repository-censored-sensitive-chinese-keywords-13-lists-9054-terms/)
Yes, when I checked a few weeks ago, I also came across this one. Problems: a) there is a lot of data besides the words themselves, and b) it hasn't been updated in over 2 years.
But if you can extract a set of English words that makes sense to use, we can figure out somewhere to host the list.
I would suggest that we start with the "no-dummy-vars-for-categories-and-themes_only-sensitive-words.csv" file from this source. The best thing about this list is that it is fully translated into English. GreatFire.org has an ongoing list of sensitive keywords on Weibo, but it has no English translation. Do we need only the English keywords, or both Chinese and English (which I think makes more sense)?
Chinese and English would be ideal, perhaps in pairs? Let's start with the CSV list, perhaps using a very simple format like the one below (note that we may not have both English and Chinese for a given phrase):
Chinese phrase 1, English phrase 1
Chinese phrase 2, English phrase 2
Chinese phrase 3,
Chinese phrase 4, English phrase 4
, English phrase 5
Chinese phrase 6, English phrase 6
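A minimal sketch of reading that two-column format, assuming a plain UTF-8 text file where either column may be empty. The function name `parse_keyword_pairs` is hypothetical, not something from the repo. It uses Python's `csv` module rather than a bare `split(",")`, so an English translation that itself contains a comma can be quoted.

```python
import csv
import io


def parse_keyword_pairs(text):
    """Parse 'Chinese, English' lines into (chinese, english) tuples.

    Hypothetical helper: a missing column becomes None, and blank
    lines are skipped.
    """
    pairs = []
    # skipinitialspace drops the space after each comma ("a, b" -> "b")
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # blank line
        chinese = row[0].strip()
        english = row[1].strip() if len(row) > 1 else ""
        if chinese or english:
            pairs.append((chinese or None, english or None))
    return pairs


# Sample input mirroring the format above (not taken from the real list)
sample = "天安门, Tiananmen\n六四,\n, Falun Gong\n\n"
print(parse_keyword_pairs(sample))
```

Each tuple keeps the pairing, so a trigger can later be built from whichever side exists.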
Please refer to the following commits for these tasks:
1. Load the best list from some URL on first install: https://github.com/dhowe/ChinaEye/commit/f9d588bf3a9d5da3c2d27902d19e314682996fb8
3. Add additional search engines: https://github.com/dhowe/ChinaEye/commit/8f153d75914cd1550e13705d1b902963e1dc1f87
I just realized after pushing that I had the wrong remote origin set... Please let me know whether it is OK to leave it as is this time, or whether I should revert and make a pull request instead.
As for the list, I currently host it on my own server for testing. Do you want to host it on rednoise?
As for the triggers: I currently add both the Chinese and the English rules as triggers. Would you prefer to use only the English ones as triggers, or both languages? If Chinese is included in the triggers, I'll add a URL decoder for Chinese characters.
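Why the decoder matters: search engines percent-encode non-ASCII query text as UTF-8, so "天安门" arrives in the URL as `%E5%A4%A9%E5%AE%89%E9%97%A8` and a raw string match against the Chinese trigger would fail. A minimal sketch, assuming the search term sits in a single query parameter (`q` is the hypothetical default here; Baidu, for example, uses `wd`) and that triggers are plain substrings. `find_triggers` is an illustrative name, not the extension's code.

```python
from urllib.parse import parse_qs, urlparse


def find_triggers(url, keywords, param="q"):
    """Return the keywords (Chinese or English) found in a search URL.

    parse_qs percent-decodes the query as UTF-8 and turns '+' into a
    space, so encoded Chinese characters can be matched directly.
    """
    terms = parse_qs(urlparse(url).query).get(param, [])
    hits = []
    for term in terms:
        lowered = term.lower()  # case-insensitive for English; no-op for Chinese
        for kw in keywords:
            if kw.lower() in lowered and kw not in hits:
                hits.append(kw)
    return hits


url = "https://www.google.com/search?q=%E5%A4%A9%E5%AE%89%E9%97%A8+1989"
print(find_triggers(url, ["天安门", "Tiananmen"]))
```

Handling both languages this way costs one decode per URL, which fits the point that covering both shouldn't add much overhead.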
Why don't we host it on GitHub? And both languages are fine; it shouldn't be much overhead...
The list is now hosted on GitHub; please check: https://github.com/dhowe/ChinaEye/pull/3
BTW, what do you have in mind for "periodically check for updates"? Something like checking for an update of the list every week or so? Currently I just reload the list whenever Chrome starts...
uBlock checks every 4 days... but for now, let's store the time of the last check and run an update check whenever Chrome starts, unless less than some threshold (say, 12 hours) has passed since the last check.
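A minimal sketch of that throttle, assuming a dict-like store for the timestamp and a `fetch_list` callable that downloads the list. Both names and the `last_update_check` key are hypothetical. In the extension itself this would presumably map onto something like `chrome.storage.local` and a `chrome.runtime.onStartup` listener, but that wiring is not shown here.

```python
import time

UPDATE_INTERVAL_SECONDS = 12 * 60 * 60  # the 12-hour threshold suggested above


def should_check_for_update(last_check, now, interval=UPDATE_INTERVAL_SECONDS):
    """True if we have never checked, or the last check is old enough."""
    if last_check is None:
        return True  # e.g. first run after install
    return now - last_check >= interval


def on_browser_start(storage, fetch_list, now=None):
    """Run on browser startup; refresh the list only if the throttle allows.

    The timestamp is written only after fetch_list() succeeds, so a failed
    download (an exception) means we retry on the next startup.
    """
    now = time.time() if now is None else now
    if not should_check_for_update(storage.get("last_update_check"), now):
        return False
    storage["keyword_list"] = fetch_list()
    storage["last_update_check"] = now
    return True
```

Storing the time of the last *check* rather than the last *change* keeps the logic independent of whether the remote list actually changed.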
update function: https://github.com/dhowe/ChinaEye/pull/4