anticensority / discussions

Discussion of the "Anticensority" and "Antizapret" projects
The Unlicense

Challenges by "Antizapret" and "Anticensority" #1

Open ilyaigpetrov opened 5 years ago

ilyaigpetrov commented 5 years ago

https://rebrand.ly/ac-challenges | https://git.io/ac-challenges

The Russian version is below and is updated after the English one.

1. Size Of The PAC-Script "Antizapret"

The PAC script "Antizapret" is used not only in our browser extension but also via a command-line argument to Chrome or in the Windows system settings. In these modes a PAC script can't exceed 1 MB. The project author has already employed many compression techniques, but that is not enough. The challenges we offer:

  1. Apply lz-string and base128 to compress a script: https://github.com/pieroxy/lz-string/issues/135.
  2. Learn how to classify censored addresses from https://github.com/zapret-info/z-i/blob/master/dump.csv into categories (machine learning); then we will be able to publish/generate PAC scripts targeted at specific categories. A user who doesn't need all the categories will fit into 1 MB or may speed up the script. Providers of public proxies may also block unwanted categories from proxying.
  3. Bloom filter: for now we are researching this ourselves (memory consumption, performance, probability of false positives) and experimenting. Our proxy servers must not proxy addresses that are not in the registry, so we need to measure the error distance on the server and decide whether it exceeds a chosen threshold. (My old code and results).
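To make the trade-off in the Bloom filter item concrete, here is a minimal sketch of such a filter together with the textbook false-positive estimate (1 - e^(-kn/m))^k. It is not the code linked above; the hash choice (seeded FNV-1a with double hashing), the names `BloomFilter` and `estimateFalsePositiveRate`, and the sizes are all assumptions made for illustration.

```javascript
// Minimal Bloom filter sketch. All names and sizes are illustrative assumptions.

// 32-bit FNV-1a hash; the seed is mixed in so two different hashes can be derived.
function fnv1a(str, seed) {
  var h = (0x811c9dc5 ^ seed) >>> 0;
  for (var i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, modulo 2^32.
    h = (h + (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)) >>> 0;
  }
  return h;
}

function BloomFilter(bitCount, hashCount) {
  this.bitCount = bitCount;
  this.hashCount = hashCount;
  this.bits = new Uint8Array(Math.ceil(bitCount / 8));
}

// Derive hashCount bit positions from two hashes (double hashing).
BloomFilter.prototype.positions = function (key) {
  var h1 = fnv1a(key, 0);
  var h2 = (fnv1a(key, 0x9e3779b9) | 1) >>> 0; // force a non-zero step
  var result = [];
  for (var i = 0; i < this.hashCount; i++) {
    result.push((h1 + i * h2) % this.bitCount);
  }
  return result;
};

BloomFilter.prototype.add = function (key) {
  var bits = this.bits;
  this.positions(key).forEach(function (p) {
    bits[p >> 3] |= 1 << (p & 7);
  });
};

// May return true for a key that was never added (false positive),
// but never returns false for a key that was added.
BloomFilter.prototype.mightContain = function (key) {
  var bits = this.bits;
  return this.positions(key).every(function (p) {
    return (bits[p >> 3] & (1 << (p & 7))) !== 0;
  });
};

// Standard approximation for n items, m bits and k hash functions.
function estimateFalsePositiveRate(itemCount, bitCount, hashCount) {
  return Math.pow(1 - Math.exp(-hashCount * itemCount / bitCount), hashCount);
}

var filter = new BloomFilter(1024, 4);
["a.example", "b.example", "c.example"].forEach(function (h) { filter.add(h); });
console.log(filter.mightContain("a.example")); // true: added keys are always found
console.log(estimateFalsePositiveRate(3, 1024, 4));
```

Note that inside a PAC script the bit array has to be embedded as text, so the usable number of bits is smaller than the raw 1 MB limit suggests (base64, for example, adds roughly one third).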

2. Universal PAC-Script Generator

I run my own generator, and "Antizapret" uses its own (old sources are here). I have an idea of writing a PAC-script generator that may be used: 1) Locally on unix/posix/win. 2) In browser extensions. 3) On servers (Cloudflare Workers, Node.js). 4) Maybe even on web pages.

In this kind of flexible generator you may choose:

1) Sources of input (https://github.com/zapret-info/z-i/blob/master/dump.csv, Google spreadsheet, other formats, input via ipfs/ipns/https).
2) Output targets (1 MB for Chrome's command line, Chromium-based browser extensions, Firefox browser extensions, which don't support remote PAC scripts and need a list of addresses and a list of proxies in some format).
3) The algorithm and compression method employed (Bloom, deterministic, lz-string, binary search).
4) Proxies to be used: local Tor, known proxies with a good reputation (Antizapret), or ones provided by the user.
5) Publishing targets: save results to local files, upload to GitHub, publish on ipfs/ipns, etc.
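As a rough illustration of the smallest possible version of such a generator, the sketch below takes an in-memory list of hostnames and a proxy string and emits PAC-script text that uses binary search over a sorted array, then checks the result against the 1 MB limit. The function name `generatePac`, the proxy `proxy.example:3128`, and the exact-match-only lookup are assumptions for the example; it does not reflect how either existing generator works.

```javascript
// Hypothetical minimal PAC-script generator (illustration only).
var MAX_PAC_BYTES = 1024 * 1024; // the 1 MB limit for Chrome's command line

function generatePac(hosts, proxyString, maxBytes) {
  // Lowercase, sort and deduplicate so the generated script can binary-search.
  var list = hosts
    .map(function (h) { return h.toLowerCase(); })
    .sort()
    .filter(function (h, i, arr) { return i === 0 || arr[i - 1] !== h; });

  // The generated code sticks to old-style syntax (var, while, no arrow functions).
  var pac = [
    "function FindProxyForURL(url, host) {",
    "  var list = " + JSON.stringify(list) + ";",
    "  var proxy = " + JSON.stringify(proxyString) + ";",
    "  var lo = 0, hi = list.length - 1;",
    "  while (lo <= hi) {",
    "    var mid = (lo + hi) >> 1;",
    "    if (list[mid] === host) { return proxy; }",
    "    if (list[mid] < host) { lo = mid + 1; } else { hi = mid - 1; }",
    "  }",
    "  return \"DIRECT\";",
    "}"
  ].join("\n");

  // Measure the size in bytes, not characters.
  var size = new TextEncoder().encode(pac).length;
  var limit = maxBytes || MAX_PAC_BYTES;
  if (size > limit) {
    throw new Error("PAC script is " + size + " bytes, limit is " + limit);
  }
  return pac;
}

var pacText = generatePac(["b.example", "a.example"], "PROXY proxy.example:3128");
console.log(pacText);
```

A real generator would stream its input and apply one of the compression methods listed above; the point here is only the shape of the pipeline: input list, chosen algorithm, size check for the output target.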

Instead of generating a PAC script from inputs, this generator may also search the provided trusted sources for already generated PAC scripts (local cache, ipfs/ipns, GitHub repo, some site via https, etc.).

This kind of generator may be used: 1) By users on their local machines, in browsers or on the command line. 2) Deployed on servers, Cloudflare Workers, etc. The results are shared somehow, publicly or privately.

Ideally, both the generator and the generated code should work in old environments (Windows XP for the generator, Internet Explorer for PAC-script consumers).

3. Browser Extension For Firefox

Firefox doesn't support remote PAC scripts: a PAC script must be built into the extension, but it may be passed a list of addresses or proxies via messages. The task is to write an extension that can consume the results of the universal PAC-script generator from the previous challenge and structure them internally for better speed. This extension may be abstracted into something like the idea of Request Kitchen. proxy.onRequest may be used instead of PAC scripts.
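A minimal sketch of the proxy.onRequest approach might look like the following. The message shape (`{ hosts, proxy }`), the proxy `proxy.example:3128`, and the parent-domain matching are assumptions made for the example; only `browser.proxy.onRequest` and `browser.runtime.onMessage` are real Firefox WebExtension APIs (the extension would need the "proxy" permission plus host permissions).

```javascript
// Internal state, filled from generator output delivered via messages.
var blockedHosts = new Set();
var proxyInfo = { type: "http", host: "proxy.example", port: 3128 }; // hypothetical proxy

// Assumed message shape: { hosts: ["a.example", ...], proxy: { type, host, port } }
function updateFromGenerator(message) {
  blockedHosts = new Set(message.hosts.map(function (h) { return h.toLowerCase(); }));
  if (message.proxy) {
    proxyInfo = message.proxy;
  }
}

// Decide how a single request should be routed.
function chooseProxy(requestUrl) {
  var labels = new URL(requestUrl).hostname.split(".");
  // Check the host itself, then each parent domain (but not the bare TLD).
  for (var i = 0; i < labels.length - 1; i++) {
    if (blockedHosts.has(labels.slice(i).join("."))) {
      return proxyInfo;
    }
  }
  return { type: "direct" };
}

// Register the listeners only when running inside a Firefox extension.
if (typeof browser !== "undefined" && browser.proxy && browser.proxy.onRequest) {
  browser.proxy.onRequest.addListener(
    function (details) { return chooseProxy(details.url); },
    { urls: ["<all_urls>"] }
  );
  browser.runtime.onMessage.addListener(updateFromGenerator);
}
```

Keeping the decision in a plain function such as `chooseProxy` makes it easy to swap the `Set` for another internal structure (sorted array, Bloom filter) without touching the listener registration.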

Contact Us

  1. Create an issue on this repo for a public discussion.
  2. antizapret@prostovpn.org (private, email group, multiple participants)
  3. anticensority+owners@googlegroups.com (private, email group, multiple participants)

You may publish your solution anywhere, for example on https://gist.github.com (comment notifications should work there), and email us.


1. Size Of The PAC-Script "Antizapret"

The PAC script "Antizapret" is used not only in the extension but also via a command-line parameter to Chrome. In this mode the PAC script cannot exceed 1 MB, and the author has already applied a lot of compression techniques, but that is not enough. We call on you to solve the following problems:

  1. Apply lz-string and base128 to compress the script: https://github.com/pieroxy/lz-string/issues/135.
  2. Learn to split https://github.com/zapret-info/z-i/blob/master/dump.csv into categories (machine learning); then we will publish/generate PAC scripts for individual categories. A user who doesn't need all the categories will fit into the 1 MB limit or will be able to speed up the script.
  3. Bloom filter: for now we are trying to research this ourselves (memory consumption, performance, error probability) and experiment. On the servers we also must not proxy addresses that are not in the registry, but errors within a certain distance threshold are acceptable. (My old code and results).

How To Contact Us

You can contact us by email via https://antizapret.prostovpn.org or https://rebrand.ly/ac-contact (both addresses are mailing groups with several participants). You can start a public discussion at https://github.com/anticensority/discussions/issues, or, to discuss your own solution specifically, create a post on https://gist.github.com (comments now have notifications).

ilyaigpetrov commented 5 years ago

Our ideas are updated frequently: we may become disappointed in some ideas or come up with new ones. If you decide to start working on a challenge, please contact us first.

Here are some updates:

1) Bloom filter: we have been warned that the 1 MB restriction may deprive this algorithm of all its effectiveness.
2) Universal PAC-script generator: written in JS, this generator would consume about 350 MB of RAM (even while using streams), so using it on a client or on Cloudflare Workers may not be the best idea.
3) Browser extension for Firefox: from time to time I find time and commit updates here. If you wonder how to get it running, contact us/me.
4) There is an idea of conveying whether a hostname is blocked via DNS responses to special queries on a specially configured DNS server. So another challenge is to write/configure such a DNS server (the idea is inspired by DNSBL).
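To show what the client side of the DNS idea could look like, here is a small sketch that follows the usual DNSBL convention: the name being checked is prepended to a list zone, an A record in 127.0.0.x means "listed", and no record means "not listed". The zone `blocked.dns.example` and all function names are hypothetical; no such server exists yet.

```javascript
var BLOCKLIST_ZONE = "blocked.dns.example"; // hypothetical zone served by the special DNS server

// Build the special query name, e.g. "example.com" -> "example.com.blocked.dns.example".
function buildQueryName(hostname, zone) {
  return hostname.toLowerCase().replace(/\.$/, "") + "." + zone;
}

// DNSBL-style convention: any 127.0.0.x answer means the name is listed.
function interpretAnswer(addresses) {
  return addresses.some(function (a) { return /^127\.0\.0\.\d+$/.test(a); });
}

// resolve4 is any function(name) -> Promise of IPv4 strings,
// for example require("dns").promises.resolve4 in Node.js.
function isBlocked(hostname, resolve4) {
  return resolve4(buildQueryName(hostname, BLOCKLIST_ZONE)).then(
    interpretAnswer,
    function (err) {
      // No such name or no A record: treat as "not listed".
      if (err && (err.code === "ENOTFOUND" || err.code === "ENODATA")) {
        return false;
      }
      throw err;
    }
  );
}

// Example with a fake resolver standing in for the real DNS server.
function fakeResolve4(name) {
  return name === "example.com." + BLOCKLIST_ZONE
    ? Promise.resolve(["127.0.0.2"])
    : Promise.reject(Object.assign(new Error("not found"), { code: "ENOTFOUND" }));
}
isBlocked("example.com", fakeResolve4).then(function (blocked) {
  console.log("example.com blocked:", blocked);
});
```

Inside a PAC script the same lookup could in principle go through the standard `dnsResolve` helper, at the cost of one DNS round trip per uncached hostname.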