RequestPolicyContinued / requestpolicy

a web browser extension that gives you control over cross-site requests. Available for XUL/XPCOM-based browsers.
https://github.com/RequestPolicyContinued/requestpolicy/wiki
Other
253 stars 35 forks source link

rules storage: save one rule per line into the json file #339

Open msxfm opened 10 years ago

msxfm commented 10 years ago

Issue by drzraf Saturday Sep 08, 2012 at 23:16 GMT Originally opened as https://github.com/RequestPolicy/requestpolicy/issues/339


That would be great if the black/white-listing were git-friendly by using a separate, sorted, ini-file. Currently they are unmanageable and kind of clutter prefs.js.

msxfm commented 10 years ago

Comment by drzraf Thursday Jan 16, 2014 at 14:34 GMT


still interested ? I could provide a patch storing _allowedOrigins, _allowedDestinations and _allowedOriginsToDestinations in a file (named "requestPolicy.cnf") whose format could be :

allowedOrigins {
X
Y
Z
# one per line, # for comments
}, # with a comma or not ?
allowedOrigins {
# same a above
}, # same question as above
allowedOriginsToDestinations {
A
B
 B'
 B''
 B'''
C
}

The later being equivalent to the current pref allowedOriginsToDestinations = "A B|B' B|B'' B|B''' C"

That way one could easily put this file under git while versioning its home (and why not configure this file location through a firefox prefs directive (like adblockplus does)) and this will lighten the prefs.js file.

In my case after roughly 3 years, my allowedOriginsToDestinations directive contains 976 entries which, in prefs.js, represent a 26 k chars long line.

msxfm commented 10 years ago

Comment by drzraf Thursday Jan 16, 2014 at 14:37 GMT


An other possibility is to use an ini-file based syntax like

[allowedOrigins]
X
Y
Z
[allowedOriginsToDestinations]
A
B

The main thing I'm unsure is about using a trailing-blank dependant file-fomat (a la python-programming), without a trailing-comma for line endings. I'd prefer to avoid comma and use newline as an entry delimiter. But then what could be the best and optimized way to represent the "allowedOriginsToDestinations" nested structure while keeping a simple parser.

myrdd commented 9 years ago

@drzraf did you still use RequestPolicy? Did you take already a look at v1.0 beta? v1.0 stores rules in a json file. Unfortunately, there's currently a bug about importing rules from v0.5, see https://github.com/RequestPolicyContinued/requestpolicy/issues/354

drzraf commented 9 years ago

Of course I still use it heavily and having accumulated a whilelist of 1400 Origin<->Dest thus I'd not upgrade without them being conserved.

Granted that versioning Mozilla software is a dead-end because they simply sux at keeping things simple. Anyway I found to be able to split and script part of my firefox configuration in order to get it tracked by git and more specifically version my rules (still firefox prefs.js needs to be cluttered with this very-huge line that represents the 1400 combinations...)

[offtopic] Each time FF is close, a VCS-friendly text file named allowedOriginsToDestinations.log is generated from the prefs.js (of the various profiles I've). All the profiles are then updated using the rules of this file. This let me synchronise rules among several profile and keep rules under versioning.

To be exhaustive, the same script which generate the text file also concatenate pref() directives from a prefs.d/*.js directory into user.js. One of these prefs is "autoadmin.global_config_url" which allow a JS script to customize (a bit) the prefs according to things like getenv() & co. [/offtopic]

myrdd commented 9 years ago

@drzraf wow, so you've got an interesting workaround for the problem. However, could you please take a look at this example json file which will be the format in 1.* versions? You've suggested a *.ini file instead; would be json ok for you? Sadly json doesn't support comments, but it's possible to simply save comments into json fields itself ({comment: "my comment…"}). What do you think?

drzraf commented 9 years ago

I think that JSON is fine a long a couple of VCS-friendly things are kept in mind: 1) its possible to add a json_pp git-filter to pretty-format the file when the diff is visualized.

2) its possible to order alphabetically the prettified json-diff result to keep the visual diff tiny but to reduce the need of commits, the json generated by the extension must avoid unnecessary reordering.

I don't personally need comments, but it may be a good idea.

The path of the JSON file should be configurable with a pref(), but if it's not possible or easily implementable, then if the file is a symlink, it should be preserved (if the file is fully rewritten, the symlink could be followed to rewrite the destination file).

About subscription, I feel that a need to add additional fields could quickly be felt but:

An alternative could be to "split" policies in distincts files (would imply the ability to load several policy files, eg loading from a policies.d/*.json directory if present) Whether spliting files or adding JSON fields or both is, I think, worst the debate.

Migrating existing policies should be straightforward since the new format is more feature-full than the old one but of course the new format needs to be fully defined first to avoid useless work.

my 2c

myrdd commented 9 years ago

Thank you @drzraf for your ideas! I'll re-read your comment as soon as I'm working on the json files. By the way, I could imagine that the user could not only choose the location of the file itself but also the format of that file (e.g. json or ini).

TteokbokkiNari commented 9 years ago

"By the way, I could imagine that the user could not only choose the location of the file itself"

That would be really useful for me. Would save me the trouble of manually syncing the changes I've made on my desktop to my laptop.

drzraf commented 9 years ago

On Wed, Nov 12, 2014 at 05:24:05AM -0800, Daniellynet wrote:

"By the way, I could imagine that the user could not only choose the location of the file itself"

my advise: try to Keep It Simple.

Multiple file formats for config' implies:

Having prefs.d/*.json configuration directory may be worth the hassle (if JSON attributes is restricted) but I wouldn't say so for multiple file formats.

More especially one well-thought file-format (eg "sane JSON formatting" with good defaults) could allow simple & powerful processing to take place outside of the extension (eg: simple perl/python/js/whatever... script)

If rules are "configuration" then I don't know of any software which permit multiple configuration file-formats. But if they are considered as input data, then it's another story and it would seem (a bit) more "legitimate" to allow multiple format.

A last thing: I'm over 1400 rules which weight a bit over 1kB With the new "grouped" format, grouping vs syntax overhead, I feel it could grow-up a bit over 2kB (still acceptable). If a common pool of subscribed lists got attention, it could easily start to be 5 to 10kB of rules in the next months, but reaching the megabyte is only question of time.

Also note that AdblockPlus patterns.ini is often > 1.5 MB

See this instructive bug-report of HTTPS-Everywhere: https://trac.torproject.org/projects/tor/ticket/10174

I quote:

The default ruleset is absolutely massive, over 3MB in XML. HTTPS Everywhere consumes over 30MB of RAM according to WMI and over 10MB according to about:memory. Startup time is noticeably slower.

More specifically about subscriptions:

1) same problem as with Adblock Plus' Easylist and everything else based on community rules that are stored client-side: 99.99% goes unused and only bogs down the client without ever being useful to an individual user.

nodiscc commented 9 years ago

99.99% goes unused and only bogs down the client without ever being useful to an individual user

That's why we should keep subscriptions "modular", I think. The official subscriptions should keep rules to a minimum (see https://github.com/RequestPolicyContinued/subscriptions/pull/6#issuecomment-62394918). Maybe the way to go is to carefully add non-official subscriptions (e.g. pointing to someone else's repository) to the subscription lists, and have them disabled by default (the same way you add Fanboy's lists to adblock).

Having prefs.d/*.json configuration directory may be worth the hassle (if JSON attributes is restricted) but I wouldn't say so for multiple file formats.

I agree.

nodiscc commented 9 years ago

By the way, feel free to discuss what should be/not be included in subscriptions (completeness vs bloat, modularity) in https://github.com/RequestPolicyContinued/subscriptions/issues. This issue is about the rules storage format.

myrdd commented 9 years ago

I think that JSON is fine a long a couple of VCS-friendly things are kept in mind: 1) its possible to add a json_pp git-filter to pretty-format the file when the diff is visualized.

Please describe in more detail what you mean about point 1).

2) its possible to order alphabetically the prettified json-diff result to keep the visual diff tiny but to reduce the need of commits, the json generated by the extension must avoid unnecessary reordering.

This means new rules will directly be in alphabetical order. That would be ok for me.

I don't personally need comments, but it may be a good idea.

I'm quite sure I won't implement comments to rules files. This was just an idea for subscriptions, but in case we need comments for subscriptions, we could have file with comments (the file that will be maintained) wich will be converted to a json file (which will be file to be published). But this is not in the scope of this issue. This is about the rules file format, and that won't have comment support in the near future (IMHO).

Configuring the path of the rules file would be nice, and @Daniellynet would also like such a feature. We just have to be careful about file changes then: what happens when the file gets changed while one (or even several) instances of RP are running and using that file? Having a prefs.d/*.json dir sounds ok to me, but in that case RP needs to support several rule files first. I suggest to discuss this in a different issue.

my advise: try to Keep It Simple.

You are right, at least for now we leave it at the json format.

More especially one well-thought file-format (eg "sane JSON formatting" with good defaults) could allow simple & powerful processing to take place outside of the extension (eg: simple perl/python/js/whatever... script)

several questions: "good defaults"? processing of what?

@drzraf I'll answer to what you've said about subscriptions in the subscriptions repo some other time.

myrdd commented 9 years ago

Btw @drzraf as soon as you agree on the json format of 1.0 we can close this issue.

drzraf commented 9 years ago

On Thu, Nov 13, 2014 at 03:10:29PM -0800, Martin Kimmerle wrote:

I think that JSON is fine a long a couple of VCS-friendly things are kept in mind: 1) its possible to add a json_pp git-filter to pretty-format the file when the diff is visualized.

Please describe in more detail what you mean about point 1).

That json file is as readable as an ini file from a VCS/diff PoV. One great advantage of ini-file format is that it's much more Unix-way "text files should be the preferred interface" than compressed JSON, but... tools to pretty format JSON are pretty easily usable. json_pp perl tool can be hook'ed into git to show rules added/removed/... It's VCS-friendly.

2) its possible to order alphabetically the prettified json-diff result to keep the visual diff tiny but to reduce the need of commits, the json generated by the extension must avoid unnecessary reordering.

This means new rules will directly be in alphabetical order. That would be ok for me.

I think it reduces file changes delta (what is good), and does not imply more (neither less) rewrite of that file on the disk. IHMO rule file should do what it intends to do (contain rules) and only really change on-disks when rules are modified (no [optional] metadata)

Configuring the path of the rules file would be nice, and @Daniellynet would also like such a feature.

maybe another option is not needed if symlink are well-supported (that's the point of symlink). It'll make it impossible on some windows XP machines & co (although I don't know about interop' issues for filesystem accesses inside a FF extension)

We just have to be careful about file changes then: what happens when the file gets changed while one (or even several) instances of RP are running and using that file?

this is a moot-point. It'd be a huge win if this is supported (unlike most firefox config file). Reducing chance of hot-changes (multiple files) seems as a good start. Otherwise, inotify is the way to go, but from a quick look it seems out of scope of a (portable) FF extension.

More especially one well-thought file-format (eg "sane JSON formatting" with good defaults) could allow simple & powerful processing to take place outside of the extension (eg: simple perl/python/js/whatever... script)

several questions: "good defaults"?

I was think about JSON formatting (condensed, ordering, attributes supported, ...)

processing of what?

whatever file transformation one may needs for non-essentials purposes like...

nodiscc commented 9 years ago

Attempt at grabbing important points in this issue:

Please add a short point if you think I missed something.

drzraf commented 9 years ago

On Fri, Nov 14, 2014 at 07:22:14AM -0800, nodiscc wrote:

Attempt at grabbing important points in this issue:

  • do not store user rules in prefs.js.
  • JSON as a storage format is ok.
  • stored JSON files should not be minified/compressed. They should be written with one rule per line (same format as files in the subscriptions repo (easy to store/diff in version control systems)
  • RP should sort rules alphabetically when storing new ones (so that when versioning them in VCSs keeps diffs clean and introduces minimum changes)

Please add a short point if you think I missed something.

perfect, thanks for the much-needed synthesis!

I'd add these points unless they are out of scope:

I hope too I didn't overlooked anything useful for the next... hum... 5 years :)

And two other things:

1) The biggest part of unknown will be how 3rd party requests evolve, as the web evolves (see for example cloudfront subdomains PITA). It may influence rules complexity and thus JSON format/attributes.

2) about sorting / not-sorting:

I think that the (FF ?) JSON writer should be double-checked for one this: If it never internally reorders the elements, neither after reading, neither before writing, then new rules appears at the end and will be written at the end of the file, so changes are kept small without need of reorder.

If this behavior is consistent and sure in the long run (does it depends upon FF internal functions?), then from a "do it less" perspective, it's even better. Anyway if reordering (by host/dest ?) is consistent too, it's not a problem (most users may even prefer this when reading rule files). [ I was just thinking about possible criticisms against "too much automagic" ]

If "not reordering" is a valid option (because file existing order is consistently preserved in the process

read-from-file -> modifify/add/delete rules -> write-to-file I think the question of whether to sort or not could be left open (because reordering multiple files, reordering huge files, ... could cause more troubles)

myrdd commented 9 years ago

Thank you @nodiscc for summarizing the long discussion! @drzraf it would be great if you'd avoid long text comments and instead try to outline your thoughts briefly and clearly. :)

2) about sorting / not-sorting:

I'm very sure that firefox' function to save a javascript array into a json file does no reordering. IMHO no json library should do that without asking. I assume that the main reason you wanted the ordering was to have minimal changes for your version control system, the request about automatic ordering is invalid – not needed anymore. If anyone wants automatically ordered rules, he/she will request that in a separate issue. If you @drzraf definitely want to have ordered rules, please create a new issue.

  • multiple policy files (eg: policies.d/*.json)
  • JSON file naming conventions (eg: sub-.json, user-.json, ... ?)

Please open a new issue for anything not related to the actual internal format of the json file.

so I now comment on @nodiscc's summary:

do not store user rules in prefs.js.

Already done.

JSON as a storage format is ok.

good.

stored JSON files should not be minified/compressed. They should be written with one rule per line (same format as files in the subscriptions repo (easy to store/diff in version control systems)

Is this true @drzraf? Or doesn't this matter for you?

drzraf commented 9 years ago

On Fri, Nov 14, 2014 at 09:41:03AM -0800, Martin Kimmerle wrote:

stored JSON files should not be minified/compressed. They should be written with one rule per line (same format as files in the subscriptions repo (easy to store/diff in version control systems)

Is this true @drzraf? Or doesn't this matter for you?

Yes fully true (so that it stays hand-editable and versionable since VCS work on a line-based diff level)

myrdd commented 9 years ago

Alright, so the goal of this issue is: Save one rule per line into the json file.

drzraf commented 9 years ago

I can't resist referencing: https://bugzilla.mozilla.org/show_bug.cgi?id=1090503

Modularise prefs.js to keep add-on preferences out of the main file

nodiscc commented 8 years ago

For reference there is a third party tool at to generate RPC rules from a multiline (git-trackable) format, hand maintained file: https://github.com/mk-pmb/firefox-requestpolicy-util (ref. website#26)

timkuijsten commented 8 years ago

Just as a side-note: for config like files (not sure if rules fall in this category) I personally found hjson to be much more manageable than ini or json.