Closed nodiscc closed 9 years ago
I think this is ready to merge. See the commit details, and feel free to ask if anything needs clarification. This comment https://github.com/RequestPolicyContinued/subscriptions/issues/9#issuecomment-65481767 is also relevant to future updates of this subscription.
Great work @nodiscc! I'd suggest you work directly on the master branch – just be careful about removing subscriptions, as the code currently depends on them.
In other words, keep on managing the subscriptions; you're doing a good job.
Thanks for pointing this out @myrdd, I haven't searched for duplicates yet. This error was introduced while moving _itemsByRegion manually. I still have to find a tool to deduplicate lines.
I added the world's top 25 sites in cc0e24d.
Take a look at sort and uniq, optionally paired with grep.
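For example, a minimal sketch of such a pipeline (the file name is taken from the commands further down in this thread) prints only the duplicated lines:

$ sort official-allow_sameorg.json | uniq -d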
I've removed all duplicate rules. The file passes the JSON validator at http://jsonlint.com/. I'll merge this after a coffee break, and keep working on the top 26-100 websites in another Pull Request.
I've removed all duplicate rules.
Actually, I could only find one duplicate line (see below). I was wrong; see my next comment.
I still have to find a tool to deduplicate lines.
I successfully tested the following:
$ egrep -o "\{.*\}" official-allow_sameorg.json | uniq -d
{"o":{"h":"*.uol.com.br"},"d":{"h":"*.imguol.com"}}
This is what I did:
$ sort official-allow_sameorg.json | uniq -c | sort
[...]
2 {"o":{"h":"*.baidu.com"},"d":{"h":"*.bdstatic.com"}},
2 {"o":{"h":"*.google.com"},"d":{"h":"*.gstatic.com"}},
2 {"o":{"h":"*.tagged.com"},"d":{"h":"*.tagstat.com"}},
2 {"o":{"h":"*.taobao.com"},"d":{"h":"*.taobaocdn.com"}},
2 {"o":{"h":"*.uol.com.br"},"d":{"h":"*.imguol.com"}},
Oh you're right – I forgot to sort beforehand.
FYI, this is what can be used as an alternative:
$ egrep -o "\{.*\}" official-allow_sameorg.json | sort | uniq -d
{"o":{"h":"*.baidu.com"},"d":{"h":"*.bdstatic.com"}}
{"o":{"h":"*.google.com"},"d":{"h":"*.gstatic.com"}}
{"o":{"h":"*.tagged.com"},"d":{"h":"*.tagstat.com"}}
{"o":{"h":"*.taobao.com"},"d":{"h":"*.taobaocdn.com"}}
{"o":{"h":"*.uol.com.br"},"d":{"h":"*.imguol.com"}}
The difference from @nodiscc's version: besides the -d parameter (which means that only duplicates are displayed), grep removes the non-rule lines and ensures that this special case is also caught:
[
{"d":{"h":"*.A.com"}},
{"d":{"h":"*.A.com"}}
]
You can test the difference easily:
#!/bin/bash
# Two identical rules wrapped in a JSON array, used to compare both pipelines.
json='[\n {"d":{"h":"*.A.com"}},\n {"d":{"h":"*.A.com"}}\n]'
echo "# with grep:"
echo -e "$json" | egrep -o "\{.*\}" | sort | uniq -c
echo
echo "# without grep:"
echo -e "$json" | sort | uniq -c
output:
# with grep:
2 {"d":{"h":"*.A.com"}}
# without grep:
1 [
1 ]
1 {"d":{"h":"*.A.com"}}
1 {"d":{"h":"*.A.com"}},
@nodiscc what about a simple makefile which collects some simple scripts?
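A rough sketch of what such a Makefile could look like; the RULES variable and the target names are made up here, only the duplicate-finding pipeline is taken from the commands above, and python -m json.tool is suggested as a local stand-in for jsonlint.com:

# Hypothetical sketch; file name and target names are assumptions
# (recipe lines must be indented with a tab)
RULES = official-allow_sameorg.json

.PHONY: check-duplicates validate

# Print duplicated rule lines (the egrep/sort/uniq -d pipeline from above)
check-duplicates:
	egrep -o "\{.*\}" $(RULES) | sort | uniq -d

# Check JSON syntax locally instead of pasting into jsonlint.com
validate:
	python -m json.tool $(RULES) > /dev/null && echo "$(RULES): valid JSON"

With something like this, make check-duplicates would list duplicate rules and make validate would check the syntax.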
Fixes https://github.com/RequestPolicyContinued/subscriptions/issues/10
Review process:
Differences with the previous subscription:
All origin domains from the previous list work as expected.