AdguardTeam / AdGuardHome

Network-wide ads & trackers blocking DNS server
https://adguard.com/adguard-home.html
GNU General Public License v3.0

Error: control/filtering/set_url | scanning filter contents: bufio.Scanner: token too long | 400 on blocklist update #6003

Closed. ppfeufer closed this issue 1 year ago.

ppfeufer commented 1 year ago

Prerequisites

Platform (OS and CPU architecture)

Linux/ARM64

Installation

GitHub releases or script from README

Setup

On one machine

AdGuard Home version

v0.107.34

Action

Trying to update my blocklist via the UI.

Expected result

Blocklist updating successfully.

Actual result

Error: control/filtering/set_url | scanning filter contents: bufio.Scanner: token too long | 400

Additional information and/or screenshots

This is a blocklist I have been using for a long time, and after today's update, I noticed that it is shown with 0 entries.

So I tried to update it manually by editing and saving, which resulted in this error message.

Blocklist URL: https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist

ppfeufer commented 1 year ago

Verbose log:

2023/07/12 20:19:55.609026 4179874#71 [debug] started POST 138.201.77.133:8100 /control/filtering/set_url
2023/07/12 20:19:55.609319 4179874#71 [debug] filtering: set name to "[GitHub] ppfeufer/adguard-filter-list", url to https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist, enabled to true for filter https://github.com/ppfeufer/adguard-filter-list/blob/master/blocklist?raw=true
2023/07/12 20:19:55.609540 4179874#71 [debug] filtering: downloading update for filter 1642338271 from "https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist"
2023/07/12 20:19:55.609749 4179874#64 [debug] home: customdial: dialing addr "raw.githubusercontent.com:443" for network tcp
2023/07/12 20:19:55.609877 4179874#82 [debug] dnsproxy: cache: serving cached response
2023/07/12 20:19:55.609980 4179874#81 [debug] dnsproxy: cache: serving cached response
2023/07/12 20:19:55.610162 4179874#64 [debug] dnsServer.Resolve: "raw.githubusercontent.com": [{185.199.108.133 } {185.199.109.133 } {185.199.110.133 } {185.199.111.133 } {2606:50c0:8000::154 } {2606:50c0:8001::154 } {2606:50c0:8002::154 } {2606:50c0:8003::154 }]
2023/07/12 20:19:55.640904 4179874#71 [debug] filtering: filter 1642338271 from url "https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist" has no changes, skipping
2023/07/12 20:19:55.641207 4179874#71 [error] POST 138.201.77.133:8100 /control/filtering/set_url: scanning filter contents: bufio.Scanner: token too long
2023/07/12 20:19:55.641338 4179874#71 [debug] finished POST 138.201.77.133:8100 /control/filtering/set_url in 32.290566ms
ppfeufer commented 1 year ago

And when trying to add it as a new blocklist:

2023/07/12 20:22:53.763555 4179874#132 [debug] started POST 138.201.77.133:8100 /control/filtering/add_url
2023/07/12 20:22:53.763801 4179874#132 [debug] filtering: downloading update for filter 1689185952 from "https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist"
2023/07/12 20:22:53.764040 4179874#134 [debug] home: customdial: dialing addr "raw.githubusercontent.com:443" for network tcp
2023/07/12 20:22:53.764157 4179874#135 [debug] dnsproxy: cache: serving cached response
2023/07/12 20:22:53.764252 4179874#136 [debug] dnsproxy: cache: serving cached response
2023/07/12 20:22:53.764315 4179874#134 [debug] dnsServer.Resolve: "raw.githubusercontent.com": [{185.199.108.133 } {185.199.109.133 } {185.199.110.133 } {185.199.111.133 } {2606:50c0:8000::154 } {2606:50c0:8001::154 } {2606:50c0:8002::154 } {2606:50c0:8003::154 }]
2023/07/12 20:22:53.796294 4179874#132 [debug] filtering: filter 1689185952 from url "https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist" has no changes, skipping
2023/07/12 20:22:53.796391 4179874#132 [error] filtering: os.Chtimes(): chtimes /opt/AdGuardHome/data/filters/1689185952.txt: no such file or directory
2023/07/12 20:22:53.796585 4179874#132 [error] POST 138.201.77.133:8100 /control/filtering/add_url: Couldn't fetch filter from URL "https://raw.githubusercontent.com/ppfeufer/adguard-filter-list/master/blocklist": scanning filter contents: bufio.Scanner: token too long
2023/07/12 20:22:53.796629 4179874#132 [debug] finished POST 138.201.77.133:8100 /control/filtering/add_url in 33.089971ms
ainar-g commented 1 year ago

Thanks for the report. We've introduced an optimization that reduces the RAM consumed by the update check by capping the length of a single rule at 1024 bytes, and it seems like your list has 66 rules longer than that:

grep -e '^.\{1024,\}' -- ./blocklist | wc

Moreover, none of these rules seem to be DNS rules; they are mostly content-blocking rules. You can filter them out with a script like:

sed '/^.\{1024,\}/d' ./blocklist > ./blocklist_dns
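In a UTF-8 locale, `.` in the `grep`/`sed` patterns above matches characters, while the limit is in bytes, so lines with many multibyte characters can slip through. A byte-accurate variant can be sketched with `awk` under `LC_ALL=C`; the helper names `count_long_lines` and `strip_long_lines` are hypothetical, and the 1024-byte default only mirrors the limit described above (it is a parameter).

```shell
# Hypothetical helpers. LC_ALL=C forces byte semantics in locale-aware awks
# (such as gawk), so length() counts bytes rather than multibyte characters.

# count_long_lines FILE [MAX_BYTES]
# Prints how many lines are MAX_BYTES (default 1024) bytes or longer.
count_long_lines() {
  LC_ALL=C awk -v max="${2:-1024}" 'length($0) >= max + 0 { n++ } END { print n + 0 }' "$1"
}

# strip_long_lines FILE [MAX_BYTES]
# Prints only the lines shorter than MAX_BYTES (default 1024) bytes.
strip_long_lines() {
  LC_ALL=C awk -v max="${2:-1024}" 'length($0) < max + 0' "$1"
}
```

Usage would be `count_long_lines ./blocklist` to check, then `strip_long_lines ./blocklist > ./blocklist_dns` to write the filtered copy.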
ppfeufer commented 1 year ago

Ah, I see. I'll try that.

ppfeufer commented 1 year ago

Success!

After tweaking the transformation options in my hostlist-compiler settings, it's all working again. Thanks for the quick answer and the hint!

mphin commented 1 year ago

Since updating to v0.107.34, I have been getting this error. I'm subscribed to someone else's list, so what should I do?

monsm commented 1 year ago

Error: control/filtering/add_url | Couldn't fetch filter from URL "https://raw.gitmirror.com/monsm/XXKiller/main/x.txt": line at index 44290: character at index 91: non-printable character | 400

@ainar-g, what should I do?

ppfeufer commented 1 year ago

Ask the maintainer of that list to use HostlistCompiler and apply the Validate transformation. That's the easiest way to generate compatible lists, and it's what fixed my issue.

Example: https://github.com/ppfeufer/adguard-filter-list/blob/master/hostlist-compiler-config.json
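As a rough illustration of that setup, the shell helper below writes a minimal HostlistCompiler configuration that pulls one upstream list and applies Validate. The helper name `write_compiler_config` and the choice of transformations (`Validate`, `Deduplicate`) are assumptions; the field names (`name`, `sources`, `source`, `type`, `transformations`) follow the configuration format described in the HostlistCompiler README, which, along with the example config linked above, remains the authoritative reference.

```shell
# Hypothetical helper: write a minimal HostlistCompiler config to CONFIG_FILE.
# Usage: write_compiler_config CONFIG_FILE LIST_NAME SOURCE_URL
# LIST_NAME and SOURCE_URL are pasted into JSON as-is, so they must not
# contain double quotes or backslashes.
write_compiler_config() {
  cat > "$1" <<EOF
{
  "name": "$2",
  "sources": [
    {
      "name": "upstream",
      "source": "$3",
      "type": "adblock"
    }
  ],
  "transformations": ["Validate", "Deduplicate"]
}
EOF
}
```

Assuming HostlistCompiler is installed (for example with `npm i -g @adguard/hostlist-compiler`), the list would then be built with `hostlist-compiler -c config.json -o blocklist.txt`.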

ppfeufer commented 1 year ago

Since quite a few filter lists are used with both AdGuard Home and browser ad-blocker extensions (µBlock, AdGuard, etc.), I suspect this issue will pop up for a number of these lists.

mphin commented 1 year ago

Thank you. It seems that only the list maintainer can make these changes.

monsm commented 1 year ago

@ppfeufer, could you help me figure out how to implement this with HostlistCompiler? https://github.com/monsm/XXKiller/blob/mae/RMaker/make.cmd

ppfeufer commented 1 year ago

All can be found here » https://github.com/ppfeufer/adguard-filter-list

monsm commented 1 year ago

@ppfeufer, please check my revision to see if there are any mistakes. Thanks! https://raw.githubusercontent.com/monsm/XXKiller/mae/.github/workflows/xxkiller.yml https://raw.githubusercontent.com/monsm/XXKiller/mae/RMaker/make.cmd

ppfeufer commented 1 year ago

This is beyond the scope and topic of this issue.

How to use HostlistCompiler is explained well in its repository (https://github.com/AdguardTeam/HostlistCompiler). Please have a look there.

ainar-g commented 1 year ago

Upon reinspecting the code, I think we can actually allow longer lines without losing the optimization for the most common case. We can also improve the error message. I'm going to reopen the issue now and commit a fix soon.

ainar-g commented 1 year ago

The line-length limit has been relaxed, and the error message now includes the character in question:

line 66499: character 92: non-printable character '\u200c'
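To see which invisible character such a message points at, the reported line can be dumped as raw bytes. A minimal sketch with a hypothetical `show_line` helper, assuming the line number in the message is 1-based as `sed` counts lines; whether the character position counts bytes or characters is not stated here, so look around that position rather than exactly at it.

```shell
# Hypothetical helper: hex-dump one line of FILE so invisible characters show up.
# Usage: show_line FILE LINE_NUMBER
# In the dump, U+200B (zero-width space) appears as e2 80 8b and
# U+200C (zero-width non-joiner) as e2 80 8c.
show_line() {
  sed -n "${2}p" "$1" | od -A n -t x1
}
```

For the message above, that would be `show_line x.txt 66499`.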
monsm commented 1 year ago

Could AdGuard Home fix such errors automatically, for example by deleting the offending line?

ainar-g commented 1 year ago

@monsm, from what I understand, the error is there to prevent users from putting e.g. binary files instead of text ones. There is a similar check against HTML text too. What kind of error are you getting? Perhaps the check could be relaxed.

monsm commented 1 year ago

The errors are caused by ZWNJ and ZWSP characters in the rules, but I don't know how to remove the affected lines from the list.
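One way to clean a local copy of a list is to drop every line containing these characters. A sketch with a hypothetical `strip_zero_width` helper: it matches the raw UTF-8 byte sequences of U+200B (ZWSP) and U+200C (ZWNJ) as fixed strings under `LC_ALL=C`, so the result does not depend on the locale. It only handles these two characters; other non-printable characters would need to be added the same way.

```shell
# Hypothetical helper: print FILE without the lines that contain a zero-width
# space (U+200B) or a zero-width non-joiner (U+200C).
# Usage: strip_zero_width FILE > cleaned.txt
strip_zero_width() {
  zwsp=$(printf '\342\200\213')  # U+200B encoded as UTF-8: e2 80 8b
  zwnj=$(printf '\342\200\214')  # U+200C encoded as UTF-8: e2 80 8c
  # -F: fixed-string match, -v: keep only the lines that do NOT match.
  LC_ALL=C grep -v -F -e "$zwsp" -e "$zwnj" -- "$1"
}
```

If the rules themselves should be kept, deleting only the invisible characters with a `sed` substitution on the same byte sequences would be the gentler alternative.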

monsm commented 1 year ago

@ainar-g, does this commit relax the check for ZWNJ, ZWSP, and other special characters? And what is the line-length limit now, instead of 1024 bytes? https://github.com/AdguardTeam/AdGuardHome/commit/2adc8624c0bd589a9efab564297bab77dde17ac8

ainar-g commented 1 year ago

@monsm, yes, and we have added test cases for that to make sure that they keep working. The hard line-length limit has been restored to 64 KiB.

fbaijnauth commented 1 year ago

Hello, will there be a fix for this issue? I am receiving "Error: control/filtering/set_url | scanning filter contents: bufio.Scanner: token too long | 400" when trying to use the following filter: https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/specific_app.txt

ainar-g commented 1 year ago

@fbaijnauth, please see the comments above. The fix is already on the Edge channel, and the README has instructions on testing the Edge and Beta versions. (Do not forget to back up your configuration.)

fbaijnauth commented 1 year ago

thank you

Jefffish09 commented 1 year ago

@ainar-g May I ask, when will the stable version of v0.107.35 be released?

ainar-g commented 1 year ago

@Jefffish09, about 15 minutes ago, heh.