Closed scripthunter7 closed 1 year ago
There has been a more or less known problem for some years that following the list conversion instructions at https://github.com/DandelionSprout/adfilt/tree/master/NorwegianExperimentalList%20alternate%20versions#how-to-run-the-list-generation-script give dramatically different newline handling depending on whether the instructions were run with Cygwin or PowerShell, with Cygwin being recommended specifically due to the "more changes than originally planned" problem.
Since both PowerShell and VSCode are Microsoft products and stick to Microsoft's newlines, while most other tools seem to stick to Unix newlines (e.g. Cygwin, Sublime Text 4), I don't think this'll be nearly as easy to fix as it may seem like.
You can add the following to .gitattributes:
https://github.com/hagezi/dns-blocklists/blob/main/.gitattributes
# Auto detect text files and perform LF normalization
text=auto eol=lf
This protect files before these malfromation after edit only 1 line in simple web CodeMirror editor:
https://github.com/FiltersHeroes/KAD/commit/0568c165bf9971b86f9987264f4c30223a65a59a https://github.com/FiltersHeroes/KAD/commit/0568c165bf9971b86f9987264f4c30223a65a59a.patch
?
I don't know if MS is already working on fixes to make editing huge files work properly (A few months ago, it was not achievable to edit when the file had about 25,000 lines) in this simple editor (they certainly fixed the speed and I can already safely use regexes for searching). So if that doesn't fix it MS globally or the declaration in gitattributes then I'm left using github.dev or a local editor and uploading changes to the server.
@scripthunter7 Can you check if the newlines work better now, after I added Hagezi's recommendations some days ago?
0 changes:
Maybe neededed is rebuild files via global github action.
Okay, I think I found the problem hiding in plain sight: My converter scripts had been using text += line + '\r\n'
for years now.
I'll see if removing /r
fixes things. Give me 5min.
@scripthunter7 Can you check if the newlines work better now, after I added Hagezi's recommendations some days ago?
That will only be applied to files that were recently updated and committed.
I think I've been able to fix the problem fully now. I'm awaiting confirmation of it.
updated and committed.
Updated 19 hours ago via upload: https://github.com/DandelionSprout/adfilt/commit/7789eb33684cf62662a3c8a41a5d771625d68d17
So needed is use git in command line?
For me it shows it was updated 4 minutes ago: https://github.com/DandelionSprout/adfilt/commit/55db7350df42f544f5107ae4f529fc26d8308759
Me too, trying enough how we define commited and updated.
As commited is different from via upload then MS should improve the gitattributes system to consider the option to drag and drop a file as "commited".
Drag and drop/file upload just uses the line endings it was uploaded with. You need git commit
or GH UI to do that.
It seems to me that some python scripts still use CRLF in some places:
Tbh, I didn't look deeper into what these scripts do, I just searched for the CRLF pattern.
.gitattributes
seems fine to me, however, already existing files should be normalized.
Reference: https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings https://git-scm.com/docs/gitattributes#_end_of_line_conversion
Currently, these files contains mixed newlines (CRLF and LF at the same time):
And these files contains CRLF (fully CRLF or mixed):
Well, I converted everything to LF, but it resulted in a PR of ~150k lines: https://github.com/DandelionSprout/adfilt/pull/833 (this not only normalizes mixed newlines, but also makes files in the repo consistently use LF). As far as I can see, the previous commit also had ~25k lines. But for my part, I don't want to leave such a big change in the repo history, so I closed this PR instead
It seems to me that some python scripts still use CRLF in some places:
That's mostly because I didn't edit those lists' conversion scripts just yet. I'll get it done this weekend.
Commit as ghost still exist: https://github.com/DandelionSprout/adfilt/commit/6dc3abac7689002945f357a73dab6c8d9d89d542 perhaps it is possible to generate with this the problems the Energy list had - MS told them to remove the repository history or they won't come back.
Anyway, I simply converted the files, but I think the change will be huge even if the build scripts are changed
With the filesize of https://github.com/DandelionSprout/adfilt/blob/master/AdGuard%20Home%20Compilation%20List/AdGuardHomeCompilationList.txt being inexplicably reduced from circa 945 KB to 900 KB almost solely by removing /r
, my current hasardous guess is that it will pay off regardless.
the problems the Energy list had - MS told them to remove the repository history or they won't come back.
Energized also had lists with several million entries, and were previously (2018~21) far more popular than their poor maintenance record would've indicated, likely pushing GitHub's dataservers. So that didn't exactly make MS look at them any milder either.
In so far as I can currently tell, all affected lists (except MetricRecipe.py
, whose generated files are never intended for publication) have now been converted to non-/r
with no known long-lasting issues resulting form it, effectively fixing this issue report. Comments will remain open.
It seems that many filter lists have inconsistent (mixed) newlines, which may be a problem, as many editors don't handle this well. For example, VSCode will consolidate newlines on save (i.e. only one kind of newline will be used after save). If the contributor is not careful enough, his commit will contain significantly more changes than originally planned, since in this case the newlines are also replaced in lines where the contributor didn't make any changes originally.
If possible, please consider standardising the newlines.
Example:
https://github.com/DandelionSprout/adfilt/blob/master/NorwegianExperimentalList%20alternate%20versions/NordicFiltersABP-Inclusion.txt