Closed ameshkov closed 9 months ago
@105th yes, that was done intentionally so that we could check that the implementation is able to handle the case where there's no new line in the end of file.
From my experience so far, the timestamp
is no longer something I see as needed, it was when I thought patching would be done recursively, but it turned out this didn't work in practice -- best is to patch once, and to ensure this all existing patches a re-generated server-side once a new patch is created. This has the benefit of requiring a single fetch to bring all lists up to date even if the extension had not been running for a few days.
Now regarding ! Diff-Expires:
, it tells us when to look again for a new patch, this means we need to remember when was the last patch successfully applied, and thus yet-another piece of information to store client-side. To avoid such need, I decided to encode the time of the patch in the name of the patch, and this works rather well in conjunction with ! Diff-Expires:
: No need to store anything more client-side, and the code handling the logic on whether a patch is available or not just derive creation time from the name, adds ! Diff-Expires:
and see if the time is in the future.
The timestamp
value would accomplish this, but then it's a value to be stored, which I prefer to avoid. What is already stored is ! Diff-Path:
, and if we can also store the timestamp in there, then we have all that is needed to get the patch creation time and the name of the list to patch.
@gorhill
Well, removing timestamp is not a problem. Generally, I think that the diff
directive fields can be extended by any ad blocker the way they see fit, that's why the fields are named (I'd better mention this fact in the spec).
Regarding your approach with encoding additional information in Diff-Path
, in my opinion it's too specific to be a part of the generic spec.
it tells us when to look again for a new patch, this means we need to remember when was the last patch successfully applied, and thus yet-another piece of information to store client-side
Tbh, I don't really understand the need to avoid storing and, what's more important, generating it on the client-side. When you generate it on the client-side, you are 100% sure that it's correct. Many things can go wrong on the server, you'll need to take possible unexpected input into account (the simplest example would be having date in the future). Finally, it ties your own hands: you'll have to generate patches exactly this way and you won't be able to change anything on the server-side once you deploy this version to the users (eventually, you will, but doing it without hurting users requires time and effort).
All that said, what do you think about standardizing Last modified:
and possibly other important fields of a filter list (Title
, Homepage
, etc)?
I think the fact that there's no place on the Internet that defines what kind of fields can be actually used by maintainers does not help. And it's rather easy to fix just by providing the universally supported guidelines (which we all kind of already do, but never wrote it down).
It's difficult to discuss with "in theory" scenarios, so let's just be very specific about what's going through my mind. Here is the code that avoid fetching unnecessarily: https://github.com/gorhill/uBlock/blob/1.53.3rc0/src/js/diff-updater.js#L237-L241
That code derives patch time creation from patch name, then add ! Diff-Expires:
to get a timestamp value, and if that timestamp is beyond now, no fetch is performed. That code must exist, avoiding pointless fetches to a remote server is important.
I think you are arguing that the patch creation time should be associated and stored with each asset to patch, and that the patch creation time is provided in timestamp
field? (which would be the same value for all sub-patch in a batch patch file.) If not provided than make up one by assuming the time at which the patch was fetched is the creation time?
Many things can go wrong on the server, you'll need to take possible unexpected input into account
This already happened and it's really not an issue, it's pretty simple client-side, at the first unexpected outcome for any required step, diff-updater does nothing and in the end all fall back to the usual auto-update with ! Expires
field.
That code must exist, avoiding pointless fetches to a remote server is important.
Absolutely, that's why Diff-Expires
is there: "don't go for new patches until the old one expires".
With batch updates the situation is a little bit more complicated as different lists may have different expiration.
Now that I am thinking about it I got another idea, we could make the patch itself the source of all necessary information.
What if we add a new directive (single directive for the whole patch file regardless of how many ) that encodes this? We can then drop Diff-Expires
, it won't be needed anymore. I think you even mentioned something like that before, right?
Something like this:
head expires:10600 timestamp:1699599870000
I think you are arguing that the patch creation time should be associated and stored with each asset to patch
I had two points actually:
My first point is that it's safer to generate the last-updated timestamp on the client-side because of the possible mistakes that could happen on the server-side. Talking about your code, two scenarios:
Is that all a major issue? Not at all, I'd say that it's a matter of preference, but with a client-side generated timestamp you're protected from these scenarios.
head expires:10600 timestamp:1699599870000
What happens the first time you fetch a list? How do you know when to make a hopefully successful fetch for a patch? You have the Diff-Path
field, but not yet the future time at which to look. This is where encoding the time in the name comes to the rescue. Currently I use 4 fields, but it could be a single one, in minutes since epoch (i.e. whatever.28327261.patch
) or 5-minute chunk since epoch (i.e. whatever.2360605.patch
), and if you want to get rid of Diff-Expires
, then dot-separated values in the name, i.e. whatever.2360605.72.patch
.
Clearly where we diverge is you have issue with encoding that information in the name, whereas I do not see this as an issue: if it's in the spec, why worry about changing this in the future? What could ever happen that this becomes a burden? If the spec says this is the way, then whoever want to properly implement diff-updating their list will have to just follow the spec or use available tools that does all this for them.
@gorhill
What happens the first time you fetch a list? How do you know when to make a hopefully successful fetch for a patch?
Good questions.
It just bothers me that both the expiration period and the timestamp are properties of the patch file and they should be the same in every list that refers to that patch file.
Let me put aside personal preference and analyze advantages and disadvantages, please correct me if I am wrong anywhere.
One important detail: if we start encoding information in the file name, I think we should encode the expiration period there too. As I said, I think it's a property of the patch file, just like the timestamp, so it'd be strange to store one here and one there.
Advantages of encoding ALL the information in the file name:
Diff-Expires
.I wouldn't list saving on fields here as an advantage since nothing prevents you from concatenating different fields in your code and having the same result.
Disadvantages:
Diff-Path-V2
or similar to avoid compatibility issues.All in all, I'd say that the disadvantage is not that serious since there's still a way to upgrade the spec and avoid issues.
Here's what I propose then:
${patchName}-${diffExpires}-${epochTimestamp}
. It is not human-readable, but I don't think that it's a big issue to be honest, you'll still have the updated Last modified
inside the patch file.UPD I read this and realized that I basically repeat what you proposed. Minutes since epoch is also okay I guess.
${patchName}-${diffExpires}-${epochTimestamp}
It's me nitpicking, personally I prefer the expire value to come after the creation time, it reads more naturally in my opinion: this patch was created on epochTimestamp
and expires in diffExpires
.
Minutes since epoch is also okay I guess.
Could even be chunk of 5, 10 or 15, i.e. whatever resolution is most certainly the minimum ever needed. This would shorten a bit the string. Resolution could be even left to the patch emitter:
${patchName}.${resolutionInMinutes}-${epochTimestamp}-${diffExpires}.patch
Where both epochTimestamp
and diffExpires
must be multiplied by resolutionInMinutes
. I can imagine a patch emitter not caring about emitting at hours interval only:
thelist.60-472216-6.patch
Whereas implicitly in minutes this would be:
thelist.28332968-360.patch
Please check the updated spec.
@gorhill I've incorporated most of what we talked about.
Notable changes:
resolution
- I suggest making it optional and allowing to specify just these three units: seconds, minutes, hours, where hours is the default resolution. This is the "shortest" resolution available and tbh I believe that most of the lists will be just okay with 1 hour expiration period, i.e. list-472234-1.patch
-
as the only delimiter so that .
could be used in the patchName
.
See the discussion here: https://github.com/AdguardTeam/FiltersCompiler/issues/192#issuecomment-1792291460
Updated the spec according to the discussion: