Closed: spirillen closed this issue 1 year ago.
This one is odd... Who does that?
Users who don't know better: https://github.com/Clefspeare13/pornhosts/issues/60
I have actually seen that in other places as well; that's why I suggested it on a global "scale". Other times I simply suspect some people are using a script to blindly append "m." and "www." just to make their lists grow.
I was thinking about this, and I'm not sure if it's really in the scope of the SPECIAL rule...
When I created the SPECIAL rule, it was really just to take things UP and DOWN when they are really gone or back. It's an extra layer of testing. "302 Found" is not something I considered as a criterion for taking something DOWN...
What do you think of that?
I sometimes think an HTTP 302 means down (in most cases, actually), unless it is part of HSTS (HTTP Strict Transport Security), as the specified target has obviously moved.
Then there is the HUGE exception: redirecting spyware like t.co, bit.ly, etc. They are all redirecting (I didn't check their response codes, as they are blocked here).
That's why I suggested this as a special rule: check for a fourth-level domain and, if there is one, mark it INVALID. That way we can circumvent the 302 question. We can't use --complements, as that is purely for www vs. non-www.
On the other hand, if it is a bigger job... and then again, thinking about the exact domain, I've seen that the same "rule" could be applied elsewhere.
IF domain-level >= 4
then
rule is INVALID
fi
The question might then become: is this a module we would like, so that we can make special rules based on domain level?
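As a rough illustration of the proposed rule, here is a minimal sketch that counts the labels of a domain and applies "IF domain-level >= 4 THEN INVALID". The helper names (`domain_level`, `domain_level_rule`) are hypothetical and not part of PyFunceble; the threshold of 4 is taken from the pseudocode above.

```python
from typing import Optional


def domain_level(domain: str) -> int:
    """Count the labels of a domain: "tumblr.com" is 2, "www.sensual-kiss.tumblr.com" is 4."""
    # Drop surrounding whitespace and a trailing root dot ("example.com.").
    labels = [label for label in domain.strip().rstrip(".").split(".") if label]
    return len(labels)


def domain_level_rule(domain: str, threshold: int = 4) -> Optional[str]:
    """Hypothetical SPECIAL rule: IF domain-level >= threshold THEN INVALID.

    Returns None when the rule does not apply (status left untouched).
    """
    if domain_level(domain) >= threshold:
        return "INVALID"
    return None
```

Note that a plain label count also treats a subject like www.bbc.co.uk as level 4, so a real implementation would probably have to count levels relative to the public suffix rather than from the right-hand end.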
NB, in reply to the 302-specific question: 301 and 308 clearly say "don't come back here, there is nothing to see; you need to go to xyz to see anything", while 302 and 307 are temporary moves.
I understand, but this will have some consequences. It's actually not INVALID... It actually redirects to the right domain... At least that is what the Location header is saying.
The browser can't follow it for some obscure reason, but it is actually working as it should:
$ curl -IL 'http://www.sensual-kiss.tumblr.com'
HTTP/1.1 302 Found
Server: openresty
Date: Sun, 10 Oct 2021 10:35:37 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
X-Rid: 0ea88a24be0398a789080c4690f3d87a
P3p: CP="Tumblr's privacy policy is available here: https://www.tumblr.com/policy/en/privacy"
X-Frame-Options: deny
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=15552001
Location: https://sensual-kiss.tumblr.com/#_=_
X-UA-Compatible: IE=Edge,chrome=1
HTTP/2 200
server: openresty
date: Sun, 10 Oct 2021 10:35:37 GMT
content-type: text/html; charset=UTF-8
vary: Accept-Encoding
vary: Accept-Encoding
x-rid: f0347074f015230059675a339d117709
p3p: CP="Tumblr's privacy policy is available here: https://www.tumblr.com/policy/en/privacy"
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
strict-transport-security: max-age=15552001
x-tumblr-user: sensual-kiss
x-tumblr-pixel-0: https://px.srvcs.tumblr.com/impixu?T=1633862137&J=eyJ0eXBlIjoidXJsIiwidXJsIjoiaHR0cDovL3NlbnN1YWwta2lzcy50dW1ibHIuY29tLyIsInJlcXR5cGUiOjAsInJvdXRlIjoiLyJ9&U=KHHNEPCHIJ&K=3c17a9ae01752bbd52f5c333effe64d0e0ba0b7996b712c6147438227d16a98b--https://px.srvcs.tumblr.com/impixu?T=1633862137&J=eyJ0eXBlIjoicG9zdCIsInVybCI6Imh0dHA6Ly9zZW5zdWFsLWtpc3MudHVtYmxyLmNvbS8iLCJyZXF0eXBlIjowLCJyb3V0ZSI6Ii8iLCJwb3N0cyI6W3sicG9zdGlkIjoiNjUyNTk5NDY3ODg4MDE3NDA4IiwiYmxvZ2lkIjo1MTgxMDYzNTEsInNvdXJjZSI6MzN9LHsi
x-tumblr-pixel-1: cG9zdGlkIjoiNjQ2MTgzNjQ2MTE2NjkxOTY4IiwiYmxvZ2lkIjo1MTgxMDYzNTEsInNvdXJjZSI6MzN9LHsicG9zdGlkIjoiNjQ1NTExODY3MzI1OTIzMzI5IiwiYmxvZ2lkIjo1MTgxMDYzNTEsInNvdXJjZSI6MzN9LHsicG9zdGlkIjoiNjQ0NTIwNzgzMzk2MzcyNDgwIiwiYmxvZ2lkIjo1MTgxMDYzNTEsInNvdXJjZSI6MzN9XX0=&U=JNBPKHMDFF&K=7bdd23e4b63545c561b864f08fb2ef49cc3394bc9338b16d83272730f79d06e6
x-tumblr-pixel: 2
link: <https://64.media.tumblr.com/c734fc3e754e30ec2711f1e34829e448/e35d615ef95041c4-89/s128x128u_c1/5eeae975e3ba6d53334dca994719fbc8a57d7537.png>; rel=icon
x-ua-compatible: IE=Edge,chrome=1
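To reason about redirects like the one above, it can help to reduce the header dump to a list of hops. The sketch below is a hypothetical helper (not part of PyFunceble or curl) that turns `curl -IL` output into `(status_code, location)` pairs, assuming each response starts with an `HTTP/...` status line.

```python
from typing import List, Optional, Tuple


def parse_redirect_chain(curl_output: str) -> List[Tuple[int, Optional[str]]]:
    """Turn `curl -IL` output into (status_code, location) pairs, one per response."""
    hops: List[Tuple[int, Optional[str]]] = []
    for raw_line in curl_output.splitlines():
        line = raw_line.strip()
        if line.upper().startswith("HTTP/"):
            # Status line, e.g. "HTTP/1.1 302 Found" or "HTTP/2 200".
            parts = line.split()
            hops.append((int(parts[1]), None))
        elif hops and line.lower().startswith("location:"):
            # Header names are case-insensitive (HTTP/2 sends them lower-case).
            location = line.split(":", 1)[1].strip()
            status, _ = hops[-1]
            hops[-1] = (status, location)
    return hops
```

Applied to the output above, this yields a 302 hop pointing at https://sensual-kiss.tumblr.com/#_=_ followed by a final 200 hop with no Location.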
This is another level of SPECIAL rule ...
It is, and you should consider whether it is worth the effort, or we might end up in a rule-management hell that's better addressed with other scripts/programs.
It's actually not INVALID ... It actually redirects to the right domain ... At least that is what the Location header is saying.
True, my outcome should have been INACTIVE. Both maintaining a source and generating the output of those huge hosts files would benefit from removing 302 + 308 redirects, while either obtaining or keeping the Location target in their sources.
This actually opens a whole new debate about how to handle redirects. We have touched on the topic in the past; maybe it's time to open a new issue/discussion on the subject.
The browser can't follow it for some obscure reason
That is because the SSL certificate does not cover fourth-level domains, so you are redirected to an insecure zone, where the browser stops loading the site and shows a warning.
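The certificate explanation can be illustrated with the standard wildcard rule (RFC 6125): a `*` in a certificate name stands for exactly one left-most label. The sketch below assumes, without having checked, that the site serves a wildcard certificate like `*.tumblr.com`; `wildcard_matches` is a hypothetical, simplified helper, not a full certificate validator.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified certificate name check where '*' stands for exactly one label.

    Only a left-most '*' label is supported; partial wildcards such as
    'f*.example.com' are deliberately not handled in this sketch.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    # A wildcard never spans several labels, so the label counts must be equal.
    if len(pattern_labels) != len(host_labels):
        return False
    first, *rest = pattern_labels
    if first != "*" and first != host_labels[0]:
        return False
    return rest == host_labels[1:]
```

Under that assumption, `*.tumblr.com` covers sensual-kiss.tumblr.com but not www.sensual-kiss.tumblr.com, which is why the browser complains on the fourth-level name.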
This one is odd... Who does that?
Let me take a very fresh example...
I duplicated the previous list twice, once adding www. subdomains, and once adding cdn.; resulting in two new lists of the formats: www.websitename.abc and cdn.websitename.abc. Source: https://github.com/StevenBlack/hosts/issues/1671#issuecomment-970165062 (§2)
Just found this in another import...
And compared to the test result, well, the numbers just don't add up.
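A quick heuristic for spotting this kind of padding in an imported list is to look for entries that are just another entry of the same list with a prefix glued on. This is a minimal sketch; the prefix set (`www.`, `m.`, `cdn.`) is an assumption based on the examples in this thread, and a hit only means "suspicious", since a prefixed host can be perfectly legitimate.

```python
from typing import Iterable, List, Sequence, Set

# Assumed set of prefixes that get blindly prepended; extend as needed.
PADDING_PREFIXES = ("www.", "m.", "cdn.")


def padded_entries(domains: Iterable[str], prefixes: Sequence[str] = PADDING_PREFIXES) -> List[str]:
    """Return entries that are another entry of the same list plus a known prefix."""
    entries: Set[str] = {d.strip().lower() for d in domains if d.strip()}
    found = []
    for domain in sorted(entries):
        for prefix in prefixes:
            if domain.startswith(prefix) and domain[len(prefix):] in entries:
                found.append(domain)
                break
    return found
```

For example, a list containing websitename.abc, www.websitename.abc and cdn.websitename.abc reports the two prefixed copies.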
Note to self:
The idea is not bad. We should implement this. But subjects should be switched to INACTIVE, not INVALID.
Side notes on the implementation itself:
- m.example.com -> example.com | Outcome: m.example.com as INACTIVE.
- m.example.com -> example.org | Outcome: NO status switch.
- m.example.com -> a.example.com -> example.com | Outcome: m.example.com as INACTIVE.

This should only apply if the status code is one of the 3XX codes.
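The host-level cases above can be sketched as a small function. Assumptions (not from PyFunceble): the redirect chain has already been followed and is passed in as `(status_code, host)` pairs with the subject's own response first, and only "www." and "m." are treated as strippable prefixes. `strip_prefix` and `host_rule` are hypothetical names.

```python
from typing import List, Optional, Tuple

PREFIXES = ("www.", "m.")  # assumed: the only prefixes this rule looks at


def strip_prefix(host: str) -> Optional[str]:
    """Return the host without a leading www./m., or None if it has neither."""
    for prefix in PREFIXES:
        if host.startswith(prefix):
            return host[len(prefix):]
    return None


def host_rule(subject: str, hops: List[Tuple[int, str]]) -> Optional[str]:
    """hops: (status_code, host) of every response, the subject's own response first.

    Returns "INACTIVE" when the chain ends on the prefix-less subject and every
    response before the last one was a 3XX redirect; None means "no status switch".
    """
    expected = strip_prefix(subject.lower())
    if expected is None or len(hops) < 2:
        return None
    if not all(300 <= status < 400 for status, _ in hops[:-1]):
        return None
    final_host = hops[-1][1].lower()
    return "INACTIVE" if final_host == expected else None
```

The three side notes map directly onto calls: a chain ending on example.com gives INACTIVE (with or without a.example.com in between), while one ending on example.org gives None.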
Side notes on the implementation itself, when URLs are tested:
- m.example.com/hello/world -> example.com/hello/world | Outcome: m.example.com/hello/world as INACTIVE.
- m.example.com/hello/world -> example.com/world/hello | Outcome: NO status switch.
- m.example.com/hello/world -> example.org/hello/world | Outcome: NO status switch.
- m.example.com/hello/world -> a.example.com/world/hello -> example.com/hello/world | Outcome: m.example.com/hello/world as INACTIVE.

This is a special rule, but it should be a global one, as it follows the requests to the final destination; all "middlemen" are marked as potentially dead.
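The "global" part, where every middleman in a followed chain is marked as potentially dead, could look roughly like this. The input format `(status_code, url)` and the "DESTINATION" label are assumptions made up for this sketch; PyFunceble has no such label.

```python
from typing import Dict, List, Tuple


def classify_chain(hops: List[Tuple[int, str]]) -> Dict[str, str]:
    """hops: (status_code, url) for every request made while following redirects.

    Every hop before the final destination is a "middleman" and is labelled as
    potentially dead (INACTIVE). The final destination is labelled DESTINATION,
    a made-up label for "the record worth keeping/testing".
    """
    if not hops or 300 <= hops[-1][0] < 400:
        # Nothing followed, or we stopped on a redirect: no known destination.
        return {}
    labels = {url: "INACTIVE" for _, url in hops[:-1]}
    labels[hops[-1][1]] = "DESTINATION"
    return labels
```

With the last side-note example, both m.example.com/hello/world and a.example.com/world/hello come out as INACTIVE, and only example.com/hello/world is kept.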
https://github.com/funilrys/PyFunceble/issues/185#issuecomment-877784789
^(www|m)\..*\.tumblr\.com$
We will remove any useless (m.|www.).domain.ccTLD records and only leave potentially ACTIVE records in our ACTIVE/list.
You can call this --complements on steroids, as it removes any middlemen from the finished ACTIVE/list.
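Using the tumblr pattern quoted above, the clean-up of a finished list can be sketched with Python's `re` module. `drop_useless_prefixes` is a hypothetical helper; it simply discards records matching `^(www|m)\..*\.tumblr\.com$`.

```python
import re
from typing import Iterable, List

# Pattern from this thread: www./m. in front of a tumblr blog name.
USELESS_TUMBLR = re.compile(r"^(www|m)\..*\.tumblr\.com$", re.IGNORECASE)


def drop_useless_prefixes(records: Iterable[str]) -> List[str]:
    """Keep only the records that do not match the 'useless prefix' pattern."""
    cleaned = (record.strip() for record in records)
    return [record for record in cleaned if record and not USELESS_TUMBLR.match(record)]
```

Because the pattern requires something between the prefix and `.tumblr.com`, a bare m.tumblr.com is kept; only the prefixed blog names are dropped.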
IF domain-level >= 4 then rule is INVALID fi
The question might then become: is this a module we would like, so that we can make special rules based on domain level?
The rewrite for this would be:
IF the domain is in some internal DB file of domains,
THEN we know that any records matching ^(www|m)\..*\.domain\.ccTLD$
are INVALID; we strip the prefixes and test the records that are left.
An example of such a regex-compliant file could be:
tumblr.com = ^(www|m)\..*\. | !^([0-9a-z]{0,255}[.])?
bit.ly = !^bit.ly
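The exact syntax of that DB file (the meaning of `|` and `!`) isn't settled here, so the sketch below skips parsing it and hard-codes a plain mapping from domain to prefix regex. It implements the "strip the prefixes and test what is left" idea; `KNOWN_DOMAINS` and `apply_known_domain_rule` are hypothetical names, not PyFunceble features.

```python
import re
from typing import Dict, Optional, Pattern, Tuple

# Hypothetical "internal DB": domain -> regex of prefixes that are never useful
# on that domain. Only tumblr.com is filled in, as in the example above.
KNOWN_DOMAINS: Dict[str, Pattern[str]] = {
    "tumblr.com": re.compile(r"^(?:www|m)\."),
}


def apply_known_domain_rule(record: str) -> Tuple[Optional[str], str]:
    """Return (status, record_to_test).

    If the record belongs to a known domain and starts with a listed prefix,
    it is reported as INVALID and the prefix-less record is returned for testing.
    Otherwise (None, record) is returned unchanged.
    """
    record = record.strip().lower()
    for domain, prefix_re in KNOWN_DOMAINS.items():
        match = prefix_re.match(record)
        if match is None:
            continue
        stripped = record[match.end():]
        # Mirror ^(www|m)\..*\.domain$: something must remain before the domain.
        if stripped.endswith("." + domain):
            return "INVALID", stripped
    return None, record
```

For instance, www.sensual-kiss.tumblr.com becomes ("INVALID", "sensual-kiss.tumblr.com"), and the second element is what would go on to be tested.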
https://github.com/funilrys/PyFunceble/issues/185#issuecomment-939477917
It's actually not INVALID ... It actually redirects to the right domain ... At least that is what the Location header is saying.
Yes, but the record is of no use in any output list, since it is redirecting. You would need the destination, as that helps keep the final lists as small and accurate as possible.
https://github.com/funilrys/PyFunceble/issues/185#issuecomment-1290400728
The idea is not bad. We should implement this. But subjects should be switched as INACTIVE not INVALID.
That would depend on the domain... For bit.ly and tumblr.com, INVALID is the correct result, while other redirecting devils might be INACTIVE by default.
IF the domain is in some internal DB file of domains, THEN we know that any records matching ^(www|m)\..*\.domain\.ccTLD$ are INVALID; we strip the prefixes and test the records that are left.
That is actually another improvement for the mining mechanism...
Here we are only talking about subjects that redirect to their second-level domain, for example m.example.org -> example.org and www.example.org -> example.org. And this SPECIAL rule will only be triggered if the given subject starts with "www." or "m.".
URL shorteners will not trigger this feature when they redirect elsewhere, because the final domain won't match the expected domain.
For example:
- bit.ly/xyz -> example.org/hello/world --> Never triggers.
- m.bit.ly/hello -> bit.ly/hello -> example.org/hello --> Nothing changes.
- www.bit.ly/hello -> bit.ly/hello -> example.com/hello --> Nothing changes.
- www.bit.ly/ -> bit.ly/ --> Triggers the SPECIAL rule. www.bit.ly will be dropped as INACTIVE.

Also note: the path will be compared. If it doesn't match, nothing changes.
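The URL variant, including the prefix restriction and the path comparison, can be sketched with `urllib.parse.urlsplit`. Assumptions: the chain is a list of absolute URLs (with a scheme, otherwise `urlsplit` finds no hostname), the subject comes first and the final destination last, and query strings are ignored. `url_rule` is a hypothetical helper.

```python
from typing import List, Optional
from urllib.parse import urlsplit

PREFIXES = ("www.", "m.")  # assumed: the rule only fires for these prefixes


def url_rule(chain: List[str]) -> Optional[str]:
    """chain: every URL visited, the tested subject first, final destination last.

    Returns "INACTIVE" when the subject ends up on its own prefix-less host with
    the same path; None ("nothing changes") otherwise.
    """
    if len(chain) < 2:
        return None
    subject, final = urlsplit(chain[0]), urlsplit(chain[-1])
    host = (subject.hostname or "").lower()
    expected_host = next((host[len(p):] for p in PREFIXES if host.startswith(p)), None)
    if expected_host is None:
        return None  # The subject does not start with www. or m.
    same_host = (final.hostname or "").lower() == expected_host
    # An empty path and "/" are treated as the same path.
    same_path = (subject.path or "/") == (final.path or "/")
    return "INACTIVE" if same_host and same_path else None
```

Run against the bit.ly examples, only http://www.bit.ly/ -> http://bit.ly/ returns INACTIVE; the chains ending on example.org or example.com return None.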
There is a drawback with flagging a subject as INVALID... A lot of users just drop and permanently delete INVALID subjects and let PyFunceble retest all INACTIVE ones... That's also something we have to keep in mind...
We are only a few people in the issues section, but there are a lot more users than we think 😰...
NOTE:
Stumbled on this special domain case: skyblog.com is invalid. 💭 🤔 Maybe a new result list? That could also help with your comment in https://github.com/funilrys/PyFunceble/issues/185#issuecomment-1290906472 about INVALID, as these are de facto invalid cases and should be attended to by the list owner?
UPDATE: About the special rule for tumblr: they have made a change, which I have NOT investigated, ONLY observed:
teen-make-selfies.tumblr.com
thesweetelite.tumblr.com
These URLs are empty and redirect to the default homepage. I have found about 30 of these today, and they were marked ACTIVE against expectation.
Any chance you (@funilrys) could spend a few minutes on this?
@spirillen they actually don't redirect to the home page per se. It's all JavaScript. Therefore, the rule should be about the 404 status code.
Is your feature request related to a problem? Please describe.
There is no such thing as ^(www|m)\..*\.tumblr\.com$

Describe the solution you'd like
We should append a 302 rule.

Describe alternatives you've considered
Even better would be