Athlon1600 / php-proxy

A web proxy script written in PHP and built as an alternative to Glype.
https://www.php-proxy.com
MIT License

Support url_pattern_regex (one plugin for more sites) #54

Closed: webaddicto closed this 7 years ago

webaddicto commented 7 years ago

This would make it possible to write a single plugin that handles multiple websites (e.g. abc.com, abc.de, abc.eu) or similar domains.

webaddicto commented 7 years ago

Example usage to match abc.com, abc.de and abc.pl:

<?php

namespace Proxy\Plugin;

use Proxy\Plugin\AbstractPlugin;
use Proxy\Event\ProxyEvent;

use Proxy\Html;

class MultiSiteMatchPlugin extends AbstractPlugin {

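    // Note: the ^ and $ anchors require the subject to be exactly "abc.com",
    // "abc.de" or "abc.pl"; a full URL such as "https://abc.de/page" will not match.
    protected $url_pattern_regex = '#^abc\.(com|de|pl)$#is';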

    public function onCompleted(ProxyEvent $event){

        $response = $event['response'];
        $html = $response->getContent();
        // do your stuff here...
        $response->setContent($html);
    }
}
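The `^` and `$` anchors in the example above only succeed when the pattern is tested against a bare hostname. This thread does not show exactly which string php-proxy passes to the matcher, so the minimal sketch below assumes the full request URL and extracts the host with `parse_url()` first; `host_matches()` is a hypothetical helper, not part of php-proxy.

```php
<?php
// Hypothetical helper (not php-proxy API): apply a host-anchored regex
// to the host part of a full URL.
function host_matches(string $regex, string $url): bool
{
    // parse_url() returns null when there is no host component,
    // and false for a seriously malformed URL.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($regex, $host) === 1;
}

$regex = '#^abc\.(com|de|pl)$#is';

var_dump(host_matches($regex, 'https://abc.de/some/page')); // bool(true)
var_dump(host_matches($regex, 'https://abc.co.uk/'));       // bool(false)
// Applied to the whole URL, the anchored pattern never matches:
var_dump(preg_match($regex, 'https://abc.de/some/page'));   // int(0)
```

Note that the anchors also exclude subdomains such as www.abc.com, which may or may not be what a plugin author wants.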
Athlon1600 commented 7 years ago

What happens if both $url_pattern and $url_pattern_regex are set? I'd rather have a single $url_pattern that can match either as a regex or as a plain string. But first you would have to detect whether $url_pattern contains a regex or not, and that can be tricky...
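To illustrate why the detection is tricky, here is one possible heuristic that is not proposed in this thread: ask PCRE whether the string compiles as a delimited pattern. The function name `looks_like_regex()` is hypothetical. The last case shows the ambiguity, since a string can be a plausible plain substring and a valid regex at the same time.

```php
<?php
// Hypothetical heuristic (not from php-proxy): treat $url_pattern as a regex
// only if PHP's PCRE engine accepts it as a complete, delimited pattern.
function looks_like_regex(string $pattern): bool
{
    // preg_match() returns false (and emits a warning, silenced here with @)
    // when the pattern has invalid delimiters or fails to compile.
    return @preg_match($pattern, '') !== false;
}

var_dump(looks_like_regex('abc.com'));                // bool(false): alphanumeric delimiter
var_dump(looks_like_regex('#^abc\.(com|de|pl)$#is')); // bool(true)
// Ambiguous: this could be meant as a plain substring, yet it is a valid regex.
var_dump(looks_like_regex('/abc.com/'));              // bool(true)
```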

webaddicto commented 7 years ago

I would note in the help file or README that users should set either $url_pattern or $url_pattern_regex, not both.

Another option would be:

$url_pattern = 'abc.com'; => plain string match
$url_pattern = 'regex:#^abc\.(com|de|pl)$#is'; => strip the "regex:" prefix and pass the rest to preg_match()

What do you think?
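A minimal sketch of the "regex:" prefix idea, assuming the pattern is compared against the full request URL. `url_matches()` and `REGEX_PREFIX` are hypothetical names, and the regex in the usage lines is left unanchored so that it can match inside a URL.

```php
<?php
// Hypothetical sketch: patterns starting with "regex:" go to preg_match(),
// everything else is treated as a plain substring.
const REGEX_PREFIX = 'regex:';

function url_matches(string $url_pattern, string $url): bool
{
    if (strncmp($url_pattern, REGEX_PREFIX, strlen(REGEX_PREFIX)) === 0) {
        // Strip the prefix; the remainder must be a delimited regex.
        $regex = substr($url_pattern, strlen(REGEX_PREFIX));
        return preg_match($regex, $url) === 1;
    }
    // Plain string: case-sensitive substring test, as with strpos().
    return strpos($url, $url_pattern) !== false;
}

var_dump(url_matches('abc.com', 'https://abc.com/page'));              // bool(true)
var_dump(url_matches('regex:#abc\.(com|de|pl)#i', 'https://abc.pl/')); // bool(true)
var_dump(url_matches('regex:#abc\.(com|de|pl)#i', 'https://abc.fr/')); // bool(false)
```

An explicit prefix removes the guesswork entirely, at the cost of a small convention users have to learn.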

Athlon1600 commented 7 years ago

Why not just have it like this:
$url_pattern = 'abc.com' => treat it like a regular strpos match
$url_pattern = '/abc\.(com|net)/' => treat it like a preg_match match

The laziest way of accomplishing this is just to check the first character of $url_pattern. If it's /, then you have a regex pattern.
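A sketch of the first-character rule, assuming an empty pattern means "no filter". The name `is_regex_pattern()` is hypothetical. One caveat worth knowing: a plain pattern that is just a URL path also begins with "/", so it would be misread as a regex, and preg_match() then fails because there is no closing delimiter.

```php
<?php
// Hypothetical helper: a pattern starting with "/" is a regex,
// anything else is a plain substring.
function is_regex_pattern(string $url_pattern): bool
{
    return $url_pattern !== '' && $url_pattern[0] === '/';
}

var_dump(is_regex_pattern('abc.com'));          // bool(false)
var_dump(is_regex_pattern('/abc\.(com|net)/')); // bool(true)
// Caveat: a path-only plain pattern is misread as a regex...
var_dump(is_regex_pattern('/watch'));           // bool(true)
// ...and preg_match() returns false (warning silenced with @).
var_dump(@preg_match('/watch', 'https://example.com/watch')); // bool(false)
```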

webaddicto commented 7 years ago

I wrote 4 alternatives:

Alternative 1:

        // url filter provided and current request url does not match it
        if($this->url_pattern && strpos($url, $this->url_pattern) === false){
            return;
        }
        // url filter (regex) provided and current request url does not match it
        elseif($this->url_pattern_regex && !preg_match($this->url_pattern_regex, $url)){
            return;
        }
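Alternative 1 also answers the question of what happens when both properties are set: the `elseif` is only reached when the first check did not return, so the URL must pass both filters. A standalone sketch, where `passes_filters()` is a hypothetical name that returns true when the plugin should run:

```php
<?php
// Hypothetical standalone version of Alternative 1.
// Returns true when $url passes every filter that is set.
function passes_filters(string $url_pattern, string $url_pattern_regex, string $url): bool
{
    // Plain filter set and not found in the URL: skip.
    if ($url_pattern && strpos($url, $url_pattern) === false) {
        return false;
    }
    // Reached only if the plain filter is unset or matched.
    elseif ($url_pattern_regex && !preg_match($url_pattern_regex, $url)) {
        return false;
    }
    return true;
}

var_dump(passes_filters('abc', '#\.com/#', 'https://abc.com/')); // bool(true): both match
var_dump(passes_filters('abc', '#\.de/#', 'https://abc.com/'));  // bool(false): regex fails
```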

Alternative 2:

        // url filter provided and current request url does not match it
        if($this->url_pattern){
            if(stripos($this->url_pattern, 'r:') === 0){
                if(!preg_match(substr($this->url_pattern, 2), $url)){
                    return;
                }
            }
            else{
                if(strpos($url, $this->url_pattern) === false){
                    return;
                }
            }
        }

Alternative 3:

        // url filter provided and current request url does not match it
        if($this->url_pattern){
            // pattern does not start with a letter or digit => treat it as a regex
            if(!preg_match('/^[a-zA-Z0-9]/', $this->url_pattern)){
                if(!preg_match($this->url_pattern, $url)){
                    return;
                }
            }
            else{
                if(strpos($url, $this->url_pattern) === false){
                    return;
                }
            }
        }

Alternative 4:

        // url filter provided and current request url does not match it
        if($this->url_pattern){
            if(strpos($this->url_pattern, '/') === 0){
                if(!preg_match($this->url_pattern, $url)){
                    return;
                }
            }
            else{
                if(strpos($url, $this->url_pattern) === false){
                    return;
                }
            }
        }
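The detection rules of alternatives 2, 3 and 4 can be compared side by side with a small sketch. The function names are hypothetical; each answers "should this $url_pattern be passed to preg_match()?". The sample list includes a Unicode hostname, which bears on the Unicode question raised in the next comment: its first byte is not an ASCII letter or digit, so rule 3 classifies it as a regex.

```php
<?php
// Hypothetical side-by-side versions of the detection rules.
function rule_prefix(string $p): bool     // Alternative 2: "r:" prefix
{
    return stripos($p, 'r:') === 0;
}

function rule_non_alnum(string $p): bool  // Alternative 3: first char not [a-zA-Z0-9]
{
    return !preg_match('/^[a-zA-Z0-9]/', $p);
}

function rule_slash(string $p): bool      // Alternative 4: first char is "/"
{
    return strpos($p, '/') === 0;
}

$samples = ['abc.com', 'r:#^abc\.(com|de)$#', '#^abc\.(com|de)$#',
            '/^abc\.(com|de)$/', '/watch', 'пример.рф'];

foreach ($samples as $p) {
    printf("%-24s prefix=%d non_alnum=%d slash=%d\n",
        $p, rule_prefix($p), rule_non_alnum($p), rule_slash($p));
}
```

Only the prefix rule is unambiguous; rules 3 and 4 both misclassify some plain patterns ("/watch" for both, the Unicode hostname for rule 3).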

I would vote for 3 because it lets users choose any delimiter character for preg_match (though it may not work well for Unicode domain names):

$url_pattern = '#^abc\.(com|de|pl)$#is'; => regex
$url_pattern = '/^abc\.(com|de|pl)$/is'; => regex
$url_pattern = '@^abc\.(com|de|pl)$@is'; => regex
$url_pattern = 'abc.com'; => string

Option 4 is fine too, but we would need to document that users must use / as the delimiter.

What do you think?

Athlon1600 commented 7 years ago

Option 3 would also match 'abc.com' even when it was intended to be a regular match... I would go with 4 because few people use delimiters other than '/'. It's the default regex delimiter in every tutorial online.

webaddicto commented 7 years ago

Perfect, I have updated the PR with a new commit as you requested :) https://github.com/Athlon1600/php-proxy/pull/54/commits/0595954f7858f5dcdc745fbc8549bfb4d2d4bc17

Athlon1600 commented 7 years ago

Yup, looks good to me now!

webaddicto commented 7 years ago

Good, I have added documentation for $url_pattern in this new PR: https://github.com/Athlon1600/php-proxy/pull/55

Also, this PR may help detect the latest php-proxy version in case users report issues: https://github.com/Athlon1600/php-proxy/pull/53