Closed LeMoussel closed 7 years ago
@LeMoussel thanks for feedback!
ini_set('mbstring.substitute_character', "none");
is it really necessary?
I tried your code and almost all is great except this one: https://github.com/bopoda/robots-txt-parser/blob/master/tests/data/robots9/robots.txt https://github.com/bopoda/robots-txt-parser/blob/master/tests/data/robots9/expectedAllow
/allowpath
should be allowed.
http://joxi.ru/1A5QGRdTnkqVPA
Latest rules transformed to "/*
" (all strange characters trimmed), so it means all urls will be disallow in this case.
ini_set('mbstring.substitute_character', "none");
without this line everything is good. You can create PR if you want :)
I think it's necesary to strip invalid characters from UTF-8 strings to "none".
robots9/robots.txt
file is encoded in ANSI, not UTF-8. Therefore it's normal that latest rules transformed to "/*
" é,Ã,é,...
are invalid UTF-8 characters. #51
I do PR. It's fail with robots9/robots.txt
:)
1) CommonTest::testIsUrlAllow with data set #20 ('User-Agent: \nDisallow: /bd/\n...ow: /�\n', '/allowPath')
Failed asserting that false is true.
On a server, I have Alpine Linux installed. Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox.
There are functional differences between Debian and Alpine from glibc and among others with iconv. See:
http://wiki.musl-libc.org/wiki/Functional_differences_from_glibc
http://wiki.musl-libc.org/wiki/Functional_differences_from_glibc#iconv
In some cases, with Alpine Linux, in PHP, you may encounter problems with
iconv()
and the option// ignore
. That means you can't currently use this function to filter invalid characters. Instead it silently fails and returns an empty string (or you'll get a notice but only if you have E_NOTICE enabled). Réf: PHP Manual inconv()Like describe, in iconv() PHP Manual, in
RobotsTxtParser::__construct()
, Line 63can be replace by
So I propose this