Open FrankWarius opened 1 year ago
This isn't a problem on the demo server. The URL is valid UTF-8 and is recognised OK.
https://dev.webtrees.net/demo-dev/tree/demo/branches/lammh%C3%B6fer
My guess is that the validation error is occurring on one of the HTTP request headers.
Control panel -> Server information -> PHP Variables.
Are there any "interesting" $_SERVER variables? Perhaps your server is adding geo-lookup headers, and using invalid characters there?
I don't think any headers are being added; it's a native IIS 10.
Variable | Value
-- | --
$_COOKIE['__Secure-WT-ID'] | 2e24ba5eb497d1bf0ec0132bacf8f5c5
$_SERVER['_FCGI_X_PIPE_'] | \\.\pipe\IISFCGI-1e736672-8688-4dea-8879-a9feb4557a83
$_SERVER['PHPRC'] | C:\PHPEnv\PHPini\
$_SERVER['PHP_FCGI_MAX_REQUESTS'] | 10000
$_SERVER['ALLUSERSPROFILE'] | C:\ProgramData
$_SERVER['APPDATA'] | C:\Windows\system32\config\systemprofile\AppData\Roaming
$_SERVER['APP_POOL_CONFIG'] | C:\inetpub\temp\apppools\WTProd\WTProd.config
$_SERVER['APP_POOL_ID'] | WTProd
$_SERVER['CommonProgramFiles'] | C:\Program Files\Common Files
$_SERVER['CommonProgramFiles(x86)'] | C:\Program Files (x86)\Common Files
$_SERVER['CommonProgramW6432'] | C:\Program Files\Common Files
$_SERVER['COMPUTERNAME'] | SRV23-5DP-DE
$_SERVER['ComSpec'] |
$_SERVER['DriverData'] |
$_SERVER['LOCALAPPDATA'] |
$_SERVER['NUMBER_OF_PROCESSORS'] | 4
$_SERVER['OS'] | Windows_NT
$_SERVER['Path'] |
$_SERVER['PATHEXT'] | .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
$_SERVER['PROCESSOR_ARCHITECTURE'] | AMD64
$_SERVER['PROCESSOR_IDENTIFIER'] | Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
$_SERVER['PROCESSOR_LEVEL'] | 6
$_SERVER['PROCESSOR_REVISION'] | 5504
$_SERVER['ProgramData'] | C:\ProgramData
$_SERVER['ProgramFiles'] | C:\Program Files
$_SERVER['ProgramFiles(x86)'] | C:\Program Files (x86)
$_SERVER['ProgramW6432'] | C:\Program Files
$_SERVER['PSModulePath'] |
$_SERVER['PUBLIC'] |
$_SERVER['SystemDrive'] | C:
$_SERVER['SystemRoot'] | C:\Windows
$_SERVER['TEMP'] | C:\Windows\TEMP
$_SERVER['TMP'] | C:\Windows\TEMP
$_SERVER['USERDOMAIN'] | WORKGROUP
$_SERVER['USERNAME'] | SRV23-5DP-DE$
$_SERVER['USERPROFILE'] | C:\Windows\system32\config\systemprofile
$_SERVER['windir'] | C:\Windows
$_SERVER['ORIG_PATH_INFO'] | /index.php
$_SERVER['URL'] | /index.php
$_SERVER['SERVER_SOFTWARE'] | Microsoft-IIS/10.0
$_SERVER['SERVER_PROTOCOL'] | HTTP/1.1
$_SERVER['SERVER_PORT_SECURE'] | 1
$_SERVER['SERVER_PORT'] | 443
$_SERVER['SERVER_NAME'] | wbt.warius.info
$_SERVER['SCRIPT_NAME'] | /index.php
$_SERVER['SCRIPT_FILENAME'] | D:\web\WT21Git\webtrees\index.php
$_SERVER['REQUEST_URI'] | /admin/information
$_SERVER['REQUEST_METHOD'] | GET
$_SERVER['REMOTE_USER'] | no value
$_SERVER['REMOTE_PORT'] | 62907
$_SERVER['REMOTE_HOST'] |
$_SERVER['REMOTE_ADDR'] |
$_SERVER['QUERY_STRING'] | no value
$_SERVER['PATH_TRANSLATED'] | D:\web\WT21Git\webtrees\index.php
$_SERVER['LOGON_USER'] | no value
$_SERVER['LOCAL_ADDR'] | 85.215.178.206
$_SERVER['INSTANCE_META_PATH'] | /LM/W3SVC/1
$_SERVER['INSTANCE_NAME'] | WTPROD
$_SERVER['INSTANCE_ID'] | 1
$_SERVER['HTTPS_SERVER_SUBJECT'] | CN=wbt.warius.info
$_SERVER['HTTPS_SERVER_ISSUER'] | C=US, O=Let's Encrypt, CN=R3
$_SERVER['HTTPS_SECRETKEYSIZE'] | 2048
$_SERVER['HTTPS_KEYSIZE'] | 256
$_SERVER['HTTPS'] | on
$_SERVER['GATEWAY_INTERFACE'] | CGI/1.1
$_SERVER['DOCUMENT_ROOT'] | D:\web\WT21Git\webtrees
$_SERVER['CONTENT_TYPE'] | no value
$_SERVER['CONTENT_LENGTH'] | 0
$_SERVER['CERT_SUBJECT'] | no value
$_SERVER['CERT_SERIALNUMBER'] | no value
$_SERVER['CERT_ISSUER'] | no value
$_SERVER['CERT_FLAGS'] | no value
$_SERVER['CERT_COOKIE'] | no value
$_SERVER['AUTH_USER'] | no value
$_SERVER['AUTH_PASSWORD'] | no value
$_SERVER['AUTH_TYPE'] | no value
$_SERVER['APPL_PHYSICAL_PATH'] | D:\web\WT21Git\webtrees\
$_SERVER['APPL_MD_PATH'] | /LM/W3SVC/1/ROOT
$_SERVER['IIS_UrlRewriteModule'] | 7,1,1993,2351
$_SERVER['UNENCODED_URL'] | /admin/information
$_SERVER['IIS_WasUrlRewritten'] | 1
$_SERVER['HTTP_X_ORIGINAL_URL'] | /admin/information
$_SERVER['HTTP_SEC_FETCH_USER'] | ?1
$_SERVER['HTTP_SEC_FETCH_SITE'] | same-origin
$_SERVER['HTTP_SEC_FETCH_MODE'] | navigate
$_SERVER['HTTP_SEC_FETCH_DEST'] | document
$_SERVER['HTTP_UPGRADE_INSECURE_REQUESTS'] | 1
$_SERVER['HTTP_DNT'] | 1
$_SERVER['HTTP_USER_AGENT'] | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0
$_SERVER['HTTP_TE'] | trailers
$_SERVER['HTTP_REFERER'] | https://wbt.warius.info/admin
$_SERVER['HTTP_HOST'] | wbt.warius.info
$_SERVER['HTTP_COOKIE'] | __Secure-WT-ID=2e24ba5eb497d1bf0ec0132bacf8f5c5
$_SERVER['HTTP_ACCEPT_LANGUAGE'] | de,en-US;q=0.7,en;q=0.3
$_SERVER['HTTP_ACCEPT_ENCODING'] | gzip, deflate, br
$_SERVER['HTTP_ACCEPT'] | text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
$_SERVER['HTTP_CONTENT_LENGTH'] | 0
$_SERVER['HTTP_CONNECTION'] | close
$_SERVER['FCGI_ROLE'] | RESPONDER
$_SERVER['PHP_SELF'] | /index.php
$_SERVER['REQUEST_TIME_FLOAT'] | 1673171317.277
$_SERVER['REQUEST_TIME'] | 1673171317

Perhaps you could add some debug code here:
Write $key and $value to a log file. (If they contain invalid UTF-8 characters, you probably cannot write them to the database.)
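A minimal sketch of what such debug logging could look like. The helper name `debugLogEntry()` and the log file location are assumptions for illustration, not webtrees code. The value is hex-encoded with `bin2hex()` so the raw bytes stay readable even when they are not valid UTF-8.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not part of webtrees): format one request value for a debug log.
// bin2hex() keeps the raw bytes visible even when they are not valid UTF-8.
function debugLogEntry(string $key, string $value): string
{
    // preg_match() with the /u modifier returns false for invalid UTF-8 input.
    $state = preg_match('//u', $value) === 1 ? 'valid' : 'INVALID';

    return sprintf('%s: %s UTF-8, bytes=%s', $key, $state, bin2hex($value));
}

// To append to a file instead of printing (the path is an assumption):
// error_log(debugLogEntry($key, $value) . PHP_EOL, 3, sys_get_temp_dir() . '/webtrees-debug.log');
echo debugLogEntry('surname', "P\xE4ch"), PHP_EOL;
// prints: surname: INVALID UTF-8, bytes=50e46368
```

Writing to a plain file rather than the database avoids the problem mentioned above, since the file does not care about the encoding of its contents.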
I added this at line 67:
$x = preg_match('//u', $value, $match);
throw new HttpBadRequestException('Invalid UTF-8 characters in request (' . $value . ')');
and ran it under XDebug (on 2.1.15):
$match: array(0)
$value: "P�ch" 'P\xE4ch'
$x: false
If this is CP1252, then \xE4 is ä, which gives Päch.
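The CP1252 guess can be checked with a few lines of PHP. This is only an illustrative sketch and assumes the mbstring extension is available; the byte string is the one reported by XDebug above.

```php
<?php

declare(strict_types=1);

$raw = "P\xE4ch"; // the bytes reported by XDebug

// As UTF-8 this is invalid: 0xE4 opens a three-byte sequence that never completes.
var_dump(mb_check_encoding($raw, 'UTF-8')); // bool(false)

// Interpreted as Windows-1252 (CP1252), 0xE4 is "ä".
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

echo $utf8, PHP_EOL;               // Päch
echo bin2hex($utf8), PHP_EOL;      // 50c3a46368
echo rawurlencode($utf8), PHP_EOL; // P%C3%A4ch
```

The last line shows that the correctly encoded name is exactly what appears in the browser's URL, so the single byte 0xE4 must have been produced somewhere between the browser and PHP.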
Can you add both $value and $key to the debug?
$value: "P�ch" 'P\xE4ch'
$key: "surname"
URL now: https://wbt.warius.info/tree/Warius/branches/P%C3%A4ch
It's related to pretty URLs on IIS. The same page without pretty URLs works:
http://dev.warius.info/index.php?route=%2Ftree%2Ftree1%2Fbranches%2FP%25C3%25A4ch&soundex_dm=0&soundex_std=0
Request URL: https://wbt.warius.info/tree/Warius/branches/P%C3%A4ch
Request method: GET
Status code: 500
Remote address: 85.215.178.206:443
Referrer policy: strict-origin-when-cross-origin
cache-control: no-store, no-cache, must-revalidate
content-encoding: gzip
content-length: 649
content-type: text/html; charset=UTF-8
date: Sun, 08 Jan 2023 15:51:58 GMT
expires: Thu, 19 Nov 1981 08:52:00 GMT
pragma: no-cache
server: Microsoft-IIS/10.0
vary: Accept-Encoding
x-powered-by: PHP/8.1.14
:authority: wbt.warius.info
:method: GET
:path: /tree/Warius/branches/P%C3%A4ch
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: de,de-DE;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
cache-control: no-cache
cookie: __Secure-WT-ID=9f965da74fe2d009df90a681f0abb14e
dnt: 1
pragma: no-cache
sec-ch-ua: "Not?A_Brand";v="8", "Chromium";v="108", "Microsoft Edge";v="108"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: none
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.76
The cause is the IIS URL Rewrite module, which decodes REQUEST_URI when rewriting.
XDebug shows the following server variables:
REQUEST_URI: "/tree/Warius/branches/P�ch" (decoded, and in the wrong code page)
UNENCODED_URL: "/tree/Warius/branches/P%C3%A4ch" (the value that should be used)
HTTP_X_ORIGINAL_URL: "/tree/Warius/branches/P%C3%A4ch" (also correct)
Webtrees should use UNENCODED_URL for IIS
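As a rough sketch of that suggestion, the request URI could be chosen from the server variables with a preference for UNENCODED_URL. The function `requestUriFromServer()` is hypothetical and only illustrates the idea; it is not how webtrees currently builds its request.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not webtrees code): pick the request URI from server variables,
// preferring the still percent-encoded value that IIS exposes as UNENCODED_URL.
function requestUriFromServer(array $server): string
{
    foreach (['UNENCODED_URL', 'REQUEST_URI'] as $name) {
        $value = $server[$name] ?? '';

        if (is_string($value) && $value !== '') {
            return $value;
        }
    }

    return '/';
}

// Values as reported by XDebug for the failing request.
$server = [
    'REQUEST_URI'   => "/tree/Warius/branches/P\xE4ch",   // decoded by URL Rewrite, wrong code page
    'UNENCODED_URL' => '/tree/Warius/branches/P%C3%A4ch', // as sent by the browser
];

echo requestUriFromServer($server), PHP_EOL; // /tree/Warius/branches/P%C3%A4ch
```

On servers that do not set UNENCODED_URL, the sketch falls back to REQUEST_URI, so behaviour elsewhere would be unchanged.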
I could also change the rewrite rule, but I need some information.
The current rewrite action is:
<action type="Rewrite" url="index.php" appendQueryString="true" />
I can pass UNENCODED_URL to index.php, but I don't know in what form webtrees needs it. Like this?
<action type="Rewrite" url="index.php?{UNENCODED_URL}" appendQueryString="false" />
Fixed by adding
<set name="REQUEST_URI" value="{UNENCODED_URL}" />
to the serverVariables of the IIS 10 URL Rewrite rule.
Complete rule:
<rule name="Webtrees Rewrite" enabled="true" stopProcessing="true">
  <match url="^" ignoreCase="false" />
  <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
    <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
    <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
  </conditions>
  <action type="Rewrite" url="index.php" appendQueryString="true" logRewrittenUrl="false" />
  <serverVariables>
    <set name="REQUEST_URI" value="{UNENCODED_URL}" />
  </serverVariables>
</rule>
Should we update the documentation?
There are two parts to this issue.
1) webtrees detects this invalid character, and tries to give a 400 Bad Request response.
Currently, we check that the headers contain valid UTF-8. I think we should be more strict. The headers should be 7-bit ASCII.
2) the error page generates a similar error, and this gives a 500 response.
This needs to be fixed, so that we can give the correct 400 response and error message.
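Both parts can be sketched in a few lines. The helper names below are hypothetical and this is not the actual webtrees fix; it only illustrates a stricter 7-bit ASCII check and one way to make an invalid value safe to show in an error message.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: true when every byte is printable 7-bit ASCII (space to tilde).
function isPrintableAscii(string $value): bool
{
    return preg_match('/^[\x20-\x7E]*$/D', $value) === 1;
}

// Hypothetical helper: percent-encode every byte outside printable ASCII,
// so the result can be displayed or logged without further encoding errors.
function safeForErrorPage(string $value): string
{
    return preg_replace_callback(
        '/[^\x20-\x7E]/',
        static fn (array $match): string => rawurlencode($match[0]),
        $value
    ) ?? '';
}

var_dump(isPrintableAscii('/tree/Warius/branches/P%C3%A4ch')); // bool(true)
var_dump(isPrintableAscii("/tree/Warius/branches/P\xE4ch"));   // bool(false)
echo safeForErrorPage("lammh\xF6fer"), PHP_EOL;                // lammh%F6fer
```

Neither pattern uses the /u modifier, so both work byte by byte and cannot themselves fail on invalid UTF-8, which matters when the check runs while an error is already being handled.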
Two notes:
I get a lot of 500 errors, e.g. from Bing following old links that contain an umlaut, such as https://wbt.warius.info/tree/Warius/branches/lammh%C3%B6fer. Could you please return a 404 error instead?
Uncaught Fisharebest\Webtrees\Http\Exceptions\HttpBadRequestException: Invalid UTF-8 characters in request in D:\web\WT21Git\webtrees\app\Validator.php:67
Stack trace:
#0 [internal function]: Fisharebest\Webtrees\Validator::Fisharebest\Webtrees\{closure}('lammh\xF6fer', 'surname')
#1 D:\web\WT21Git\webtrees\app\Validator.php(71): array_walk_recursive(Array, Object(Closure))
#2 D:\web\WT21Git\webtrees\app\Validator.php(85): Fisharebest\Webtrees\Validator->__construct(Array, Object(Nyholm\Psr7\ServerRequest), 'UTF-8')
#3 D:\web\WT21Git\webtrees\app\Http\Middleware\HandleExceptions.php(155): Fisharebest\Webtrees\Validator::attributes(Object(Nyholm\Psr7\ServerRequest))
#4 D:\web\WT21Git\webtrees\app\Http\Middleware\HandleExceptions.php(99): Fisharebest\Webtrees\Http\Middleware\HandleExceptions->httpExceptionResponse(Object(Nyholm\Psr7\ServerRequest), Object(Fisharebest\Webtrees\Http\Exceptions\HttpBadRequestException))
#5 D:\web\WT21Git\webtrees\vendor\oscarotero\middleland\src\Dispatcher.php(136): Fisharebest\Webtrees\Http\Middleware\HandleExceptions->process(Object(Nyholm\Psr7\ServerRequest), Object(Middleland\Dispatcher))