SpaceApi / validator

A micro-service to validate SpaceAPI endpoints.
Apache License 2.0

Validator returns HTTP 500 for certain URLs (missing compression support?) #63

Closed dbrgn closed 5 months ago

dbrgn commented 1 year ago

The validator returns an HTTP 500 error for the URL https://api.chaos-darmstadt.de:

danilo@c3po:~$ http POST https://validator.spaceapi.io/v2/validateURL url=https://api.chaos-darmstadt.de
HTTP/1.1 500 Internal Server Error
Content-Length: 53
Content-Type: text/plain; charset=utf-8
Date: Sat, 07 Jan 2023 23:50:04 GMT
Vary: Origin
X-Content-Type-Options: nosniff

invalid character 'x' looking for beginning of value

https://validator.spaceapi.io/ui/?url=https://api.chaos-darmstadt.de

@gidsi any idea?

(This issue popped up at https://github.com/SpaceApi/directory/pull/219)

dbrgn commented 1 year ago

As @mweinelt wrote at https://github.com/SpaceApi/directory/pull/219#issuecomment-1374658664:

The compressed response starts with an x, so I wonder whether the validator handles compression correctly.
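
The leading x is itself a useful clue. As a minimal sketch using Python's standard zlib and gzip modules (the payload is a made-up stand-in, not the real endpoint data): a zlib-wrapped deflate stream produced with default settings begins with byte 0x78, which is ASCII x, whereas a gzip stream begins with the magic bytes 0x1f 0x8b. A body starting with x would therefore be consistent with deflate rather than gzip content.

```python
import gzip
import zlib

# Made-up stand-in payload; the real endpoint returns a larger JSON document.
payload = b'{"space": "example"}'

zlib_stream = zlib.compress(payload)  # zlib-wrapped deflate (RFC 1950)
gzip_stream = gzip.compress(payload)  # gzip container (RFC 1952)

zlib_first = zlib_stream[:1]  # first byte of the zlib stream
gzip_magic = gzip_stream[:2]  # the two gzip magic bytes

print(zlib_first)
print(gzip_magic)
```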

fleaz commented 1 year ago

Hey,

I just dug into this a bit and created http://api.chaos-darmstadt.de/validator (plain HTTP, without redirect) so I could inspect the request with tcpdump. Indeed, the validator sends Accept-Encoding: gzip when making a request, so our backend correctly serves it the gzipped content:

13:24:39.054444 eth0  In  IP (tos 0x0, ttl 49, id 38744, offset 0, flags [DF], proto TCP (6), length 203)
    116.203.215.224.34948 > 130.83.177.129.80: Flags [P.], cksum 0xc03d (correct), seq 1:152, ack 1, win 502, options [nop,nop,TS val 559043985 ecr 2246131515], length 151: HTTP, length: 151
    GET /validator HTTP/1.1
    Host: api.chaos-darmstadt.de
    User-Agent: Go-http-client/1.1
    Origin: https://validator.spaceapi.io
    Accept-Encoding: gzip

psy commented 5 months ago

Any updates here?

The validator still doesn't accept compressed data:

$ curl --data-binary '{"url":"https://api.chaos-darmstadt.de/"}' -XPOST https://validator.spaceapi.io/v2/validateURL --output -
Unmarshal failed: error: invalid character 'x' looking for beginning of value, data: x[... compressed binary data, garbled by the terminal ...]

Saving the output (without the prepended error message) to a file called data and decompressing it with a gzip header prepended shows that the data is indeed the compressed API file:

$ printf "\x1f\x8b\x08\x00\x00\x00\x00\x00" | cat - /tmp/data | gzip -dc
{"api_compatibility":["14"],"contact":{"email":"info@chaos-darmstadt.de","irc":"ircs://irc.hackint.org/chaos-darmstadt","mastodon":"@cccda:chaos.social","matrix":"#cccda:lossy.network","ml":"public@lists.darmstadt.ccc.de","phone":"+4961515200088","twitter":"@chaosdarmstadt"},"location":{"lat":49.870889,"lon":8.651222,"address":"Wilhelminenstraße 17, 64283 Darmstadt, Germany","timezone":"Europe/Berlin"},"logo":"https://www.chaos-darmstadt.de/logo.svg","space":"CCC Darmstadt","state":{"open":true,"lastchange":1712439840.966214},"url":"https://www.chaos-darmstadt.de","feeds":{"blog":{"type":"rss","url":"https://www.chaos-darmstadt.de/feed.xml"},"calendar":{"type":"ical","url":"https://davical.darmstadt.ccc.de/public.php/cda/public/"}},"sensors":{"co2":[{"name":"6101bb CO2","value":580.0,"location":"Lounge","unit":"ppm"},{"name":"76b078 CO2","value":518.0,"location":"Kitchen","unit":"ppm"},{"name":"c02a8b CO2","value":617.0,"location":"Workshop","unit":"ppm"}],"humidity":[{"name":"6101bb Humidity","value":51.26,"location":"Lounge","unit":"%"},{"name":"76b078 Humidity","value":48.9,"location":"Kitchen","unit":"%"},{"name":"c02a8b Humidity","value":49.35,"location":"Workshop","unit":"%"}],"power_consumption":[{"name":"Power","value":811.9,"location":"CCC Darmstadt","unit":"W"}],"temperature":[{"name":"6101bb Temperature","value":19.89,"location":"Lounge","unit":"°C"},{"name":"76b078 Temperature","value":21.11,"location":"Kitchen","unit":"°C"},{"name":"c02a8b Temperature","value":21.47,"location":"Workshop","unit":"°C"}]}}
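
The shell pipeline above can be mirrored in Python. This is only a sketch: it compresses a made-up stand-in payload instead of reading the real /tmp/data, and it assumes the captured bytes are a zlib-wrapped deflate stream, which the leading x suggests. Under that assumption, the 8 bytes from printf plus the first two bytes of the zlib stream would together fill the 10-byte gzip header, leaving gzip to inflate the raw deflate data that follows.

```python
import zlib

# Stand-in for the bytes saved to /tmp/data (assumption: zlib-wrapped deflate).
payload = b'{"api_compatibility": ["14"], "space": "example"}'
captured = zlib.compress(payload)

# Option 1: let zlib auto-detect a zlib or gzip header (MAX_WBITS | 32).
auto = zlib.decompressobj(wbits=zlib.MAX_WBITS | 32)
decoded_auto = auto.decompress(captured)

# Option 2: skip the 2-byte zlib header and inflate the raw deflate data,
# which is roughly what gzip ends up doing after the prepended header.
raw = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
decoded_raw = raw.decompress(captured[2:])
trailer = raw.unused_data  # the 4-byte Adler-32 checksum left over

print(decoded_auto.decode())
```

The leftover trailer is an Adler-32 checksum rather than the CRC-32 and length that gzip expects, so a gzip tool fed this data may well print the content and still complain about the end of the stream.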

gidsi commented 5 months ago

Hey @psy

I double-checked the issue. It looks like something is wrong with the compression headers: we are sending Accept-Encoding: gzip, but the server answers with Content-Encoding: deflate (I haven't looked at the data itself, though).

The same happens with curl:

curl https://api.chaos-darmstadt.de/ -v -H 'Accept-Encoding: gzip'
...
> Accept-Encoding: gzip
...
< content-encoding: deflate
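
One way a client could guard against this kind of mismatch is to decode according to the Content-Encoding the server actually returned, rather than the encoding it asked for. The sketch below is not the validator's code; decode_body is a hypothetical helper, written in Python purely for illustration, and the payload is made up.

```python
import gzip
import json
import zlib
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decode an HTTP body based on the Content-Encoding header value."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # In HTTP, "deflate" nominally means a zlib-wrapped stream.
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send raw deflate without the zlib wrapper.
            return zlib.decompress(body, wbits=-zlib.MAX_WBITS)
    raise ValueError(f"unsupported Content-Encoding: {encoding!r}")


# Made-up document standing in for a SpaceAPI response.
doc = b'{"space": "example"}'
parsed = json.loads(decode_body(zlib.compress(doc), "deflate"))
print(parsed["space"])
```

Failing loudly on an unknown encoding, instead of handing the compressed bytes to the JSON parser, would also turn the opaque "invalid character 'x'" message into a more direct one.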

mweinelt commented 5 months ago

That's odd. For the same request I see the following:

> Accept-Encoding: gzip
[...]
< content-encoding: gzip

I even see the same behavior on the backend service.

Tested on curl 7.88.1 (Debian 12) and 8.6.0 (NixOS unstable)

gidsi commented 5 months ago

Yeah, and while it does, everything works fine. But it's flaky and sometimes sends the other one (deflate). I don't think curl and the validator share the same bug, though, so I'm convinced it's on the server side.

gidsi commented 5 months ago

Here is the full output in case you need it:

curl https://api.chaos-darmstadt.de/ -v -H 'Accept-Encoding: gzip' --output -
* Host api.chaos-darmstadt.de:443 was resolved.
* IPv6: 2001:41b8:83f:4242::b181
* IPv4: 130.83.177.129
*   Trying [2001:41b8:83f:4242::b181]:443...
* Connected to api.chaos-darmstadt.de (2001:41b8:83f:4242::b181) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / x25519 / id-ecPublicKey
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=www1.darmstadt.ccc.de
*  start date: Feb 11 22:01:58 2024 GMT
*  expire date: May 11 22:01:57 2024 GMT
*  subjectAltName: host "api.chaos-darmstadt.de" matched cert's "api.chaos-darmstadt.de"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
*   Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 1: Public key type RSA (2048/112 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 2: Public key type RSA (4096/152 Bits/secBits), signed using sha256WithRSAEncryption
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://api.chaos-darmstadt.de/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: api.chaos-darmstadt.de]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.7.1]
* [HTTP/2] [1] [accept: */*]
* [HTTP/2] [1] [accept-encoding: gzip]
> GET / HTTP/2
> Host: api.chaos-darmstadt.de
> User-Agent: curl/8.7.1
> Accept: */*
> Accept-Encoding: gzip
>
* Request completely sent off
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/2 200
< server: nginx/1.22.1
< date: Sun, 07 Apr 2024 14:30:07 GMT
< content-type: application/json
< content-length: 669
< referrer-policy: no-referrer
< x-content-type-options: nosniff
< x-frame-options: SAMEORIGIN
< content-encoding: deflate
< strict-transport-security: max-age=63072000; includeSubDomains; preload
< x-content-type-options: nosniff
< x-xss-protection: 1; mode=block
< referrer-policy: no-referrer-when-downgrade
< x-frame-options: DENY
< access-control-allow-origin: *
<
x[... compressed binary body, garbled by the terminal ...]
* Connection #0 to host api.chaos-darmstadt.de left intact

psy commented 5 months ago

Interesting. I ran curl against the API from different machines for hours and never received a response with content-encoding: deflate. I stopped the requests to write that down, started writing, and reran a single request to be 100% sure, and suddenly I received a deflate response. From that point on, I received only deflate responses for a short time, until finally receiving gzip responses again. Very strange behavior that none of us can explain or reliably reproduce.
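
For intermittent behavior like this, a small probe that repeats the request and tallies the Content-Encoding values can be easier to leave running than a manual curl loop. The following is a rough sketch, not something used in this thread; fetch_content_encoding and tally_encodings are hypothetical names, and the fetch function is injectable so the counting logic can be exercised without network access.

```python
import urllib.request
from collections import Counter
from typing import Callable


def fetch_content_encoding(url: str) -> str:
    """Request url with Accept-Encoding: gzip; return the Content-Encoding sent back."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.headers.get("Content-Encoding", "identity").lower()


def tally_encodings(
    url: str,
    attempts: int,
    fetch: Callable[[str], str] = fetch_content_encoding,
) -> Counter:
    """Call fetch repeatedly and count how often each encoding comes back."""
    counts: Counter = Counter()
    for _ in range(attempts):
        counts[fetch(url)] += 1
    return counts
```

Calling tally_encodings("https://api.chaos-darmstadt.de/", 100) would issue 100 live requests back to back; a pause between attempts would be a sensible addition before pointing it at a real server.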

As a result, we completely deactivated gzip. Looks good.