Encode “<” as \u003c in response

oshev commented 5 years ago

Passing "<" as it is in JSON output is potentially dangerous because this opens a possibility to sneak an executable jscript code there, which will be mindless executed by a browser calling prediction API. This can be used for Cross-site scripting (XSS) exploit.

Seldon uses Flask's jsonify library to wrap the response (see source code). Unfortunately, it doesn't encode “<” as \u003c, which is a common mechanism to protect from this problem. It can be tested this way:

from flask import jsonify, Flask
app = Flask(__name__, static_url_path='')
with app.app_context():
    resp = jsonify({"first": "<body onload=alert('test1')>"}
print(resp.data)
# this will output b'{"first":"<body onload=alert(\'test1\')>"}\n'

The output contains an executable JSON (if you copy this JSON to an html file, say a.html and try to open it locally; it'll create an alert in your browser).

The solution is to encode “<” as \u003c in response, either explicitely in Python or using more secure json-convertion libraries.

ukclivecox commented 5 years ago

We should evaluate the recommendations at https://flask.palletsprojects.com/en/1.1.x/security/

ukclivecox commented 5 years ago

However, the engine sits in front of all requests and it is that that returns the final response to the caller. Flask is not exposed directly.

ukclivecox commented 5 years ago

Also, @oshev are you saying that you expect to call Seldon exposed endpoints directly from a client browser? I would assume in production Seldon endpoints would have authentication via an ingress and be accessed from middleware.

oshev commented 5 years ago

We should evaluate the recommendations at https://flask.palletsprojects.com/en/1.1.x/security/

Yes, it would be great if you evaluate them all!

oshev commented 5 years ago

However, the engine sits in front of all requests and it is that that returns the final response to the caller. Flask is not exposed directly.

What library do you use in the engine? In couldn't easily check if it encodes “<” as \u003c in the response. To check this, I would need to create a custom Seldon model which returns "<" in the data part of the response.

If you could confirm, that the engine encodes “<” as \u003c in the response, we could close this issue.

oshev commented 5 years ago

Also, @oshev are you saying that you expect to call Seldon exposed endpoints directly from a client browser? I would assume in production Seldon endpoints would have authentication via an ingress and be accessed from middleware.

If by middleware you mean routing services like Ambassador or Istio, I don't they would modify the Seldon response. In my understanding, they would just pass "<" as it is to the client.

If by middleware you mean some other back-end service in between client and ML, it's possible there won't be one. Some models could be simple / unassuming enough to be called dirrectly from clients browsers. In fact, it's the use case we have. I'll double check, in case I'm missing something.

adriangonz commented 5 years ago

Fixed by #893

SeldonIO / seldon-core

Encode “<” as \u003c in response #837