Closed mik-laj closed 2 months ago
@mik-laj Hi, what is your expected fix for this issue? Are you looking for something like this?
if dag is None:
response = jsonify(...)
response.status_code = 404
return response
Regarding tests, is a URL like rendered?dag_id=non_existent_dag_id
sufficient in tests/www/test_views.py
?
It should be something similar. The most important thing is that no mushrooms appear, but user-readable error messages.
For example: When you enter following address: http://localhost:28080/tries?dag_id=example_automl_text_sentiment2&days=30 http://localhost:28080/landing_times?dag_id=example_automl_text_sentiment2&days=30 http://localhost:28080/gantt?dag_id=example_automl_text_sentiment2 http://localhost:28080/dag_details?dag_id=example_automl_text_sentiment2 http://localhost:28080/code?dag_id=example_automl_text_sentiment2 you will see error screen similar:
However, if you go to the link: http://localhost:28080/tree?dag_id=example_automl_text_cls2 you will see following error message:
This should be standardized and a clear message should always be displayed to the user.
For clarity. This contribution does not have to solve all problems in one PR. I will be happy even if one problem is solved. And another person will be able to create more changes and solve more problems.
@mik-laj I have created #8279 which addresses a single endpoint for a start. Please let me know what you think.
This is a user experience problem, but it is also a security problem. If we see similar messages, it means that we haven't verified enough input data. Data validation is the basic method of protecting against other serious attacks from the "Injection" family e.g. SQL Injection. Input validation should happen as early as possible in the data flow, preferably as soon as the data is received from the client. However, we do not have any validation for many parameters. More information: https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html
Hi @mik-laj As discussed let me put my hands on this. Let me know if there's something you would want to provide additional info.
@2796gaurav We already have one draft. Please read it and comments, and then you will be able to start working on this change.
Yes sure!
I also invite you to read our guides: https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst https://github.com/apache/airflow/blob/master/BREEZE.rst https://github.com/apache/airflow/blob/master/LOCAL_VIRTUALENV.rst https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst https://github.com/apache/airflow/blob/master/TESTING.rst There is a lot of information about our work environment and community
@2796gaurav I unassigned you from this ticket so that the next contributor can start working. If you want to continue working, let me know and I will assign you again.
I wonder if type checks can be used to catch these kinds of issues. There are a lot of current_app.dag_bag
, current_app.appbuilder
, etc., which are not covered by Mypy due to the dynamic nature of flask.current_app
(basically any attributes on it is Any
). Since we “know” what attributes to expect on the current_app
instance within Airflow, maybe we can introduce a typing shim around it? Something like
# airflow/www/app.py
if TYPE_CHECKING:
from airflow.models.dagbag import DagBag
from airflow.www.security import AirflowSecurityManager
class _AirflowAppBuilder(Protocol):
sm: AirflowSecurityManager
...
class _CurrentApp(Protocol):
dag_bag: DagBag
appbuilder: _AirflowAppBuilder
...
from flask import current_app as _current_app
current_app = cast("_CurrentApp", _current_app)
And then all code can import this instead of directly from Flask to be type checked.
We can alternatively supply a type stub flask.pyi
to “lie to” Mypy flask.current_app
is _CurrentApp
.
This issue seems to be resolved in the latest version. The following links, given in the original issue, return a reasonable error and valid page without going "nuclear":
http://localhost:28080/tries?dag_id=example_automl_text_sentiment2&days=30 http://localhost:28080/landing_times?dag_id=example_automl_text_sentiment2&days=30 http://localhost:28080/gantt?dag_id=example_automl_text_sentiment2 http://localhost:28080/dag_details?dag_id=example_automl_text_sentiment2 http://localhost:28080/code?dag_id=example_automl_text_sentiment2
Description
Hi. The webserver code is very optimistic and negative paths are not checked. For example: We have the following code: https://github.com/apache/airflow/blob/0dafdd0b9d635b4513b1413007337b19c3d96b17/airflow/www/views.py#L595-L597
It is not checked here whether the DAG object exists. Condition
dag == None
should be added and when it is met, error 404 should be reported.Use case / motivation
Improving the experience of using the webserver and reducing the number of nukulars.
I hope that a thorough review of the entire web server code and completing the tests with negative paths will improve the overall health of the webserver.
If this is done by the Polidea team, it will be an opportunity to get to know the webserver better. My team has not focused on the webserver yet.
it will be an opportunity to find other health problems (e.g. side-effect, missing tests).
Related Issues
N/A