dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0
11.19k stars 1.41k forks source link

Dagstermill notebook viewer design and security concerns #14862

Open ei-grad opened 1 year ago

ei-grad commented 1 year ago

Dagster version

1.3.10

What's the issue?

Currently /dagit/notebook view accepts an arbitrary filesystem path and only checks that the filename ends with .ipynb. This doesn't look like a critical security issue, but still is not a good practice, and also, there was a reasonable discussion during #4515 review about possibility of notebook code evaluation and possible XSS-like vulnerabilities during notebook visualization.

Also, this design doesn't work well with the IOManager concept (you can't view notebooks which are managed by custom IOManager).

What did you expect to happen?

The /dagit/notebook handler shouldn't accept arbitrary paths. It should be used only for visualization of executed notebooks (which are the assets, and are identified by asset name and partition key) and source notebooks (which are specified in the asset source code). Hence it should accept the corresponding query parameters like asset, source (boolean, to get the source notebook) and partition_key instead of path parameter.

Also, probably the IOManager should be used to load notebook contents.

How to reproduce?

No response

Deployment type

None

Deployment details

No response

Additional information

Related source code: https://github.com/dagster-io/dagster/blob/master/python_modules/dagit/dagit/webserver.py#L124-L150 https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/_grpc/impl.py#L566-L577

Related #1469

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

yuhan commented 1 year ago

Thanks for opening the issue. The general notion of making outputs more visible is on our roadmap, where the notebook viewer limitation may belong to that larger effort.

On the note of security concerns, I totally agree that it isn't a good practice. A potential path could be to leverage nbconvert's sanitize-html config option to sanitize the html.