edgi-govdata-archiving / web-monitoring

Documentation and project-wide issues for the Website Monitoring project (a.k.a. "Scanner")
Creative Commons Attribution Share Alike 4.0 International

Solve rendering problem when iframe loads content from https or touches storage/cookies #92

Open weatherpattern opened 6 years ago

weatherpattern commented 6 years ago

Scanner has rendering problems because we render diffs by getting an HTML string back from the differ and shoving it into an iframe. Unfortunately, that means the frame runs with extremely restricted security, so if it tries to load scripts or styles or images from https, it will fail, and if it tries to touch storage or cookies, it will throw an exception. A lot of the sites aren’t well coded to account for that, so the JavaScript code in them starts running but never finishes.
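To illustrate the failure mode described above, here is a minimal sketch of why a page's scripts can start and then never finish. It is not code from Scanner or from any monitored site: `StorageLike`, `blockedStorage`, and both `readTheme…` functions are hypothetical names, and the throwing object only simulates how storage access behaves in a frame without a usable origin.

```typescript
// Hypothetical stand-in for the part of the browser Storage interface used here.
interface StorageLike {
  getItem(key: string): string | null;
}

// Simulates storage inside a frame with no usable origin: every access throws.
const blockedStorage: StorageLike = {
  getItem(): string | null {
    throw new Error("SecurityError: storage is not available in this frame");
  },
};

// Unguarded access, as many sites write it. The exception propagates to the
// caller, so whatever setup code was supposed to run next never does.
function readThemeUnguarded(storage: StorageLike): string {
  return storage.getItem("theme") ?? "default";
}

// Guarded access: the failure is caught and the script carries on with a default.
function readThemeGuarded(storage: StorageLike): string {
  try {
    return storage.getItem("theme") ?? "default";
  } catch {
    return "default";
  }
}
```

We can't expect the archived pages to use the guarded form, which is why the fix has to come from how the frame is loaded rather than from the page's own code.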

Solving this is kinda complicated, and I’m not sure whether it is best done in the UI or the DB layer.

See Menu on Jan 18 scrape: https://monitoring.envirodatagov.org/page/f4409490-e8f0-455d-ad8d-7ae0cf6aab74/0ad4bcbc-d751-44b0-bc41-434755416868..f690d39f-7f71-4fc7-93a4-89e7f01dfce8

Also: https://monitoring.envirodatagov.org/page/dbd5f818-44a6-486c-a3ec-9b49cb6b72d0/9b7a5215-e2b9-42d1-857c-a2ef874c4f5b..d4cb9d0f-3042-4bfb-adde-e335f75a1ab0

Related to: https://github.com/edgi-govdata-archiving/web-monitoring-db/issues/91

Mr0grog commented 6 years ago

To be more clear on this one, we need a proxy service to load the pages and diffs from (and that manipulates the HTML as needed along the way). This way, we load the pages from that service instead of stuffing client-side manipulated source code directly into the iframe—so the iframe actually has an origin and is allowed to do more things.
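As a rough sketch of the "manipulates the HTML as needed" step, the functions below show one manipulation such a proxy might do before serving a diff. Everything here is an assumption for illustration: the names `ProxyResponse`, `injectBaseHref`, and `buildProxyResponse` are hypothetical, and inserting a `<base href>` tag is just one plausible rewrite, not something the project has settled on.

```typescript
// Hypothetical shape of what a proxy endpoint would send back to the browser.
interface ProxyResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Insert a <base href> right after the opening <head> tag so that relative
// URLs in the diff resolve against the original page (assumed rewrite).
function injectBaseHref(html: string, pageUrl: string): string {
  const baseTag = `<base href="${escapeAttribute(pageUrl)}">`;
  const headOpen = /<head(\s[^>]*)?>/i; // matches <head> but not <header>
  if (headOpen.test(html)) {
    return html.replace(headOpen, (match) => match + baseTag);
  }
  // Crude fallback for fragments that have no <head> element.
  return baseTag + html;
}

// Wrap the rewritten diff in a response the proxy could serve as a real document.
function buildProxyResponse(diffHtml: string, pageUrl: string): ProxyResponse {
  return {
    status: 200,
    headers: { "Content-Type": "text/html; charset=utf-8" },
    body: injectBaseHref(diffHtml, pageUrl),
  };
}
```

The UI would then point the iframe at the proxy's URL instead of writing the markup into the frame itself, which is what gives the frame a real origin. Where that endpoint lives and what its URL looks like are still open questions.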

It’s not clear to me whether that proxy should be part of the UI’s existing server layer (because it is addressing concerns specific to the UI) or if it should live in the DB project (because it could probably be done in a relatively display-agnostic way and other potential UIs might benefit from it, or even just because we don’t want to put quite so much server logic into the UI project).

This also has other implications: in the UI, for how we handle getting metadata about a diff, and potentially in processing, for how we output these diffs and how we pack that metadata into different kinds of outputs (there is only one kind of output right now).

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.

Mr0grog commented 5 years ago

This is still definitely a major failing of our tools.
