edgi-govdata-archiving / web-monitoring-diff

Tools for diffing and comparing web content. Also includes a web server that makes diffs available as an HTTP service.
https://web-monitoring-diff.readthedocs.io/
GNU General Public License v3.0

HTML Diff: Consider Putting <ins>/<del> *Inside* Inline Elements #10

Open · Mr0grog opened this issue 6 years ago

Mr0grog commented 6 years ago

We currently go to considerable effort to make our added/removed markup sit inside “block-level” tags but outside “inline” tags (see merge_changes() and merge_change_groups()). However, that leaves open the possibility that an inline tag on the page is styled in a way that obscures the change styling.
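
To make the current placement rule concrete, here is a minimal Python sketch. The names `BLOCK_TAGS` and `wrap_element()` are hypothetical, the tag set is only a partial list chosen for illustration, and this is not how merge_changes() or merge_change_groups() are actually implemented. It only shows where the `<ins>`/`<del>` ends up relative to each kind of element.

```python
# Hypothetical, partial list of block-level tags (assumption for illustration).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "blockquote"}


def wrap_element(tag, inner_html, change_tag="ins"):
    """Wrap one element's content in change markup, mimicking the current rule."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    if tag in BLOCK_TAGS:
        # Block-level: the change markup goes inside the element.
        return f"{open_tag}<{change_tag}>{inner_html}</{change_tag}>{close_tag}"
    # Inline: the change markup wraps the element from the outside, so any
    # background the element paints itself can cover the change styling.
    return f"<{change_tag}>{open_tag}{inner_html}{close_tag}</{change_tag}>"


print(wrap_element("p", "Hello"))     # <p><ins>Hello</ins></p>
print(wrap_element("span", "Hello"))  # <ins><span>Hello</span></ins>
```

The second case is the one that goes wrong: because the `<span>` is a child of the `<ins>`, its own styling is painted on top of whatever highlight the `<ins>` provides.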

See the “You may need a PDF reader to view…” text in the middle of this diff, for example. The whole thing is marked as removed/added along with the surrounding content, but it is a <span> styled as a block with its own background, which covers up the change styling:

https://monitoring.envirodatagov.org/page/3939ce3a-90ca-4b0f-812c-f95eee28d784/e748a732-e36d-477b-bc5d-8316571817bc..7210327d-06a1-46a6-b327-87d50e325ef4

[Screenshot of the diff, taken 2018-05-17 at 9:10:19 am]

We should experiment with simply putting the markup inside all tags, rather than inside some and outside others, to see whether that works well. I’m fairly sure I didn’t do that initially because I was worried about breaking the page’s layout, but I don’t have any clear record of that experiment. For now, we can make the behavior switchable via an argument.
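
One way to try “inside all tags” is to wrap each run of text in the change tag, so the markup always lands inside the innermost element, including inline ones like the block-styled <span> above. The sketch below uses Python's standard-library `html.parser`. `TextRunWrapper` and `wrap_text_runs()` are hypothetical names, not part of web-monitoring-diff, and the sketch assumes it is given an already-isolated added/removed HTML fragment rather than the library's real diff tokens.

```python
from html import escape
from html.parser import HTMLParser

# Elements whose contents are raw text and must never get <ins>/<del> inside.
RAW_TEXT_TAGS = {"script", "style"}


class TextRunWrapper(HTMLParser):
    """Hypothetical helper: re-serialize a fragment, wrapping each text run
    in a change tag so the markup sits inside the innermost element."""

    def __init__(self, change_tag="ins"):
        super().__init__(convert_charrefs=True)
        self.change_tag = change_tag
        self.parts = []
        self.raw_text_tag = None  # set while inside <script> or <style>

    def _render_start(self, tag, attrs, self_closing):
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        slash = " /" if self_closing else ""
        self.parts.append(f"<{tag}{rendered}{slash}>")

    def handle_starttag(self, tag, attrs):
        self._render_start(tag, attrs, self_closing=False)
        if tag in RAW_TEXT_TAGS:
            self.raw_text_tag = tag

    def handle_startendtag(self, tag, attrs):
        self._render_start(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        if tag == self.raw_text_tag:
            self.raw_text_tag = None
        self.parts.append(f"</{tag}>")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_data(self, data):
        if self.raw_text_tag:
            # Script/style bodies are passed through untouched.
            self.parts.append(data)
            return
        # The parser decoded character references, so re-escape the text.
        text = escape(data, quote=False)
        if data.strip():
            # Whitespace-only runs stay bare; everything else gets marked.
            text = f"<{self.change_tag}>{text}</{self.change_tag}>"
        self.parts.append(text)


def wrap_text_runs(fragment, change_tag="ins"):
    parser = TextRunWrapper(change_tag)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


print(wrap_text_runs('<p>Read <span class="note">PDF info</span> here.</p>'))
# <p><ins>Read </ins><span class="note"><ins>PDF info</ins></span><ins> here.</ins></p>
```

Wrapping text runs rather than elements means a styled inline element can no longer paint over the change markup, at the cost of splitting one logical change into several `<ins>` elements. Whether that affects page layout is exactly what the proposed experiment would need to check.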

If it turns out not to work well, the real solution in the end is still edgi-govdata-archiving/web-monitoring-processing#101 and/or edgi-govdata-archiving/web-monitoring-ui#194.

Mr0grog commented 5 years ago

Very definitely still meaningful and relevant.