edgi-govdata-archiving / web-monitoring-ui

UI to enable analysts to quickly assess changes to monitored government websites
GNU General Public License v3.0

Render word docs using Google or Microsoft embedded renderers #186

Open Mr0grog opened 6 years ago

Mr0grog commented 6 years ago

This is kind of related to #179: like PDF and other non-text file formats, we can’t diff MS Word documents. BUT! Both Google and Microsoft offer iframe-embeddable renderers for Word docs, so we could use one of those to display the contents of the file, even if we can’t diff it.

Google: https://docs.google.com/gview?url=https://edgi-versionista-archive.s3.amazonaws.com/versionista2/74286-6216580/version-14182260.doc&embedded=true

https://docs.google.com/gview?url={URL here}&embedded=true

Microsoft: https://view.officeapps.live.com/op/embed.aspx?src=https://edgi-versionista-archive.s3.amazonaws.com/versionista2/74286-6216580/version-14182260.doc

https://view.officeapps.live.com/op/embed.aspx?src={URL here}

We should see if these viewers work for PowerPoint and Excel files, too.

And of course we should also see if we can figure out a way to actually diff them, but this is an easy short-term solution that’s better than displaying nothing at all.
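
As a rough sketch of how the two URL templates above could be filled in: the function and type names here are hypothetical and not part of web-monitoring-ui, and only the two URL patterns come from this thread. The document URL is percent-encoded on the assumption that it may contain its own query string; the example URLs above happen to work unencoded because they contain no `?` or `&`.

```typescript
// Hypothetical helper; only the two URL templates come from the thread.
type ViewerService = 'google' | 'microsoft';

function embeddedViewerUrl(documentUrl: string, service: ViewerService): string {
  // Percent-encode the document URL so any `?` or `&` inside it stays part
  // of the single query parameter rather than being read by the viewer.
  const encoded = encodeURIComponent(documentUrl);

  if (service === 'google') {
    return `https://docs.google.com/gview?url=${encoded}&embedded=true`;
  }
  return `https://view.officeapps.live.com/op/embed.aspx?src=${encoded}`;
}
```

The returned string would be used as the `src` of an iframe. Either viewer has to fetch the file itself, so this only works for documents at publicly reachable URLs.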

Mr0grog commented 6 years ago

See also this Stack Overflow thread: https://stackoverflow.com/questions/27957766/how-do-i-render-a-word-document-doc-docx-in-the-browser-using-javascript

Mr0grog commented 6 years ago

For this, you’ll probably want to create a new view that renders a Word document using one of the above methods. See SandboxedHtml for an example, although this view will hopefully be much simpler.

Then modify RawVersion.render() and SideBySideRawVersions.renderVersion() to use that view based on the media type of the version you are rendering.

Check out ChangeView.mediaTypeForVersion() to see how to determine the media type for a version object. (In the future, we hope to have an actual media type field on version objects, but that’s not done yet; see edgi-govdata-archiving/web-monitoring-db#199.)
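
A minimal sketch of the media-type check that could decide when to use the new view. `isWordMediaType` is a hypothetical name, and the sketch assumes the media type is available as a plain string; it does not reflect how ChangeView.mediaTypeForVersion() is actually implemented. The two media types listed are the registered ones for .doc and .docx files.

```typescript
// Hypothetical helper: returns true when a media type string identifies a
// Word document (.doc or .docx). Assumes the caller already has the type
// as a string, e.g. "application/msword; charset=binary".
function isWordMediaType(mediaType: string): boolean {
  // Drop any parameters after ";" and normalize whitespace and case.
  const semicolon = mediaType.indexOf(';');
  const bare = (semicolon === -1 ? mediaType : mediaType.slice(0, semicolon))
    .trim()
    .toLowerCase();

  switch (bare) {
    case 'application/msword': // .doc
    case 'application/vnd.openxmlformats-officedocument.wordprocessingml.document': // .docx
      return true;
    default:
      return false;
  }
}
```

RawVersion.render() and SideBySideRawVersions.renderVersion() could then branch on a check like this and fall back to their existing rendering for everything else.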