RedHatEMEA / satools


Slide History View #49

Open tenfourty opened 11 years ago

tenfourty commented 11 years ago

Add the ability to view the history of a particular slide. This is best illustrated with a short example: if I click on a slide in someone's deck in /home or /docspace, I'd like to be able to see which other slides out there are exactly the same, in date order, so that I can tell where the oldest version of this slide is found.

This feature is also really useful because, for a given slide, you can tell who else is using it in their decks and when it was used. Content authors might want to use it to see who else has reused their deck, and so on.

This feature will be particularly useful when there is a large community of users uploading and sharing their presentations through SA Tools.

tenfourty commented 11 years ago

Having discussed this with Jim: there is already the concept of a unique hash for slides within Juno (it is calculated from the generated PNG of the slide, without the page number), so there is already a way to identify unique slides. Coupled with the ODP slide metadata, which contains the edit date, this would allow someone to develop the feature, as the building blocks already exist within Juno.
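To make the idea concrete, here is a minimal sketch of how the hash and the edit date could be combined into a history view. It is not taken from Juno: the `SlideRecord` type, its field names, and the `slide_history` function are all hypothetical, and it assumes the hash and the edit date have already been extracted for each slide.

```python
from datetime import date
from typing import NamedTuple


class SlideRecord(NamedTuple):
    # All field names are hypothetical, chosen for illustration only.
    deck_path: str   # path of the ODP file containing the slide
    slide_no: int    # 1-based position of the slide within the deck
    png_hash: str    # hash of the rendered PNG (page number removed)
    edited: date     # edit date taken from the ODP metadata


def slide_history(records, png_hash):
    """Return every occurrence of the slide with this hash, oldest first."""
    matches = [r for r in records if r.png_hash == png_hash]
    # Sort by edit date; deck path and slide number only break ties.
    return sorted(matches, key=lambda r: (r.edited, r.deck_path, r.slide_no))


records = [
    SlideRecord("/home/alice/a.odp", 3, "abc", date(2013, 5, 1)),
    SlideRecord("/docspace/b.odp", 1, "abc", date(2012, 1, 15)),
    SlideRecord("/home/carol/c.odp", 2, "zzz", date(2011, 1, 1)),
]
history = slide_history(records, "abc")
oldest = history[0]  # the earliest known occurrence of this slide
```

The first entry of the result answers "where is the oldest version of this slide", and the full list answers "who else is using it, and when".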

jamesread commented 11 years ago

So, I've had a few thoughts on this. It seems to me that it would be useful to be able to calculate the Levenshtein distance (a rough measure of similarity) between two slides. Given this metric and the mtime on a slide, you could construct a crude but relatively effective "slide history", which could be used to determine things like the estimated slide origin, as well as the deck in which the slide was most recently used.
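For reference, a minimal sketch of the Levenshtein distance, assuming each slide can be reduced to a plain-text string (how that text would be extracted from an ODP file is not covered here). The `similarity` helper, which normalises the distance into the range 0 to 1, is an illustrative addition and not something proposed in the thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1.0 for identical text, 0.0 for nothing shared."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

A smaller distance means two slides are closer; combined with the mtime, the older of two near-identical slides would be the likelier origin.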

As a knock-on effect, this metric could be used to make search results far more relevant: "collapsing" similar slides into a single search hit, rather than having the same slide from 5 different decks occupying positions 1 through 5 in the search results.
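A rough sketch of what such collapsing might look like. To stay self-contained it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure rather than a Levenshtein-based one; the `collapse_similar` function, the `text` and `deck` keys, and the 0.9 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def collapse_similar(hits, threshold=0.9):
    """Group ranked search hits whose slide text is nearly identical.

    Returns a list of (representative, members) pairs in ranking order;
    the representative is the highest-ranked hit of each group.
    """
    groups = []
    for hit in hits:
        for representative, members in groups:
            ratio = SequenceMatcher(None, representative["text"], hit["text"]).ratio()
            if ratio >= threshold:
                members.append(hit)  # fold into the existing group
                break
        else:
            groups.append((hit, [hit]))  # no close match: start a new group
    return groups


hits = [
    {"deck": "a.odp", "text": "Red Hat Enterprise Linux overview"},
    {"deck": "b.odp", "text": "Red Hat Enterprise Linux overview"},
    {"deck": "c.odp", "text": "Quarterly sales figures"},
]
groups = collapse_similar(hits)
```

Each group would then be shown as a single search hit, with the other members available as "also appears in" entries. Comparing every hit against every group is quadratic in the worst case, which is probably acceptable for a page of results but not for a whole index.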

jim-minter commented 11 years ago

fyi, here's an excerpt from an e-mail I sent a co-conspirator on this topic, explaining how some of this currently works:

At indexing time (add_preso(), createthumbs() in juno/app/index.py), the thumbnails are rendered after attempting to remove the page number from the footer. I then store the SHA1 of the PNG file in the database.

For general queries (Search::GET in juno/app/app.py), each matching hash is returned only once, so slides which are exact duplicates should be suppressed (although if OpenOffice decides to render two identical slides into slightly different PNG files, this doesn't work). However, if you use an 'is:"/path/to/file.odp"' type query (i.e. "just show me a given ODP file"), duplicate suppression is switched off. This means that a presentation containing 10 identical slides returns 10 slides when you view the presentation itself, but if you search for that particular slide, it should appear only once in the search results.

The downside of this is that there's currently no effective way in the UI to map from a given search result to all presentations containing that slide; this is open issue 33 ( https://github.com/RedHatUKI/satools/issues/33 ) if you'd like to take a look at it ;)
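The behaviour described in the e-mail can be sketched roughly as follows. This is not Juno's actual code: the function names, the `suppress_duplicates` flag, and the dictionary-based slide records are hypothetical, and only `hashlib.sha1` is a real library call. The last function illustrates the missing reverse mapping from a slide hash to every presentation that contains it.

```python
import hashlib
from collections import defaultdict


def png_sha1(png_bytes: bytes) -> str:
    """Hex SHA1 digest of a rendered thumbnail's bytes."""
    return hashlib.sha1(png_bytes).hexdigest()


def search(slides, suppress_duplicates=True):
    """Return matching slides; with suppression on, each hash appears once."""
    if not suppress_duplicates:  # e.g. an 'is:"/path/to/file.odp"' query
        return list(slides)
    seen = set()
    results = []
    for slide in slides:
        if slide["sha1"] not in seen:
            seen.add(slide["sha1"])
            results.append(slide)
    return results


def decks_by_hash(slides):
    """Map each slide hash to the sorted list of decks that contain it."""
    index = defaultdict(set)
    for slide in slides:
        index[slide["sha1"]].add(slide["deck"])
    return {sha1: sorted(decks) for sha1, decks in index.items()}
```

Because the hash is computed over the PNG bytes, two renders that differ by even a single byte produce different hashes, which is why slightly different renderings of the same slide escape the suppression.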