FrankensteinVariorum / fv-collation

First-stage collation processing in the Frankenstein Variorum Project. For post-processing and Variorum development, see our GitHub organization: https://github.com/FrankensteinVariorum
https://frankensteinvariorum.github.io/fv-collation/
GNU Affero General Public License v3.0

Wireframe / Prototype Prep #50

Closed: ebeshero closed this issue 5 years ago

ebeshero commented 6 years ago

@Rikkm would like me to pull in Levenshtein distances for each app (apparatus entry) so that we can style the hotspots by hue, with the hue indicating a rough measure of variance.

ebeshero commented 6 years ago

@Rikkm Simple JS + CSS classList toggle tutorial I wrote: http://dh.obdurodon.org/javascript/classListToggle.xhtml

ebeshero commented 6 years ago

@Rikkm Here's my Mitford play prototype, too: http://digitalmitford.org/ChasIpub.html

djbpitt commented 6 years ago

@ebeshero I like the Levenshtein idea! You probably already have a strategy for dealing with the following, but just in case:

  1. Levenshtein values are sensitive to the length of the strings being compared, since longer strings can be more distant from each other than shorter ones. This may mean normalizing the values according to the length of the longest string.
  2. Levenshtein distance compares just two strings. I'm not sure what this might mean for finding hotspots in a collation of more than two witnesses.

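Point 1 can be illustrated with a minimal Python sketch (not the project's code): a standard dynamic-programming Levenshtein distance, divided by the length of the longer of the two strings so that every score falls between 0.0 (identical) and 1.0 (nothing in common). The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Divide by the longer string's length so long and short readings are comparable."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty readings are identical
    return levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))                        # 3
print(round(normalized_levenshtein("kitten", "sitting"), 3))   # 0.429
```

Dividing by the longer string (rather than the shorter one) guarantees the score never exceeds 1.0, since the edit distance can never be larger than the longer string's length.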
ebeshero commented 6 years ago

@djbpitt Thank you! Yep—I’ll need to make all the possible comparisons of 5 things taken 2 at a time, and then choose the max of the output comparison numbers...
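A minimal sketch of the "5 things taken 2 at a time" approach, assuming each apparatus entry's readings are available as a dictionary of plain strings keyed by witness: itertools.combinations yields the 10 pairs, each pair gets a length-normalized Levenshtein score, and the maximum is kept. The witness labels W1 to W5 and the sample readings are placeholders, not data from the Variorum.

```python
from itertools import combinations
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def max_pairwise_distance(readings: dict[str, str]) -> tuple[float, Optional[tuple[str, str]]]:
    """Score every pair of witness readings and return the largest score and its pair."""
    best, best_pair = 0.0, None
    for (sig_a, text_a), (sig_b, text_b) in combinations(readings.items(), 2):
        longest = max(len(text_a), len(text_b)) or 1  # avoid dividing by zero for empty readings
        score = levenshtein(text_a, text_b) / longest
        if best_pair is None or score > best:
            best, best_pair = score, (sig_a, sig_b)
    return best, best_pair


# Hypothetical readings for one apparatus entry.
readings = {
    "W1": "the cold is not excessive",
    "W2": "the cold is not excessive",
    "W3": "the cold is not very excessive",
    "W4": "the cold is excessive",
    "W5": "the cold is not excessive",
}
score, pair = max_pairwise_distance(readings)
print(pair, round(score, 3))  # ('W3', 'W4') 0.3
```

With these placeholder readings the largest score comes from W3 vs. W4 (9 edits over 30 characters). Returning the pair alongside the score could help show which witnesses drive a given hotspot.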

ebeshero commented 6 years ago

@djbpitt I am thinking about the longest-string problem. Strings will be lengthened when they contain markup, so I should find a strategy to read around that. Since I need to do this in Python anyway, it’ll probably involve the same normalization as the collation itself!
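One way to keep markup from inflating string lengths, sketched in Python under the assumption that readings arrive as strings with inline XML tags: strip the tags with a regular expression and collapse whitespace before measuring length or distance. This is not necessarily the normalization the collation pipeline uses. A regex is crude for real XML, but unlike a parser it tolerates fragments whose start and end tags fall in different readings. The function names and sample reading are hypothetical.

```python
import re

TAG = re.compile(r"<[^>]*>")  # any start, end, or self-closing tag
SPACE = re.compile(r"\s+")    # runs of spaces, tabs, newlines


def normalize_reading(reading: str) -> str:
    """Drop markup and collapse whitespace so tags don't count toward string length."""
    text = TAG.sub("", reading)
    return SPACE.sub(" ", text).strip()


def comparison_length(a: str, b: str) -> int:
    """Length of the longer reading after normalization: the denominator for a normalized distance."""
    return max(len(normalize_reading(a)), len(normalize_reading(b)))


raw = '<p>I beheld the <hi rend="italic">wretch</hi></p>'
print(len(raw), len(normalize_reading(raw)))  # markup roughly doubles the raw length
print(normalize_reading(raw))                 # I beheld the wretch
```

Removing tags with an empty string (rather than a space) keeps words like `<hi>F</hi>rankenstein` intact; the whitespace collapse then cleans up gaps left where tags sat between words.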