belambert / cl-edit-distance

A Common Lisp implementation of edit distance.
Creative Commons Attribution 4.0 International

control the weights #9

Open arademaker opened 2 years ago

arademaker commented 2 years ago
("a seat for one person, with a support for the back; “he put his coat over the back of the chair and sat down” ; "
  "a seat for one person, with a support for the back; \"he put his coat over the back of the chair and sat down\""
  ((:MATCH #\a #\a) (:MATCH #\  #\ ) (:MATCH #\s #\s) (:MATCH #\e #\e)
   (:MATCH #\a #\a) (:MATCH #\t #\t) (:MATCH #\  #\ ) (:MATCH #\f #\f)
   (:MATCH #\o #\o) (:MATCH #\r #\r) (:MATCH #\  #\ ) (:MATCH #\o #\o)
   (:MATCH #\n #\n) (:MATCH #\e #\e) (:MATCH #\  #\ ) (:MATCH #\p #\p)
   (:MATCH #\e #\e) (:MATCH #\r #\r) (:MATCH #\s #\s) (:MATCH #\o #\o)
   (:MATCH #\n #\n) (:MATCH #\, #\,) (:MATCH #\  #\ ) (:MATCH #\w #\w)
   (:MATCH #\i #\i) (:MATCH #\t #\t) (:MATCH #\h #\h) (:MATCH #\  #\ )
   (:MATCH #\a #\a) (:MATCH #\  #\ ) (:MATCH #\s #\s) (:MATCH #\u #\u)
   (:MATCH #\p #\p) (:MATCH #\p #\p) (:MATCH #\o #\o) (:MATCH #\r #\r)
   (:MATCH #\t #\t) (:MATCH #\  #\ ) (:MATCH #\f #\f) (:MATCH #\o #\o)
   (:MATCH #\r #\r) (:MATCH #\  #\ ) (:MATCH #\t #\t) (:MATCH #\h #\h)
   (:MATCH #\e #\e) (:MATCH #\  #\ ) (:MATCH #\b #\b) (:MATCH #\a #\a)
   (:MATCH #\c #\c) (:MATCH #\k #\k) (:MATCH #\; #\;) (:MATCH #\  #\ )
   (:SUBSTITUTION #\LEFT_DOUBLE_QUOTATION_MARK #\") (:MATCH #\h #\h)
   (:MATCH #\e #\e) (:MATCH #\  #\ ) (:MATCH #\p #\p) (:MATCH #\u #\u)
   (:MATCH #\t #\t) (:MATCH #\  #\ ) (:MATCH #\h #\h) (:MATCH #\i #\i)
   (:MATCH #\s #\s) (:MATCH #\  #\ ) (:MATCH #\c #\c) (:MATCH #\o #\o)
   (:MATCH #\a #\a) (:MATCH #\t #\t) (:MATCH #\  #\ ) (:MATCH #\o #\o)
   (:MATCH #\v #\v) (:MATCH #\e #\e) (:MATCH #\r #\r) (:MATCH #\  #\ )
   (:MATCH #\t #\t) (:MATCH #\h #\h) (:MATCH #\e #\e) (:MATCH #\  #\ )
   (:MATCH #\b #\b) (:MATCH #\a #\a) (:MATCH #\c #\c) (:MATCH #\k #\k)
   (:MATCH #\  #\ ) (:MATCH #\o #\o) (:MATCH #\f #\f) (:MATCH #\  #\ )
   (:MATCH #\t #\t) (:MATCH #\h #\h) (:MATCH #\e #\e) (:MATCH #\  #\ )
   (:MATCH #\c #\c) (:MATCH #\h #\h) (:MATCH #\a #\a) (:MATCH #\i #\i)
   (:MATCH #\r #\r) (:MATCH #\  #\ ) (:MATCH #\a #\a) (:MATCH #\n #\n)
   (:MATCH #\d #\d) (:MATCH #\  #\ ) (:MATCH #\s #\s) (:MATCH #\a #\a)
   (:MATCH #\t #\t) (:MATCH #\  #\ ) (:MATCH #\d #\d) (:MATCH #\o #\o)
   (:MATCH #\w #\w) (:MATCH #\n #\n)
   (:DELETION #\RIGHT_DOUBLE_QUOTATION_MARK NIL) (:DELETION #\  NIL)
   (:DELETION #\; NIL) (:SUBSTITUTION #\  #\")))

I would like to control the weights so that the end of the list above reads:

   ...
   (:SUBSTITUTION #\RIGHT_DOUBLE_QUOTATION_MARK #\")
   (:DELETION #\  NIL) (:DELETION #\; NIL) (:DELETION #\  NIL))

Any ideas?

arademaker commented 2 years ago

Maybe the cost at https://github.com/belambert/cl-edit-distance/blob/main/src/distance.lisp#L99, instead of being a hard-coded value, could come from a cost function passed in as a parameter.
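
To make the idea concrete, here is a minimal, self-contained sketch of an edit-distance routine that takes the cost as a function. It is not the cl-edit-distance code or its API: the names `weighted-diff`, `unit-cost`, `quote-aware-cost` and the `:cost-fn` keyword are all hypothetical, and the `(:INSERTION NIL char)` shape is assumed by analogy with the `(:DELETION char NIL)` entries above. With unit costs, both alignments at the end of the list cost 4, so which one comes out depends only on tie-breaking. Making a curly-to-straight quote substitution cheaper (here 1/2) makes the desired alignment strictly cheaper.

```lisp
;; Hypothetical sketch, NOT the cl-edit-distance API.

(defun unit-cost (op a b)
  "Default cost: 0 for a match, 1 for any other operation."
  (declare (ignore a b))
  (if (eq op :match) 0 1))

(defun quote-aware-cost (op a b)
  "Like UNIT-COST, but replacing a curly double quote with a straight one costs 1/2."
  (if (and (eq op :substitution)
           (member (char-code a) '(#x201C #x201D)) ; left/right double quotation mark
           (eql b #\"))
      1/2
      (unit-cost op a b)))

(defun weighted-diff (s1 s2 &key (cost-fn #'unit-cost))
  "Return (values alignment total-cost) for turning S1 into S2 under COST-FN.
COST-FN is called as (op a b), with op one of :match, :substitution,
:deletion or :insertion."
  (let* ((n (length s1))
         (m (length s2))
         (d (make-array (list (1+ n) (1+ m)) :initial-element 0)))
    (flet ((del (i) (funcall cost-fn :deletion (elt s1 (1- i)) nil))
           (ins (j) (funcall cost-fn :insertion nil (elt s2 (1- j))))
           (sub (i j)
             (let ((a (elt s1 (1- i)))
                   (b (elt s2 (1- j))))
               (funcall cost-fn (if (eql a b) :match :substitution) a b))))
      ;; First column: delete a prefix of S1. First row: insert a prefix of S2.
      (loop for i from 1 to n
            do (setf (aref d i 0) (+ (aref d (1- i) 0) (del i))))
      (loop for j from 1 to m
            do (setf (aref d 0 j) (+ (aref d 0 (1- j)) (ins j))))
      ;; Each inner cell takes the cheapest of the three possible last steps.
      (loop for i from 1 to n
            do (loop for j from 1 to m
                     do (setf (aref d i j)
                              (min (+ (aref d (1- i) (1- j)) (sub i j))
                                   (+ (aref d (1- i) j) (del i))
                                   (+ (aref d i (1- j)) (ins j))))))
      ;; Walk back from the bottom-right corner to recover one optimal alignment.
      (let ((i n) (j m) (ops '()))
        (loop until (and (zerop i) (zerop j))
              do (cond ((and (plusp i) (plusp j)
                             (= (aref d i j)
                                (+ (aref d (1- i) (1- j)) (sub i j))))
                        (let ((a (elt s1 (1- i)))
                              (b (elt s2 (1- j))))
                          (push (list (if (eql a b) :match :substitution) a b)
                                ops))
                        (decf i)
                        (decf j))
                       ((and (plusp i)
                             (= (aref d i j) (+ (aref d (1- i) j) (del i))))
                        (push (list :deletion (elt s1 (1- i)) nil) ops)
                        (decf i))
                       (t
                        (push (list :insertion nil (elt s2 (1- j))) ops)
                        (decf j))))
        (values ops (aref d n m))))))
```

Called on a shortened version of the tail of the example (an `x`, a right curly quote, space, semicolon, space, against `x` followed by a straight quote) with `:cost-fn #'quote-aware-cost`, the only optimal alignment is a match, then the quote substitution, then three deletions, for a total cost of 7/2. A function-valued parameter keeps the default behaviour unchanged while leaving the weighting policy to the caller.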