abo-abo / avy

Jump to things in Emacs tree-style
1.7k stars 108 forks source link

avy-jump without complete paths and overlays on top of the jump target (like in ace-jump) #5

Closed tsdh closed 9 years ago

tsdh commented 9 years ago

I've switched from ace-jump to avy-jump just recently. One major difference is that ace-jump's overlays are displayed on top of the jump target and only one jump character is displayed at a time. With avy-mode, the overlays always show the complete jump character sequence and are displayed right of the jump target.

At least my personal preference (which is not ace-jump biased too much; I started using that only two weeks ago before switching to avy-jump yesterday) is that I like the ace-jump behavior better because the buffer contents won't switch to the right. That is, if you have 5 matches on a line, each requiring a 2-char key sequence to jump to, then the last match has switched 8 chars to the right. With ace-jump, it stays where it has been before. So this "fix target with your eyes and then jump" thingy works better.

abo-abo commented 9 years ago

It's not yet customizable, but it should be soon. There are 3 styles:

The first one is the currently used one that I like. The last one is ace-jump style. I'll make it customizable soon. Like this probably.

tsdh commented 9 years ago

Oh yes, that'd be great!

PythonNut commented 9 years ago

@abo-abo there is (and you may already know this) potentially a way to do multi-char targets that also overlay. The math is described on the May 19th posts here. In essence, we allow the targets to overlap and do extra math to make sure that all possible intervals on the combined targets are unique. This is the method used by vim-easymotion versions 2+.

Does that sound at all feasible?

abo-abo commented 9 years ago

Does that sound at all feasible?

Thanks for the pointer, I'll have to look into this. At the moment, I think that overlaying multiple chars would quickly result in a loss of context, since a large portion of the original text would be hidden.

PythonNut commented 9 years ago

@abo-abo that may be true, but I doubt this is better:

evil-easymotion-avy

Targets are so far displaced from their original positions that it's decently hard to follow where they went. I, personally, rely on remembering where the target was originally and just mash whatever shows up so I don't have time to forget. I could just use the at stye, if this doesn't work out, though.

tsdh commented 9 years ago

Me, too. I fixate the position I wanna jump to and then just type whatever shows up. So I'd :+1: @PythonNut's method.

abo-abo commented 9 years ago

but I doubt this is better:

Would it be better with three consecutive chars overlayed on each target? You wouldn't be able to see the original text at all.

tsdh commented 9 years ago

@abo-abo I think yes. I fix the jump position, then invoke avy-goto-char, and just type. The original text is not important anymore at that point.

abo-abo commented 9 years ago

The original text is not important anymore at that point.

OK, we'll just have to implement and see.

arejensen commented 9 years ago

I came to suggest exactly the same thing as @PythonNut and @tsdh. I attempted to implement the feature with ace-jump, but quickly grew frustrated when disentangling the code. When I saw your initial work on avy, I had great hope that you would implement this feature. Thank you for your work.

vermiculus commented 9 years ago

:+1: It looks like there's a small bug where it picks up the newline in the overlay if the query char is at the end of the line.

abo-abo commented 9 years ago

Thanks, I see it now.

tsdh commented 9 years ago

Wow, this is awesome! Thanks @abo-abo!

PythonNut commented 9 years ago

Does at-full have no effect on the actual target sequences (i.e. kk, kj,...)?

abo-abo commented 9 years ago

@PythonNut at-full is only the overlay style. The target sequences are determined by avy-keys or avy-keys-alist.

PythonNut commented 9 years ago

So if the targets are adjacent, the text is shifted one character to the right?

abo-abo commented 9 years ago

Yes.

tsdh commented 9 years ago

Indeed. Hm, that makes the at-full style useless if there are matches whose distance between each other is less or equal than the length of the key sequence for the former. A user cannot see at which position the sequence he has to type actually begins.

To implement the algorithm @PythonNut referenced, avy-tree and its helpers would need to assume that the list given to it is actually a list of markers because the algorithm needs to know which matches are adjacent.

I wonder if there's even some "better" algorithm where better means no restriction on number of adjacent matches but possibly longer key sequences. Basically, what needs to be ensured is that the key sequence assigned to match N+1 shares its front keys with match N if the key sequence of that runs into match N+1.

Basically, that's simple when doing by hand, but I don't see the math behind it right now. Here's an example where the matches are ?x characters, and avy-keys are just ?a and ?b.

xxxxxZx
aa       ;; 1. match
 ab      ;; 2. match
  ba     ;; 3. match
   aab   ;; 4. match
    abb  ;; 5. match
      bb ;; 6. match

Can anyone formalize that?

abo-abo commented 9 years ago

I could add an overwrite check, although the best solution to the problem is just to use pre, it seems to me that at-full can't be solved perfectly.

avy-tree algorithm is sound, and a good part of it is that it has no idea about the type of the candidates' set or the decision set, it only cares about the sizes of these sets. And it's optimal in the tree depth and leafs amount.

You can try to implement a less optimal algorithm in the two mentioned parameters, that instead resolves overlay conflicts. But I expect that the result will not be very consistent: the leading chars structure will vary depending on the buffer text. The current sequence is actually pretty consistent, since it's basically increasing in lexicographic order by 1 each time.

abo-abo commented 9 years ago

avy-at-full

See, for instance, this example with the word that. Sequences of length 2 are used there, which is reasonable, and it's very had to see the original text.

tsdh commented 9 years ago

Yes, of course you are right. But still it's a very interesting problem to me, and if there is a general solution probably resulting in longer key sequences and of course in non-lexicographic ordering, I'd still want to try it out. Especially the non-lexicographic order is a non-issue anyway because when you jump, you know where you want to end up so you don't care about the preceding or following matches and thus not about their key sequences. Well, and if you don't use the full alphabet as avy-keys lexicographic order is strange anyhow.

I'll try to ask some math guy. Maybe there is a good solution that could be implemented in a special at-full specific alternative avy-tree implementation.

abo-abo commented 9 years ago

Suppose that the buffer is filled with 15 occurrences of the word eye, and avy-goto-char is called for e. To avoid problems with at-full style, a single length path would need to be used in these 30 places. Which means that you can avy-keys must be at least longer than 30. So you can't just cover all cases in a perfect way.

Also note that in the screenshot, it's not all that bad: the first decision char is always in the proper position, and I can make sure that it's not overwritten. And thanks to the proper sequencing, you can easily type the prefix to narrow the candidates (since candidates with same prefix are in the same area).

I could enhance this by coloring the first prefix with a different face.

abo-abo commented 9 years ago

avy-at-full

What do you think about the different face color for the first prefix? Is it more clear?

abo-abo commented 9 years ago

Oops, I was wrong before: the overlays don't get overwritten. at-full actually behaves like pre if there's an overlap.

tsdh commented 9 years ago

Yes, it's more clear.

I'm pretty sure that there is a general algorithm for making at-full working correctly without having to shift the text to the right when there are matches whose overlays would collide. You can pose that as a number-theoretic question where you assign a number of base (length avy-keys) to each match with the additional restriction I made above. It might be possible that this algorithm won't be efficient or will result in very long key sequences in worst-case scenarios like xxxxxxaxxxaxxxexxxoxxxx and jumping to x.

Anyway, I'll try to ask someone who might be able to help me with that. I'll report back.

tsdh commented 9 years ago

I've implemented a brute-force method (avy--jump-key-seqs) which finds jump key sequences. When there are adjacent (or nearby) matches where the key sequence for match i overlaps match i+1, then the key sequence of the latter is chosen in such a way that it starts with the overlapping part of the key sequence for match i.

I've tested it briefly, and it seems to work correctly except that I don't consider the windows of matches yet (see the FIXME). Using the default value of 2 for avy-at-full-factor and only two keys (e.g., like (setq avy-keys '(?a ?b))) it still works fine for up to 51 adjacent matches (which is a very unlikely situation), and key sequences may grow up to length 5. If one uses 6 instead of just two avy-keys, even 200 adjacent matches would be no problem. In that case, the worst key sequences would have length 4 but those are very rare. Most are of length 2 or three.

Unfortunately, it is not a real replacement for avy-tree because it returns just a list of jump key sequences and not a tree (lists of lists where each list's car is the next character to type). But I guess one could rewrite it to create a tree or make it convert the plain list to a tree.

Currently, I don't have the time to try that, so here is the code in case anyone wants to improve it.

(defun avy--key-seqs (how-many keys)
  "Return HOW-MANY key sequences composed of KEYS.
HOW-MANY is a positive integer, and KEYS is a list of characters.
The return value is a list of list of characters."
  (let* ((seqs (mapcar #'list keys))
         (l (length seqs)))
    (catch 'done
      (while t
        (dolist (ks (copy-list seqs))
          (dolist (k keys)
            (push (cons k ks) seqs)
            (cl-incf l)
            (when (>= l how-many)
              (throw 'done t))))))
    (nreverse seqs)))

(defun avy--list-starts-with-p (list prefix)
  "Return true iff LIST starts with PREFIX."
  (if prefix
      (catch 'found
        (while (= (car list) (car prefix))
          (setq list (cdr list)
                prefix (cdr prefix))
          (unless prefix
            (throw 'found t)))
        (throw 'found nil))
    t))

;; FIXME: Give it a better name
(defvar avy-at-full-factor 2)

;; FIXME: This doesn't consider the window of matches but only their position.
(defun avy--jump-key-seqs (matches keys)
  "Return a list of lists of KEYS where the n-th list of keys is
the sequence of keys need to be typed to jump to the n-th match
in MATCHES.  If the list of keys (which will be displayed in an
overlay) of match i overlaps with match i+1, then the prefix of
the list of keys for match i+1 will equal the overlapping suffix
of the list of keys of match i."
  (let ((all-seqs (avy--key-seqs (* avy-at-full-factor
                    (length keys)
                    (length matches)) keys))
        prev-pos prev-seq jump-seqs)
    (while matches
      (let* ((cur (car matches))
             (pos (caar cur))
             (seq (if prev-pos
                      (catch 'found
                        (let ((overlap (- pos prev-pos))
                              (ns all-seqs))
                          (while (and ns
                                      (not (avy--list-starts-with-p
                                            (car ns) (seq-subseq prev-seq overlap))))
                            (setq ns (cdr ns)))
                          (if ns
                              (throw 'found (car ns))
                            (error "Ouch! No at-full key seq can be found."))))
                    (car all-seqs))))
        (push seq jump-seqs)
        (setq prev-pos  pos
              prev-seq seq
              matches (cdr matches)
              all-seqs (delete seq all-seqs))))
    (nreverse jump-seqs)))
tsdh commented 9 years ago

Ah, that's of course flawed because it will assign key sequences "a" and "aa", and then we don't know if should stop after reading an "a" or resume reading another char.

abo-abo commented 9 years ago

Yes, no full path can be a prefix or any other path. See http://en.wikipedia.org/wiki/Huffman_coding

PythonNut commented 9 years ago

Yes, and I suspect that will be the trouble with sequences like xxxxxxxxxxx.... The problem with the old method is that you get things like this:

xxxxxxxxxxxxxxxxxxxxxxxxx
abcdefghijklmnopqrstuvwxyz

But this scheme eats the entire alphabet of prefixes, and obviously won't scale—it can't even be extended.

Here's a slightly denser sequence.

xxxxxxxxxxxxxxxxxxxxxxxxx
aabbaccbcaddcdbdaeedecebea

Here, 25 adjacent points are covered by only 5 chars. (Spanning all possible pairs, in fact)

It's easier to see the structure if I break it down:

  1. aa
  2. bba
  3. ccbca
  4. ddcdbda
  5. eedecebea

This is a crazily fun problem. I'm not yet sure how to generalize this to n character targets, or even if this is possible (sure, we can get close, but can we hit all nⁿ combos?).

PythonNut commented 9 years ago

I tired out at four characters.

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
aaababbbaacacbacccbcbbccabcaadadddbdbbddcdccddabdcbdacdbadcadbcdaa

100% density @ 64 adjacent jump points.

This one is harder to see: Easier to just break into the triplets and group:

aaa
# AB
aab aba bab
abb bbb bba
baa
# AC
aac aca cac
# ABC (parity 1)
acb cba bac
# AC
acc ccc
# BC
ccb cbc bcb
cbb bbc bcc
# AC
cca
# ABC (parity 2)
cab abc bca
# AC
caa
# AD
aad ada dad
add ddd
# BD
ddb dbd bdb
dbb bbd bdd
# CD
ddc dcd cdc
dcc ccd cdd
# AD
dda
# ABCD minus ABC
dab abd bdc
dcb cbd bda
dac acd cdb

dba bad adc
dca cad adb
dbc bcd cda
# AD
daa

But I'm not sure how the parities extend, but it looks like there's two for odd characters and none for even.

If nobody figures it out, I can start working on a program to compute these sequences... later...

abo-abo commented 9 years ago

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx aaababbbaacacbacccbcbbccabcaadadddbdbbddcdccddabdcbdacdbadcadbcdaa

I have no idea what this is.

PythonNut commented 9 years ago

It's a sequence with the property that all sub-sequences of length 3 are unique.

This is one of the possible sequences you would use if you wanted to jump to, say = in this:

;; ==============================
;; this is not very jump-friendly
;; ==============================

And do so without moving the text.

To do this, you would need to compute adjacent targets that overlap, while ensuring that they follow Huffman coding. That particular sequence handles the case when character pairs are not enough (say, when avy-keys is small). Here, let's pretend that (setq avy-keys (list ?a ?b ?c ?d))

We have 60 possible positions, and four characters. 4² = 16 combinations is not enough, but 4³ = 64 is. We need non-conflicting three character targets.

Just before jumping, you'd get something like:

;; aaababbbaacacbacccbcbbccabcaadad
;; this is not very jump-friendly
;; ddbdbbddcdccddabdcbdacdbadcadbcd

Which is just the sequence overlayed.

Now try it. Choose an equal sign. You can jump to any one you please unambiguously.

I admit, that particular sequence looks to be pretty hard to generate procedurally, but it does exist. and it does extend to an arbitrary number of possible characters. With the current avy-keys default, you should be able to jump to 729 targets, which should be enough for anyone.

vermiculus commented 9 years ago

See also http://en.wikipedia.org/wiki/De_Bruijn_sequence.

That's a damned clever idea.

PythonNut commented 9 years ago

Basic translation to Emacs Lisp:

(eval-when-compile (require 'cl-lib))
(defun de-bruijn (k n)
  "De Bruijn sequence for alphabet k and subsequences of length n."
  (let ((a (make-list (* n k ) 0))
         sequence
         (db (lambda (T p)
               (if (> T n)
                 (if (eq (% n p) 0)
                   (setq sequence
                     (append sequence
                       (cl-subseq a 1 (1+ p)))))
                 (setf (nth T a) (nth (- T p) a))
                 (funcall db (1+ T) p)
                 (cl-loop for j from (1+ (nth (- T p) a)) to (1- k) do
                   (setf (nth T a) j)
                   (funcall db (1+ T) T))))))
    (funcall db 1 1)
    sequence))

Feel free to optimize/stylize...

PythonNut commented 9 years ago

@abo-abo currently, at-full leaves some targets unreachable. (Note adjacent blue targets, and how one shadows the other). avy_unreachable

I'm really excited. If at-full grows the necessary sequencing, then it can supersede all three other styles. It costs one mind-keyboard round trip like pre and post, zero scans for shifted content like at, and is also precise (as in, it doesn't come before or after the target, it goes over the target) like at. It's seriously the best of all the other styles in one.

Currently, the math is done. The only addition I would add is the case when two targets are not adjacent, but still overlap, as would be the case jumping to * here with a target length of three.

*=*=*=*=*=*

In this case, you can just skip forward n entries (where n is the total number of chars between the targets) and resume the sequence.

*=*=*=*=*=*
aaa
 aab          ⇐ ignored
  aba
   bab        ⇐ ignored
    abb
     bbb      ⇐ ignored
      bba
       baa    ⇐ ignored
        aac
         aca  ⇐ ignored
          cac
final result: (caps indicate highlighted char)
*=*=*=*=*=*
AaAbAbBbAaCac
vermiculus commented 9 years ago

I love De Bruijn sequences, but I'll say it: I'm confused.

PythonNut commented 9 years ago

@vermiculus okay, maybe I need to say more.

Suppose that our targets do not completely overlap (which can be the case if the subsequence length is >2).

*=*
???   Target 1
  ??? Target 2

This would normally be a problem, because the order 3 De Bruijn overlaps by two chars, not one.

We can fix this by adding dummy target to the group:

*=*
aaababb Master Sequence
aaa   Target 1
 aab  (dummy)
  aba Target 2

And we just drop it later, to get:

*=*
aaa   Target 1
  aba Target 2
*=*
aaaba (highlighted as)
Target 1 path: aaa
Target 2 path: aba

Imperatively, we "skip" over the dummy when we detect a partially overlapping target, which is what I was clumsily trying to describe. Typing aab, which would have taken you to the =, is voluntarily and specifically not part of the scope of valid inputs.

It's really pretty dumb. I should probably just have left it to the implementer to discover.

Does that help at all?

abo-abo commented 9 years ago

Note adjacent blue targets, and how one shadows the other

I'm aware of this, I think it's a feature: it's like saying in bold that if you type this letter, the corresponding area will become active.

abo-abo commented 9 years ago

Currently, the math is done

Well, I'd love to see an implementation. So far, De Bruijn are only theoretically good. I wonder how it will look in practice.

tsdh commented 9 years ago

Do I see it correctly that De Bruijn is very similar to (but a bit more space-efficient to compute) just using the cartesian product of avy-keys^n for paths of length n? That is, you'd compute De Bruijn for avy-keys and path length n, and then any subsequence of length n can be used as avy path.

In both cases, you have to walk the matches and pick non-conflicting paths when there are overlaps. And it might turn out that De Bruijn/cartesian product for paths of length n (which produce avy-keys^n different paths) don't turn out to be enough for the set of matches because many aren't usable due to overlaps. In this case, the algorithm would need to try again with n+1. So in the usual case, the algorithm would first try 1-char paths, fail, try 2-char paths, fail, try 3-char paths, and that has a reasonable chance of being good enough. (Of course that depends on the number of chars in avy-keys.

PythonNut commented 9 years ago

@tsdh set of subsequences produced by De Bruijn is the same as the cartesian product (that is, all possible combinations are present). However, the ordering is different. The ordering matters very much because, if we we choose correctly, we don't have to

[...] walk the matches and pick non-conflicting paths when there are overlaps. [...]

Suppose we want to jump to * in this: [also pretend that avy-keys is {a, b}]

2 * 2 == pow(2, 2) * 1;
char** a = 1 * 2;

(for convenience, I'll number the *s from 1 to 5)

Under cartesian allocation

  1. aaa
  2. aab
  3. aba
  4. abb
  5. baa

This is very similar to the algorithm that avy currently uses. And, as you can see, targets 3 and 4 already conflict:

char** a = 1 * 2;
    aba
     abb
             baa
chara??b = 1 abb;
    ^^       ^

We'd need some overlap resolution algorithm to resolve a new unique path.

Under De Bruijn allocation

  1. aaa
  2. aab
  3. aba
  4. bab
  5. abb The sequence starts out similar, but quickly diverges. You might think that targets 3 and 4 are still a problem, but if you look closely.
char** a = 1 * 2;
    aba
     bab
             abb
charabab = 1 abb;
    ^^       ^

As you can see, there is no conflict. In fact, that's the magic of De Bruijn. All conflicts are resolved before they even have a chance to happen.

it might turn out that [...] don't turn out to be enough for the set of matches because many aren't usable due to overlaps

De Bruijn is dense. In fact, it covers the entire space of paths (all possible combinations appear in the sequence). It is, essentially, optimal. The only exception is the case I highlight in my last post, which could bring the efficiency down by a factor of a-1, but that worst case is extremely unlikely. And, in any case, we are still bounded by the total number of chars in the window, which is usually within a four char path at worst.

tsdh commented 9 years ago

Yes, I got that. So a*b*c*d sequences (assuming you want to jump to *) are the worst case for paths of length 3 because here you can only use every second subsequence of De Bruijn (actually the half of the subseqs + 1, IIUC, i.e. 4 out of 7 for db(2,2)).

BTW, I don't think this is such an overly rare case. For example, the letter e occurs very often in English or German, and frequently the distance between two consecutive e is exactly 2. (bejeweled, telemeter, epexegesis, antecedence, decedent, reference, ...)

So then you have the choice: either you assign sequences linearly in one go which is fast but a significant portion of subsequences will be skipped, or you remember the skipped sequences and try to assign them later at places with no overlaps. The second approach is a bit more complex (but not really hard) and has the benefit that chances are higher that you don't have to fall back to longer paths.

tsdh commented 9 years ago

Ok, I think I've implemented a good part of the stuff.

(defun avy--de-bruijn (keys n)
  "DeBruijn seq for alphabet `keys' and subseqs of length `n'."
  (let* ((k (length keys))
         (a (make-list (* n k) 0))
         sequence)
    (cl-labels ((db (T p)
                    (if (> T n)
                        (if (eq (% n p) 0)
                            (setq sequence
                                  (append sequence
                                          (cl-subseq a 1 (1+ p)))))
                      (setf (nth T a) (nth (- T p) a))
                      (db (1+ T) p)
                      (cl-loop for j from (1+ (nth (- T p) a)) to (1- k) do
                               (setf (nth T a) j)
                               (db (1+ T) T)))))
      (db 1 1)
      (mapcar (lambda (n)
                (nth n keys))
              sequence))))

(defun avy--starts-with-p (list prefix)
  "Return true iff LIST starts with PREFIX."
  (if prefix
      (catch 'found
        (while (= (car list) (car prefix))
          (setq list (cdr list)
                prefix (cdr prefix))
          (unless prefix
            (throw 'found t)))
        (throw 'found nil))
    t))

;; FIXME: This doesn't consider the window of matches yet but only their
;; position.
(defun avy--path-alist-1 (matches seq-len keys)
  (let ((db-seq (avy--de-bruijn keys seq-len))
        prev-pos prev-seq path-alist)
    ;; The De Bruijn seq is cyclic, so append the seq-len - 1 first chars to
    ;; the end.
    (setq db-seq (nconc db-seq (cl-subseq db-seq 0 (1- seq-len))))
    (cl-labels ((subseq-and-pop ()
                                (when (nth (1- seq-len) db-seq)
                                  (prog1 (cl-subseq db-seq 0 seq-len)
                                    (pop db-seq)))))
      (while matches
        (let* ((cur (car matches))
               (pos (caar cur))
               (path (if prev-pos
                         (let ((diff (- pos prev-pos)))
                           (when (and (> diff 0) (< diff seq-len))
                             (while (and (nth (1- seq-len) db-seq)
                                         (not (avy--starts-with-p
                                               (cl-subseq db-seq 0 seq-len)
                                               (cl-subseq prev-seq diff))))
                               (pop db-seq)))
                           (subseq-and-pop))
                       (subseq-and-pop))))
          (if (not path)
              (setq matches nil
                    path-alist nil)
            (push (cons path (car matches)) path-alist)
            (setq prev-pos  pos
                  prev-seq path
                  matches (cdr matches))))))
    (nreverse path-alist)))

(defun avy--path-alist (matches keys)
  "Returns as alist with entries (PATH . MATCH)."
  (let* ((seq-len 1)
         (path-alist (avy--path-alist-1 matches seq-len keys)))
    (while (not path-alist)
      (cl-incf seq-len)
      (setq path-alist (avy--path-alist-1 matches seq-len keys)))
    path-alist))

(avy--path-alist matches avy-keys) returns an alist where the entries have the form (PATH . MATCH) and there are (length matches) entries of course.

What remains to be done is to convert this alist into the tree representation of avy-tree. Ah, and avy--path-alist-1 should also consider the window of matches although that's not really important because when its not considered, the only difference is that if the last match in w1 is at position 99 and the first match in w2 is at position 101, there is a chance that a De Bruijn subseq gets skipped because the algorithms thinks there is an overlap which of course isn't there.

PythonNut commented 9 years ago

Strange, I finished this just last afternoon.

(eval-when-compile (require 'cl-lib))

(defmacro avy-multipop (lst n)
  "Remove LST's first N elements and return them."
  `(if (<= (length ,lst) ,n)
     (prog1 ,lst
       (setq ,lst nil))
     (prog1 ,lst
       (setcdr
         (nthcdr (1- ,n) (prog1 ,lst (setq ,lst (nthcdr ,n ,lst))))
         nil))))

(defun avy-de-bruijn (k n terms)
  "De Bruijn sequence for alphabet `k' and sub-sequences of length `n', for a
minimum of `terms' terms"
  (catch 'return
    (when (= k 0)
      (throw 'return (vector)))
    (let ((a (make-vector (* n k ) 0))
           (sequence (vector))
           (db (lambda (T p)
                 (if (> T n)
                   (when (eq (% n p) 0)
                     (setq sequence
                       (vconcat sequence
                         (cl-subseq a 1 (1+ p))))
                     (when (< (+ terms n) (length sequence))
                       (throw 'return sequence)))
                   (setf (elt a T) (elt a (- T p)))
                   (funcall db (1+ T) p)
                   (cl-loop for j from (1+ (elt a (- T p))) to (1- k) do
                     (setf (elt a T) j)
                     (funcall db (1+ T) T))))))
      (funcall db 1 1)
      (throw 'return sequence))))

(defun avy-partition (k n)
  "The optimal block sizes for alphabet k and number of targets n.
k is partitioned into De Bruijn blocks that cover n with minimal depth."
  (let* ((max-order (ceiling (log n k)))
          (var (vconcat
                 (make-vector (1- max-order) 0)
                 (list k)))
          (i (1- (length var))))
    (catch 'return
      (while t
        (if (eq (elt var i) 0)
          (progn
            (when (eq i 0)
              (throw 'return var))
            (cl-decf i))
          (cl-decf (elt var i))
          (when (/= i 0)
            (cl-incf (elt var (1- i))))
          (when (< (apply #'+
                     (cl-mapcar #'expt var
                       (number-sequence 1 (length var))))
                  n)
            (cl-incf (elt var i))
            (cl-decf (elt var (1- i)))
            (if (= i 0)
              (throw 'return var))
            (cl-decf i)))))))

(defun avy-tree* (lst keys)
  (let ((targets-needed 0)
         (last-target-point 0)
         (depth-guess (ceiling (log (length lst) (length keys)))))
    ;; compute the total number of targets needed, including edge cases
    (mapc (lambda (loc)
            (let* ((pos (caar loc))
                    (delta (- pos last-target-point)))
              (if (< delta depth-guess)
                (cl-incf targets-needed delta)
                (cl-incf targets-needed 1))))
      lst)
    (let ((partitions (avy-partition (length keys) targets-needed))
           (keys-left keys))
      ;; compute the de-brujin blocks
      (setq partitions
        (cl-mapcar
          (lambda (k n)
            (let* ((block-keys (if (/= k 0) (avy-multipop keys-left k)))
                    (seq (mapcar (lambda (char)
                                   (elt block-keys char))
                           (avy-de-bruijn k n targets-needed)))
                    (result))
              seq
              (cl-loop for i from 0 to (1- (- (length seq) n)) do
                (setq result (cons (cl-subseq seq i (+ i n)) result)))
              (nreverse result)))
          partitions
          (number-sequence 1 (length partitions))))
      partitions)))

We're pretty much in the same place, so I'll let you continue the development. The major differences are:

  1. I switched to vectors, because O(n) nth was bothering me philosophically.
  2. I have the concept of blocks, which are independently computed sequences of different path lengths (over independent subsets of avy-keys).
1 2  3    Block #
  .. ...
a df ggi
b de ggh
c dd ggg

This is so we can have a worst case path length, but still give some targets a better path length. I don't have a closed form solution for the optimal block sizes, so I use gradient descent instead.

I'm no lisp expert. Python I can do, Elisp, not so much. I'll let you continue the development.

tsdh commented 9 years ago

@abo-abo @PythonNut I've pushed my changes to a separate branch of my fork https://github.com/tsdh/avy-jump/tree/de-bruijn.

I think the actual calculation and tree-building is correct. What's missing is the UI/overlay changes, and I'd welcome if @abo-abo could give this a whirl. Basically, the at-full style should now be able to fully write out each path, because if there is some overlap the overlapping parts are equal anyway.

PythonNut commented 9 years ago

@tsdh the math seems to be working out. I have a hard time telling because of the way the overlays are implemented.

@abo-abo How difficult would it be to fix at-full for this?

abo-abo commented 9 years ago

@abo-abo How difficult would it be to fix at-full for this?

Hard to say. There aren't any test cases, so I don't know if it's working at all. Plus the avy-tree was rewritten to a different data type. Since it doesn't fit the current interface, I can't check if it works correctly or at all.

tsdh commented 9 years ago

@abo-abo The avy-tree function in the branch https://github.com/tsdh/avy-jump/tree/de-bruijn returns a tree of the same format as it is right now. So it should theoretically be a drop-in replacement for the master branch's avy-tree function.

abo-abo commented 9 years ago

So it should theoretically be a drop-in replacement for the master branch's avy-tree function.

It wants to take caar of MATCHES, if I understood correctly. That doesn't fit the current interface.

tsdh commented 9 years ago

It assumes that MATCHES is a list of matches where each match has the form ((start-pos . end-pos) . window). With caar it (or rather avy--path-alist-1) accesses the start-pos of a match. Isn't that a valid assumption?