gcv / julia-snail

An Emacs development environment for Julia
GNU General Public License v3.0
231 stars 21 forks source link

Unicode span byte fix #82

Closed SrAceves closed 2 years ago

SrAceves commented 2 years ago

julia-snail--cst-block-at returns incorrect block spans when the string includes unicode characters. It seems to me that a the byte fix you apply to print_cst can also fix the pathat function spans.

gcv commented 2 years ago

I think I understand what you’re talking about, but a couple of simple test cases I tried seem to work correctly. Can you give me an exact reproducible test case, along with both the current and expected outputs for julia-snail--cst-block-at?

SrAceves commented 2 years ago

My case involves lots of code not really relevant; but here's a simple example that reflects the issue:

foo contains 25 characters:

function foo(a)
    a
end

If I evaluate the expression (julia-snail--cst-block-at (current-buffer) (point)) in the minibuffer -- where Point is, say next to a -- returns the expected character span (offset by 1) of 26:

(nil 1 26 "foo")

Repeating the same for bar (same number of characters):

function bar(α)
    α
end

(julia-snail--cst-block-at (current-buffer) (point)) returns 28:

(nil 1 28 "bar")

Consistent with the two αs in the code.

gcv commented 2 years ago

Right. As the documentation comment of pathat says, the returned locations represent bytes, not character locations. On the Elisp side, the requested location of (point) is converted to bytes using position-bytes. When the spans' character locations are important, they're converted using byte-to-position. See julia-snail-send-top-level-form for the only place where Snail does this (I think). There's perhaps an argument for moving the call to byte-to-position into julia-snail--cst-block-at, but I don't think it matters to Snail's current codebase.

So I don't see a bug here. Is julia-snail-send-top-level-form breaking for you? Then I need to see a reproducible example, since the ones you provided work fine for me. Are you doing something that relies on julia-snail--cst-block-at directly? In that case, (1) note that it's a Snail-internal function and thus subject to change, and (2) you can call byte-to-position on its output.

SrAceves commented 2 years ago

Sorry for not commenting earlier, especially because you're very attentive.

Indeed, had I looked at julia-snail-send-top-level-form (or read the doc) I would have found what I was looking for.

My use case is that I'm trying to figure out a way to better define defun in julia mode, as sometimes I try to mark a function definition with M-h it falls short of marking the whole function. So, I figure that this package functionality allows for a much more precise definition of a defun than that afforded by the regexes by the julia-mode.

I haven't figured out how to do it so far, partly because I'm still a neophyte in elisp (well, in coding in general) -- but you pointed me in the right direction.

If in the end I succeed in my endeavor, I'll share my results with you.

Thanks again for your help and congratulations on an awesome package!

gcv commented 2 years ago

@SrAceves: Thank you for the kind words.

I only minimally tested this function, but I think it does what you want:

(defun julia-snail-mark-top-level-form ()
  (interactive)
  (let* ((q (julia-snail--cst-block-at (current-buffer) (point)))
         (filename (julia-snail--efn (buffer-file-name (buffer-base-buffer))))
         (module (julia-snail--module-at-point (-first-item q)))
         (block-start (byte-to-position (or (-second-item q) -1)))
         (block-end (byte-to-position (or (-third-item q) -1))))
    (if (null q)
        (user-error "No top-level form at point")
      (push-mark block-start t t)
      (goto-char block-end))))

It’s an adaptation of julia-snail-send-top-level-form. There’s a bit of incidental complexity having to do with filename translations and extracting serialized data. It marks blocks defined by julia-snail--cst-block-at. That calls JuliaSnail.CST.blockat on the Julia side, which in turn uses CSTParser, which definitely works better than the regex approach from julia-mode. Note that blockat may not quite do what you want, depending on whether you specifically only want to mark functions (and not, e.g., structs or macros).