Cursor position using line and column number

jacob-carlborg commented 8 years ago

Currently DCD uses a byte offset to specify the cursor position. Is it possible to add support for specifying the cursor position using line and column number? The reason is that TextMate only exposes this information as a line and column number.

I've already asked the author of TextMate about supporting byte offset and the reply was that the byte offset is not reliable since the editor can use a different encoding and line endings than the file on disk.
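
To make that concrete, here is a small sketch (in Python, purely for illustration; the sample text and the function name `caret_byte_offset` are made up) of how the same caret position ends up at a different byte offset depending on the encoding and line endings in use:

```python
def caret_byte_offset(text_before_caret: str, encoding: str, newline: str) -> int:
    """Count the bytes preceding the caret once the text is stored
    with the given encoding and line-ending convention."""
    stored = text_before_caret.replace("\n", newline)
    return len(stored.encode(encoding))


# Hypothetical buffer content to the left of the caret (caret is right after "foo.").
before = "// größe\nfoo."

print(caret_byte_offset(before, "utf-8", "\n"))         # 15
print(caret_byte_offset(before, "utf-8", "\r\n"))       # 16
print(caret_byte_offset(before, "utf-16-le", "\r\n"))   # 28
```

The caret has not moved, yet the offset differs in each case, so an offset is only meaningful relative to the exact bytes that are handed to the tool.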

Hackerpilot commented 8 years ago

DCD's lexer only supports UTF-8 and doesn't do any character width calculations. It also doesn't care or need to know if the user set the tab character to be equal to 11 spaces. I don't see why it's DCD's job to figure out the internal state of a text editor instead of having the editor understand itself.

jacob-carlborg commented 8 years ago

I'm bringing the author of TextMate into the discussion. @sorbits this is the enhancement request I filed after our discussion about supporting access to the cursor position as a byte offset in TextMate. @Hackerpilot is the author of DCD, the tool I'm using for code completion and other features.

sorbits commented 8 years ago

I don't see why it's DCD's job to figure out the internal state of a text editor

Just to be clear, this is not about TextMate’s internal state, but rather a suggestion to follow the convention that (AFAIK) has been used for decades. I can name a dozen command line tools that all report issues using the line number (optionally followed by column) and there is likewise widespread support for dealing with “lines” in command line tools and text editors alike, both programmatically and interactively.

Prior to hearing about DCD I knew only of osascript as a tool that reports issues using byte offsets (though I do not know if that is still the case), and I would be hard pressed to come up with tools that can use these byte offsets.

For line number you should only need to care about counting LF characters in the file (assuming you do not care about legacy Mac files that use CR).

For column offset, I would be pragmatic and count bytes from the last LF seen. I do not expect a command line tool to deal with tab characters or diacritics, or to care about multi-byte code sequences.
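
A minimal sketch of that counting, in Python for illustration only. This is not DCD code: `offset_to_line_column` is a hypothetical name, and reporting 1-based lines with 0-based byte columns is an assumption, since the thread does not fix a convention.

```python
from typing import Tuple


def offset_to_line_column(source: bytes, offset: int) -> Tuple[int, int]:
    """Map a 0-based byte offset to (line, column).

    The line is 1-based and found by counting LF bytes before the offset.
    The column is the 0-based byte distance from the last LF seen.
    Tabs and multi-byte sequences get no special treatment.
    """
    if not 0 <= offset <= len(source):
        raise ValueError("offset is outside the buffer")
    line = source.count(b"\n", 0, offset) + 1
    # rfind returns -1 when there is no earlier LF, so line_start becomes 0.
    line_start = source.rfind(b"\n", 0, offset) + 1
    return line, offset - line_start


source = b"void main()\n{\n\tfoo.\n}\n"
print(offset_to_line_column(source, 19))  # (3, 5): right after "foo." on line 3
```

Legacy CR-only files would need extra handling, which this sketch leaves out on purpose.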

jacob-carlborg commented 8 years ago

@sorbits just to be clear, DCD is not used for reporting errors. The byte offset is used to indicate where it should figure out which auto complete options are available. Example:

class Foo
{
    void bar() {}
}

void main()
{
    auto a = new Foo;
    a. // cursor is located here, DCD will give back all the auto complete options, "bar" in this case
}

It's also used for go-to-definition, where it reports the definition of a symbol as a file path and byte offset.

BTW, the Go bundle has the same problem with the Go tools, which also work with byte offsets: https://github.com/syscrusher/golang.tmbundle/blob/master/Commands/Complete.tmCommand#L38

sorbits commented 8 years ago

I may have misunderstood the workflow then.

So we need to give DCD a byte offset (corresponding to our insertion point)?

Does DCD read the current source file from stdin or from disk? I assume the former so that we do not need to save first, in case we have local changes (as we most likely would).


jacob-carlborg commented 8 years ago

So we need to give DCD a byte offset (corresponding to our insertion point)?

Yes. But there are two features here:

Code completion, requires byte offset input
Go to definition, requires byte offset input (the symbol to look for) and returns byte offset (where the symbol was found)

So if this is implemented in TextMate I would need both a new environment variable, similar to TM_LINE_INDEX and that the mate commands support cursor position as byte offset.

Does DCD read the current source file from stdin or from disk? I assume the former so that we do not need to save first, in case we have local changes (as we most likely would).

It can do both. But I'm using stdin for the reason you mentioned.
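
Until the editor exposes the offset directly, a plugin that only knows the line and column could derive the byte offset from the same buffer it pipes to stdin. A rough sketch in Python, with loud assumptions: the function name `line_column_to_offset` is hypothetical, the line is 1-based, the column is a 0-based count of bytes in the UTF-8 text, and lines end with LF.

```python
def line_column_to_offset(source: bytes, line: int, column: int) -> int:
    """Map a 1-based line and a 0-based byte column to a 0-based byte offset."""
    if line < 1:
        raise ValueError("line numbers start at 1")
    line_start = 0
    # Skip over (line - 1) LF bytes to find where the requested line begins.
    for _ in range(line - 1):
        newline = source.find(b"\n", line_start)
        if newline == -1:
            raise ValueError("line is past the end of the buffer")
        line_start = newline + 1
    line_end = source.find(b"\n", line_start)
    if line_end == -1:
        line_end = len(source)
    if not 0 <= column <= line_end - line_start:
        raise ValueError("column is past the end of the line")
    return line_start + column


source = b"void main()\n{\n\tfoo.\n}\n"
print(line_column_to_offset(source, 3, 5))  # 19: right after "foo."
```

If the editor reports the column in characters rather than bytes, the text of the line up to the caret would have to be encoded as UTF-8 and measured first.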

ghost commented 8 years ago

I'm also in favour of line+column in all the DCD _answers_. There's no encoding issue there. Even if an editor converts the file to decoded two-byte characters (à la Windows Unicode), it will always understand what a column is.

Tabs are not an issue either. I'm sure all editors have something like PhysicalCaret in addition to their Caret to distinguish the actual position from the position after interpretation (e.g. a common interpretation is 1 tab = 4 spaces).

One argument for this is that libdparse has line and column information for all the AST nodes, as advocated in #199.

sorbits commented 8 years ago

So if this is implemented in TextMate I would need both a new environment variable, similar to TM_LINE_INDEX and that the mate commands support cursor position as byte offset

OK, I did not realize that DCD wanted the byte offset as input. Since we pass data to the tool (via stdin), it is reasonable for DCD to ask for byte offset and we can add a variable for this (I’ll follow up on the ML).

I’ll second @BBasile's comment, though: I think line:column is the common format for answers, but I can also see the authors of DCD preferring byte offsets for consistency.

Hackerpilot commented 8 years ago

Giving answers in line:column has the same problems as completion requests: Editors often track column numbers in multiples of displayed character width. For example, Scintilla tracks multi-byte characters and tab widths when dealing with column counts: http://www.scintilla.org/ScintillaDoc.html#SCI_GETCOLUMN. The "column" number recorded by the lexer in DCD is actually a byte offset within the line and thus reports that a tab is the same width as a space.
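
To illustrate the mismatch, here is a sketch in Python. It is neither Scintilla nor DCD code: both function names are hypothetical, and the tab rule used (advance to the next multiple of the tab width) is an assumption about how an editor computes a display column. Wide and combining characters are ignored.

```python
def byte_column(line: str, char_index: int) -> int:
    """Bytes of UTF-8 text before the caret: what a byte-counting lexer sees."""
    return len(line[:char_index].encode("utf-8"))


def visual_column(line: str, char_index: int, tab_width: int = 8) -> int:
    """Display column before the caret, expanding each tab to the next tab stop."""
    column = 0
    for ch in line[:char_index]:
        if ch == "\t":
            column += tab_width - column % tab_width
        else:
            column += 1
    return column


line = "\tà."          # one tab, one two-byte character, one dot
caret = len(line)       # caret at the end of the line

print(byte_column(line, caret))                  # 4
print(visual_column(line, caret, tab_width=4))   # 6
print(visual_column(line, caret, tab_width=8))   # 10
```

The byte column stays the same whatever the tab width is, while the display column changes with the editor settings.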

jacob-carlborg commented 8 years ago

Does the width of the tab matter? A tab is one character, regardless of whether the editor renders it as 4 or 8 spaces. It's not like one can put the cursor in the middle of a tab character.

Hackerpilot commented 8 years ago

You're missing the point. For something like Textadept, SciTE, or Notepad++, a D plugin could ask Scintilla for the current line and column numbers when the cursor is at the end of a line that has a single tab character on it and get different answers depending on the tab width. Sublime seems to have similar behavior.

jacob-carlborg commented 8 years ago

Hmm, that seems like a weird behavior, or am I still missing the point 😃?

sorbits commented 8 years ago

For something like Textadept, SciTE, or Notepad++ a D plugin could ask Scintilla for the current line and column numbers

Note that we are (now) only talking about answers from DCD. So DCD would output line:column where column is in bytes (from start of line). This is consistent with existing command line tools like compilers, linters, search commands, etc.

The editors you mention should be able to accept a line:column from an external tool like gcc and jump to the correct position, as no external tool that I know of will expand tab characters or similar.

dlang-community / DCD

Cursor position using line and column number #342