Open erinpentecost opened 5 years ago
The golang.org/x/net/html package already has an option mechanism, so an option could be added to track the position. @bcmills @bradfitz How about this as a design:
- A ParseOptionPosition() ParseOption function is added.
- Position is added, with three fields: Line, Column, Offset int (Line and Column are numbered from 1 in the usual convention; Offset is a 0-based byte offset into the original input).
- A Position Position field is added to both Token and Node.
I also have another option I'd like to add, which would disable the lower-casing of element and attribute names. This could be added with a ParseOptionPreserveCase() function; no other type changes would be needed.
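As a rough illustration of the Position type described above, here is a minimal sketch of how Line, Column, and Offset could be maintained one byte at a time. The advance and positionAt helpers are hypothetical names and are not part of golang.org/x/net/html. Columns count bytes rather than runes, which is an assumption the proposal does not settle.

```go
package main

import "fmt"

// Position is a hypothetical source location following the proposal:
// Line and Column are 1-based, Offset is a 0-based byte offset.
type Position struct {
	Line, Column, Offset int
}

// String formats the position as "line:column".
func (p Position) String() string {
	return fmt.Sprintf("%d:%d", p.Line, p.Column)
}

// advance returns the position that follows byte c.
// Assumption of this sketch: columns count bytes, not runes.
func (p Position) advance(c byte) Position {
	p.Offset++
	if c == '\n' {
		p.Line++
		p.Column = 1
	} else {
		p.Column++
	}
	return p
}

// positionAt returns the position of the byte at the given offset in src.
func positionAt(src string, offset int) Position {
	p := Position{Line: 1, Column: 1}
	for i := 0; i < offset && i < len(src); i++ {
		p = p.advance(src[i])
	}
	return p
}

func main() {
	src := "<p>\n  <b>hi</b>\n</p>"
	fmt.Println(positionAt(src, 6)) // position of "<b>"; prints 2:3
}
```

Storing the struct by value keeps lookups allocation-free; the pointer variant mentioned in the thread would trade that for a smaller Token and Node when positions are not tracked.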
These changes seem fairly low impact. They would not break existing users of the package, and if unused they would add only a few bytes to the Token and Node structs. (If space is a concern, the field could be defined as Position *Position so that it takes less memory when it's not used.) We'll need to measure, but the performance impact should be fairly minimal when these features are enabled, and there should be no difference from the existing code when they are not.
I will probably end up putting together a prototype of this as I need this case preservation option (and the line numbers would definitely be nice) for some functionality being added to github.com/vugu/vugu - any feedback on the approach would be greatly appreciated, so we improve the possibility of getting the changes merged back in at some point.
I wanted to get the line and column for the current Token, which forced me to fork the package.
I'm not sure if a fork is necessary. The Tokenizer takes an io.Reader, and that io.Reader can build the mapping from byte offset to line:col numbers. If I understand https://github.com/golang/go/issues/34302 correctly, its LineCounter does exactly that.
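To make the io.Reader suggestion concrete, here is a minimal sketch of a wrapper that records where each line starts as bytes flow through it, then maps a byte offset back to a line and column. The lineIndex type and its methods are hypothetical names for this illustration; this is not the LineCounter from the linked issue.

```go
package main

import (
	"fmt"
	"io"
	"sort"
	"strings"
)

// lineIndex wraps an io.Reader and records the byte offset at which
// each line starts. Hypothetical helper, not part of any library.
type lineIndex struct {
	r      io.Reader
	n      int   // bytes read so far
	starts []int // starts[i] is the offset of the first byte of line i+1
}

func newLineIndex(r io.Reader) *lineIndex {
	return &lineIndex{r: r, starts: []int{0}}
}

// Read implements io.Reader, noting the offset that follows every '\n'.
func (l *lineIndex) Read(p []byte) (int, error) {
	n, err := l.r.Read(p)
	for i, c := range p[:n] {
		if c == '\n' {
			l.starts = append(l.starts, l.n+i+1)
		}
	}
	l.n += n
	return n, err
}

// LineCol maps a 0-based byte offset to a 1-based line and column.
// Negative offsets are clamped to 0; offsets not yet read are
// attributed to the last line seen so far.
func (l *lineIndex) LineCol(offset int) (line, col int) {
	if offset < 0 {
		offset = 0
	}
	// The first line starting after offset has index i, so line i holds offset.
	i := sort.Search(len(l.starts), func(i int) bool { return l.starts[i] > offset })
	return i, offset - l.starts[i-1] + 1
}

// indexString drains s through a lineIndex, standing in for a
// tokenizer that would consume the reader.
func indexString(s string) *lineIndex {
	li := newLineIndex(strings.NewReader(s))
	io.Copy(io.Discard, li)
	return li
}

func main() {
	li := indexString("<p>\n  <b>hi</b>\n</p>")
	line, col := li.LineCol(6)
	fmt.Printf("%d:%d\n", line, col) // prints 2:3
}
```

With the real Tokenizer, one possible way to obtain the byte offset of each token is to keep a running sum of len(z.Raw()) across calls to Next; whether that sum stays accurate for every input is something to verify before relying on it.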
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
What did you do?
I wanted to get the line and column for the current Token, which forced me to fork the package. The key change is adding something like the following to the end of readByte() in token.go:
What did you expect to see?
I'd like to see a public method on Tokenizer that returns the starting and ending line/column of the current Token. The method could return a new struct with these four values.
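A struct carrying those four values could look roughly like the sketch below. Span and spanOf are hypothetical names, and the choice to make the end position inclusive (the location of the token's last byte) is an assumption; the request does not say whether the end should be inclusive or exclusive.

```go
package main

import "fmt"

// Span is a hypothetical result type: the 1-based start and end
// line/column of a token.
type Span struct {
	StartLine, StartCol int
	EndLine, EndCol     int
}

// spanOf computes the span of a token whose raw text is raw and whose
// first byte sits at (line, col). The end is the position of the
// token's last byte (inclusive), counted in bytes.
func spanOf(line, col int, raw string) Span {
	s := Span{StartLine: line, StartCol: col, EndLine: line, EndCol: col}
	// Step over every byte except the last one.
	for i := 0; i+1 < len(raw); i++ {
		if raw[i] == '\n' {
			s.EndLine++
			s.EndCol = 1
		} else {
			s.EndCol++
		}
	}
	return s
}

func main() {
	fmt.Printf("%+v\n", spanOf(2, 3, "<b>"))
}
```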
What did you see instead?
There isn't a way to figure out where the token is in the input aside from byte offset. I could feed that byte offset into user code to determine where the line/column is, but then I'd have to parse the input twice and build up that lookup table first.