Closed ncordon closed 4 years ago
Sorry @dennwc, I was gathering strength to change the whole method to runes. Even if it is not so urgent it can be a good exercise for me to become more fluent in Go. So I am going to try doing it :smiley_cat:
uast/transformer/semantic.go, line 199 at r1 (raw file):
@ncordon Sorry, it was my mistake. I see now why this specific change works. The only suggestion would be to change `DecodeRune` to `DecodeRuneInString` to avoid conversion to `[]byte`. Also, since you feel uncomfortable with changing the code to `[]rune`, please add a `TODO` here and open an issue in SDK. I think we still need to address the same issue for the prefix (if it contains other Unicode space-like characters). I left a few notes here to point places relevant to an issue with prefixes. Those are just for the context, no action is needed for this PR.
@kuba-- I will test it once I have it all changed to runes.
@ncordon - check CI
@ncordon - check CI
The CI is failing because of the API limit @kuba--
coverage.txt, line 2 at r4 (raw file):
Please delete this file. It's probably from a code coverage tool.
Done!
go.mod, line 24 at r4 (raw file):
Unrelated change. Is it because you used pprof for debugging, or it's from `go mod tidy` with no other changes?
I have not used this. Probably it has been go mod tidy
uast/transformer/semantic_test.go, line 44 at r4 (raw file):
`"/*\t\t\u0020 ½ */"`? Also, isn't `\u0020` a space?
Changed!. Yes, it is, I took it from the unicode.IsSpace
documentation
uast/transformer/semantic.go, line 149 at r2 (raw file):
Related to the prefix issue: here we accept any Unicode character of class "space" to be a part of the prefix, so we can match some multi-byte ones as well.
Addressed
uast/transformer/semantic.go, line 181 at r3 (raw file):
Yes, if you check `strings.Index`, it will return `-1` if the index was not found. Also please remove `;`.
Changed, but I am not comfortable with this change. It obscures code IMO
uast/transformer/semantic.go, line 181 at r3 (raw file):
Changed, but I am not comfortable with this change. It obscures code IMO
The -1 thing I mean, not the ;
uast/transformer/semantic.go, line 186 at r3 (raw file):
@ncordon you are right in the sense that we will end up converting back and forth between `string` and `[]rune`. Unfortunately there is no helper in stdlib that works on an array of runes. But, I also see that the result of the `firstIndexFunc` and `lastIndexFunc` function is used to split the string, not an array of runes. So @kuba-- has a point here: instead of returning a rune offset, you can calculate the byte offset in the string, and then split the string using it.
Please note that firstIndexFunc
is also used to compute the tab. And that is made in [] rune
not bytes
uast/transformer/semantic.go, line 188 at r4 (raw file):
Please remove parentheses.
Done!
uast/transformer/semantic.go, line 220 at r5 (raw file):
Please remove parentheses.
Done!
uast/transformer/semantic.go, line 197 at r3 (raw file):
Or index in bytes, as @kuba-- mentioned in the previous comment. @ncordon should address your concern, right?
Done!
uast/transformer/semantic.go, line 186 at r3 (raw file):
My intention was - `firstIndexFunc` as an independent thingy should follow some standard and return -1. BUT in our case it's used in one place and having `-1` as an output is not handy. So what I suggested in comment(s) bellow is to return a `[]rune` (instead of an index). In other words, having ``` func firstSliceFunc(runes []rune, f func(r rune) bool) []rune { var i int for r := range runes { if (f(r)) { return runes[:i+1] } i++ } return runes[:i] ``` So, in a use case instead of doing: ``` i := firstIndex s = s[:i] ``` you can just: ``` s = firstSliceFunc(s, f) ``` and similar approach for `last...` Do you know what I mean?
Changed now! Thanks, I was not understading correctly what you meant
uast/transformer/semantic.go, line 242 at r2 (raw file):
Related to the prefix issue: here we check the `tab` which may contain multi-byte characters, but we compare bytes instead of runes.
Addressed with the changes to []rune
uast/transformer/semantic_test.go, line 44 at r4 (raw file):
I mean, the `0x20` is an ordinary space. You probably wanted something else, like any other kind of Unicode space, correct?
Changed now
uast/transformer/semantic.go, line 177 at r5 (raw file):
Please remove parentheses.
Done!
go.mod, line 24 at r4 (raw file):
Can you please also check that you don't have any temporary files? `go mod tidy` processes all files in the directory, not only ones committed to Git.
Done! Sorry, I do not know what happened there. I was using make test-coverage
which generated the coverage.txt
, but once I deleted it and did go mod tidy
again these things disappeared
driver/manifest/discovery/discovery_test.go, line 41 at r8 (raw file):
Can you please rebase? We made a proper fix for this.
Done!
uast/transformer/semantic.go, line 147 at r8 (raw file):
In Go, comments must begin with a function name: `isTabToken check if a token is a tab. A tab is defined ...`. Same for other comments.
Done!
uast/transformer/semantic.go, line 189 at r8 (raw file):
For sure.. - I suggest to `for i := len(runes) - 1; i > 0; i--`
Done!
uast/transformer/semantic.go, line 247 at r8 (raw file):
```golang var tab []rune ```
Done!
uast/transformer/semantic.go, line 251 at r8 (raw file):
We shouldn't trim tab from the first line - it's considered a `Prefix`.
True, fixed
uast/transformer/semantic.go, line 198 at r8 (raw file):
Can we reuse it? Both `findPrefix` and `findSuffix` convert the whole string this way. Either consider using `utf8.DecodeRuneInString`/`utf8.DecodeLastRuneInString` or somehow share the slice that is converted to runes.
I have changed it so that findPrefix
and findSuffix
reuse it and also the main body of the function, but certainly I would have preferred to keep the former version because of readability of the main algorithm of Split
. We can do what you consider best, but my opinion is that what you save from those 3 conversions is not worth obscuring the main body of the Split
method, and the algorithm is still linear in the length of the string
driver/manifest/discovery/discovery_test.go, line 41 at r8 (raw file):
Hmm, I still see this line in the PR, and I don't see the fix from the master here.
I rebased now, sorry about that, I forgot to do it
uast/transformer/semantic.go, line 235 at r10 (raw file):
Please remove parentheses
Done, I keep forgetting this because of C++ habits
uast/transformer/semantic.go, line 266 at r10 (raw file):
Please remove parentheses
Done
uast/transformer/semantic.go, line 289 at r10 (raw file):
This is very slow and creates a lot of garbage! Please revert to `strings.Join`.
I have used a strings.Builder
instead
uast/transformer/semantic.go, line 198 at r8 (raw file):
In that case you should use `DecodeRune*`, as I suggested ;)
I thought about that Denys, but first it adds a lot of extra work and second there is no way to get this more efficiently (converting and only one time to runes, without conversions back and forth between runes and strings). Using DecodeRune*
was an option but I have to convert to []rune
anyway to strip the tabs as was requested, so I prefer to do all the code with []rune
and not a part with string
s and another with []rune
uast/transformer/semantic.go, line 198 at r8 (raw file):
But you can still trim tabs with `DecodeRune`. Am I missing something? This function in general takes a string and splits it into N parts (prefix, tab, lines, suffix). So it's definitely possible to use an original `string` value and slice it. This means that it's possible to implement it without any allocations by using `DecodeRune`. This won't require you to pass the `[]rune` slice around. So do you want to try the `DecodeRune` approach, or should I review the current one?
Review the current one :pray: . This has been opened for a month and it is not such a big deal of fix IMHO
uast/transformer/semantic.go, line 277 at r11 (raw file):
The last bit that may affect performance: please move this line to the `if len(sub) == 0 {` below, so we won't create a large string that will become garbage immediately.
I have changed it there
uast/transformer/semantic.go, line 278 at r12 (raw file):
var tab []rune // fast path, no tabs if len(sub) == 0 {
@dennwc I have also add it here since if we return here it means there is no common tab and we need to set the text and the tab from the comment
The suffixes of the comments in semantic tree were not split correctly, due to slices of strings working with bytes in Go, and not with unicode characters. For example, without this fix, in:
the final
”
was translated intob'\xe2\x80\x9d'
, but the suffix was tell to cut the slice of bytes in the positioni
, wherei
is the\xe2
, not the final of the unicode character (which would bex9d
). And that's because a unicode character can span more than one byte.Closes #433
Signed-off-by: ncordon nacho.cordon.castillo@gmail.com
This change is