MarcWeber / hasktags

Produces ctags "tags" and etags "TAGS" files for Haskell programs
Other
131 stars 33 forks source link

Replace nubBy by (map head . groupBy); strip all inline comments #45

Closed lyokha closed 6 years ago

lyokha commented 6 years ago
  1. (map head . groupBy) has linear complexity whereas nubBy is quadratic. Though nubBy is more generic, the linear algorithm should work here as well because in Haskell all related declarations must be adjacent.
  2. All inline comments starting with "--" get now removed when collecting tokens. This lets hasktag avoid collecting wrong declarations in some cases, and also makes it possible to simplify function concatTokens.

Minor improvement after hlint

lyokha commented 6 years ago

In principle, the map head . groupBy approach can be broken by changing findstuff traversal order. However this must work fine with the current implementation of findstuff, and even better than nubBy, for example imagine several blocks of code with functions of the same names separated by conditional GCC macros: nubBy will catch only one of them whereas map head . groupBy should catch them all. Also, the new approach should mitigate an issue when functions with the same name under different instances declarations get lost due to default ignore-close-implementations.

jhenahan commented 6 years ago

I fully agree that nubBy should be replaced with map head . groupBy. They're nearly always functionally equivalent, and the latter is almost always faster. I seem to remember a comment by someone or other about this a few months ago.

I'll merge this so I can play with it, but I'll wait to cut a new release until I've had some time to dissect findstuff, among other things. I'd like the next release to be significantly cleaned up, modularized, and modernized so contributing will be easier.

Once again, I appreciate all your work! Thanks!

lyokha commented 6 years ago

Btw, just found that this nubBy to groupBy transition has a (not critical) downside. I was testing code from JuicyPixels and with the following:

radiance32bitRleRGBEFormat, radiance32bitRleXYZEFromat :: B.ByteString
radiance32bitRleRGBEFormat = BC.pack "32-bit_rle_rgbe"
radiance32bitRleXYZEFromat = BC.pack "32-bit_rle_xyze"

the both a function signature and a function implementation of the second function get tagged, even though the implementation should be skipped because it's only 2 lines apart. This happens because declaration of a function signature after comma breaks normal declarations flow. I am pretty sure that it won't happen with nubBy. To my taste, this issue is still tolerable to not switch the implementation back.

jhenahan commented 6 years ago

Agreed. I released this PR along with rolling back a regression I introduced, so hopefully no one minds the occasional comma issue.