Use Lazy ByteStrings instead of Strict ByteString

RubenAstudillo commented 7 years ago

Hello, thank you for your work.

I recently ran hasktags -e ~/ghc and it took around 3 minutes. Profiling it and looking at the cost centers produced the following graph: ghc-old (note the time to completion). Further profiling showed that the findThings function retained a lot of data; since it is a short function, I suspected readFile was the culprit. I switched to the lazy ByteString implementation as follows:

diff --git a/src/Hasktags.hs b/src/Hasktags.hs
index 001feec..804cda1 100644
--- a/src/Hasktags.hs
+++ b/src/Hasktags.hs
@@ -25,9 +25,9 @@ import Tags
       mywords,
       writectagsfile,
       writeetagsfile )
-import qualified Data.ByteString.Char8 as BS
+import qualified Data.ByteString.Lazy.Char8 as BS
     ( ByteString, unpack, readFile )
-import qualified Data.ByteString.UTF8 as BS8 ( fromString )
+import qualified Data.ByteString.Lazy.UTF8 as BS8 ( fromString )
 import Data.Char ( isSpace )
 import Data.List ( tails, nubBy, isSuffixOf, isPrefixOf )
 import Data.Maybe ( maybeToList )
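To illustrate why the switch can help, here is a minimal sketch (not hasktags' actual code) of reading a source file with Data.ByteString.Lazy.Char8 and scanning it line by line. With a strict ByteString, readFile loads the whole file into one buffer that stays alive while it is scanned; the lazy readFile instead returns the contents as chunks read on demand. The helpers topLevelNames and tagsInFile are hypothetical, and the "first word of every non-indented line" rule is a deliberately crude stand-in for real tag detection.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Char (isSpace)

-- Hypothetical, much-simplified stand-in for a tag finder (NOT hasktags'
-- real findThings): report the first word of every non-indented line,
-- skipping blank lines, line comments and a few keywords.
topLevelNames :: BL.ByteString -> [BL.ByteString]
topLevelNames = concatMap candidate . BL.lines
  where
    keywords = map BL.pack ["module", "import"]
    candidate line
      | BL.null line || isSpace (BL.head line) = []  -- blank or indented
      | otherwise = case BL.words line of
          (w : _)
            | BL.pack "--" `BL.isPrefixOf` w -> []    -- line comment
            | w `elem` keywords -> []                 -- not a definition
            | otherwise -> [w]
          [] -> []

-- BL.readFile yields the file lazily, chunk by chunk, and BL.lines/BL.words
-- consume it incrementally. Unpacking the results to String means the tags
-- do not keep slices pointing into the chunk buffers, so chunks that have
-- been processed can be freed instead of the whole file staying resident.
tagsInFile :: FilePath -> IO [String]
tagsInFile path = map BL.unpack . topLevelNames <$> BL.readFile path
```

In GHCi, tagsInFile "src/Hasktags.hs" returns the candidate names for that file.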

This produced the following graph: ghc-new (note the running time). Maximum allocation dropped by a factor of almost 4.

Since this program isn't long-running, the usual pitfalls of lazy IO don't really apply. A streaming framework would also solve the problem, but at the cost of extra dependencies, so lazy ByteStrings seem like a good tradeoff. What do you think?
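One lazy IO pitfall that can still matter for a short-lived batch tool: a lazily read file's handle is only closed once its contents have been consumed to EOF, so opening many files (as when tagging a whole tree like ~/ghc) before consuming any of them can exhaust file descriptors. A minimal sketch, assuming hypothetical summarise and processAll helpers (line counting stands in for tag extraction), forces each file's result with Control.Exception.evaluate before opening the next one:

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Control.Exception (evaluate)

-- Hypothetical per-file work: count lines (a stand-in for tag extraction).
summarise :: BL.ByteString -> Int
summarise = length . BL.lines

-- Handle the files strictly one after another. Forcing the Int with
-- `evaluate` makes BL.lines walk the whole lazy ByteString, which reads the
-- file to EOF; the lazy readFile closes the handle at that point, so only
-- one file is open at a time.
processAll :: [FilePath] -> IO [(FilePath, Int)]
processAll = mapM $ \path -> do
  contents <- BL.readFile path
  n <- evaluate (summarise contents)
  pure (path, n)
```

By contrast, something like mapM BL.readFile files followed by processing opens every file up front, because each readFile opens its handle immediately and only closes it when its contents are fully consumed.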

jhenahan commented 7 years ago

Thanks for the profiling and the suggestions. I've got some refactoring and more PR review planned for this month, and I'll keep this in mind!

jhenahan commented 6 years ago

Implemented in a0b21c5.

MarcWeber / hasktags #33