nh2 closed this 1 year ago
I'll trust you on that; I don't have the tooling to check it.
@Twinside Here's an easy way to measure, e.g. in GHCi:
```haskell
import qualified Data.ByteString.Lazy as L
import Data.Binary
import Data.Binary.Get
import Control.DeepSeq (deepseq)  -- needed for the `deepseq` below
import qualified Codec.Picture.Jpg.Internal.Types as JPG
:set -XTypeApplications
:set +s
L.readFile "large110MB.jpg" >>= \bs -> return $ case runGetOrFail (get @JPG.JpgImage) bs of { Left (_rest, offset, err) -> Left ("ERROR", offset, err) ; Right (_rest, offset, jpgImage) -> Right (offset, jpgImage `deepseq` ()) }
```
The previous implementation prints (by means of `:set +s`, which enables timing):

```
(806.22 secs, 121,282,712 bytes)
```
Doing the same with the new implementation (`parseECS`) from this PR prints:

```
(0.40 secs, 122,191,224 bytes)
```

So for this case, it is 2000x faster.
For `parseECS_simple`, I get:

```
(0.88 secs, 4,729,102,080 bytes)
```

This is still quite fast, but 2.5x slower than `parseECS`, and it does 20x more allocation.
For the claim

> only ~20% slower than a non-lazy ByteString based loop

I simply made a copy of the existing `extractScanContent` and switched the types from `.Lazy` to normal `ByteString`, like this:
```haskell
extractScanContentStrict :: L.ByteString -> (L.ByteString, L.ByteString)
extractScanContentStrict str_lazy = aux 0
  where
    !str = L.toStrict str_lazy
    !maxi = fromIntegral $ B.length str - 1
    aux !n
      | n >= maxi = (L.fromStrict str, L.empty)
      | v == 0xFF && vNext /= 0 && not isReset =
          let (a, b) = B.splitAt n str in (L.fromStrict a, L.fromStrict b)
      | otherwise = aux (n + 1)
      where
        v = {- (if n `mod` 1000000 == 0 then trace ("  n = " ++ show n) else id) -} str `B.index` n
        vNext = str `B.index` (n + 1)
        isReset = 0xD0 <= vNext && vNext <= 0xD7
```
For that I obtained:

```
(0.35 secs, 235,527,808 bytes)
```
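As a quick sanity check of the loop above (my own addition, not from the PR; the definition is repeated so the snippet is self-contained): it splits at the first `0xFF` that is followed by a byte which is neither `0x00` (byte stuffing) nor an RST marker:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L

-- Identical to the loop above, repeated to make this snippet self-contained.
extractScanContentStrict :: L.ByteString -> (L.ByteString, L.ByteString)
extractScanContentStrict str_lazy = aux 0
  where
    !str = L.toStrict str_lazy
    !maxi = fromIntegral (B.length str - 1)
    aux !n
      | n >= maxi = (L.fromStrict str, L.empty)
      | v == 0xFF && vNext /= 0 && not isReset =
          let (a, b) = B.splitAt n str in (L.fromStrict a, L.fromStrict b)
      | otherwise = aux (n + 1)
      where
        v = str `B.index` n
        vNext = str `B.index` (n + 1)
        isReset = 0xD0 <= vNext && vNext <= 0xD7

main :: IO ()
main = do
  -- 0xFF 0x00 is a stuffed data byte (the scan continues);
  -- 0xFF 0xD9 (EOI, not an RST marker) ends it.
  let (ecs, rest) = extractScanContentStrict (L.pack [1, 2, 0xFF, 0x00, 3, 0xFF, 0xD9])
  print (L.unpack ecs, L.unpack rest)  -- ([1,2,255,0,3],[255,217])
```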
In the commit message I made the claim that `getRemainingLazyByteString` does not work well with `binary`'s incremental input interface. That can be checked with these commands, using the `binary-conduit` package as an example:
```haskell
import Conduit
import Data.Conduit.Serialization.Binary  -- from `binary-conduit`

-- Only reads a small part of the file:
runConduitRes $ sourceFile "bigfile.bin" .| sinkGet ((\a rest -> a) <$> getWord8 <*> getWord8)

-- This reads the entire file via the conduit (which is not lazy IO):
runConduitRes $ sourceFile "bigfile.bin" .| sinkGet ((\a rest -> a) <$> getWord8 <*> getRemainingLazyByteString)
```
@Twinside There are other usages of `L.index` and `Lb.index` in JuicyPixels that might also be quadratic and that I didn't fix. For example:

```
$ git grep '\.index' | grep -v '\bB\.index\b'
src/Codec/Picture/HDR.hs:    where at n = L.index str . fromIntegral $ idx + n
src/Codec/Picture/HDR.hs:    | otherwise = pure $ L.index inputData (fromIntegral idx)
src/Codec/Picture/Png.hs:    PixelRGBA8 r g b $ Lb.index transpBuffer (fromIntegral ix)
```
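To illustrate why repeated `L.index` calls in a loop are quadratic (a small sketch of my own, not code from this PR or from JuicyPixels): `L.index` must walk the chunk list from the start on every call, so indexing every position of a lazy ByteString does O(n · chunks) work, while a single fold over it is linear:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L

-- A lazy ByteString made of many small chunks, mimicking a file read in
-- pieces (tiny chunks here to exaggerate the effect; the default is 32 KB).
manyChunks :: Int -> L.ByteString
manyChunks n = L.fromChunks (replicate n (B.replicate 64 1))

-- Accidentally quadratic: every L.index call re-walks the chunk list.
sumViaIndex :: L.ByteString -> Int
sumViaIndex bs = go 0 0
  where
    !len = L.length bs
    go !acc !i
      | i >= len  = acc
      | otherwise = go (acc + fromIntegral (L.index bs i)) (i + 1)

-- Linear: a single strict traversal over the chunks.
sumViaFold :: L.ByteString -> Int
sumViaFold = L.foldl' (\acc w -> acc + fromIntegral w) 0

main :: IO ()
main = do
  let bs = manyChunks 1000
  -- Same result, but the left side does O(n * chunks) work.
  print (sumViaIndex bs == sumViaFold bs)  -- True
```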
@Twinside It would be nice if you could tell me whether the semi-lazy behaviour of JPEG parsing was accidental or intentional. If it was accidental, we could consider it a bug and perhaps switch the JPEG parser's implementation from the quirky semi-lazy one to the strict one (which so far I haven't done).
This PR seems to introduce a bug. This code:

```haskell
-- Load the image
dynamicImage <- decodeImage contents
pure $ imageToJpg 100 dynamicImage
```

turns this image: *(original image)* into this image: *(corrupted output image)*
I will investigate.
This should fix it: PR #216
Merged! I'll push an update to Hackage "soon©".
Makes JPEG parsing 1000x faster for large pictures (I tried with a 110 MB one).

It also fixes incorrect usage of the `binary` package's `Get` parser API, which caused the `Get` based functions to not correctly maintain the parser offset (how many bytes were consumed).

Please see the individual commit messages, especially the one of the commit titled "Jpg: Fix quadratic JPEG parsing", for full details. Other preparatory refactoring commits are also included.
Copying the main commit's initial message here for easy reading:

> `Data.ByteString.Lazy`'s `index` is O(chunks), not O(1). The default chunk size is 32 KB. Thus, calling that `L.index` in a `+1` loop, as `extractScanContent` did, caused accidentally quadratic runtime. This can be observed by adding a `trace` of `n` into the loop and watching the printouts get slower over time.
>
> Further, the use of `getRemainingLazyBytes` from `binary`'s `Data.Binary.Get` module resulted in weird semi-lazy behaviour that was both incorrect (not setting the parser offset correctly, generating misleading error offsets and messages) and slow (encouraging this loop over lazy ByteStrings instead of just using the `Get` parser as intended). I documented this behaviour on the `parseFrames` function in its haddocks, and subsequently renamed it to `parseFramesSemiLazy`.
>
> This commit replaces `extractScanContent` by a normal `Get` based parser, in two variants: `parseECS` and `parseECS_simple`. (ECS is the proper name for the "scan content" bytes according to the spec.) The simple one is the straightforward translation, the other one a higher-performance implementation that is only ~20% slower than a non-lazy ByteString based loop. Both variants are faster than the original implementation because they are linear, not accidentally quadratic.
>
> `parseFrames` is replaced by a strict implementation that uses the new, correct `parseECS`. Compared to the previous `parseFrames` (and current `parseFramesSemiLazy`) it also fixes the remaining issues I found. `parseFrames` was previously unexported from this `.Internal` module, so this rename has no backwards incompatibility implications.
>
> `instance Binary JpgImage` continues to use `parseFramesSemiLazy`, for which I've taken care to preserve its existing quirky laziness semantics. I kept it this way for now because, due to a lack of comments, it is unclear to me whether this quirky lazy behaviour was intended or a complete accident.

Beyond that:

* It should now be easier to write alternative implementations of `parseFrames`, for example one that searches for the first scan header to determine the image dimensions without parsing the whole JPG.
* With the parser offset maintained correctly, the parsers can be used with `binary-conduit` or other uses of `binary`'s incremental parser input interface.
* A test comparing `parseECS` with `parseECS_simple` is added.
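For readers unfamiliar with `binary`, here is a minimal sketch of what scanning ECS bytes with a plain `Get` parser can look like, in the byte-by-byte style of `parseECS_simple`. This is my own illustration, not the PR's actual `parseECS`; `parseECSSketch` is a hypothetical name, and the sketch assumes the scan always ends with a marker (e.g. EOI), as valid JPEGs do:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Binary.Get (Get, getWord8, lookAhead, runGetOrFail)
import qualified Data.ByteString.Lazy as L

-- Sketch (not the PR's parseECS): consume ECS bytes until a 0xFF is
-- followed by a byte that is neither 0x00 (byte stuffing) nor an RST
-- marker (0xD0-0xD7). The terminating marker itself stays unconsumed,
-- so the parser offset ends exactly at the start of the marker.
parseECSSketch :: Get L.ByteString
parseECSSketch = L.pack . reverse <$> go []
  where
    go !acc = do
      pair <- lookAhead ((,) <$> getWord8 <*> getWord8)
      case pair of
        (0xFF, next)
          | next /= 0x00 && not (0xD0 <= next && next <= 0xD7) ->
              pure acc              -- a real marker begins here; stop
        (b, _) -> do
          _ <- getWord8             -- consume one byte and continue
          go (b : acc)

main :: IO ()
main =
  case runGetOrFail parseECSSketch (L.pack [1, 2, 0xFF, 0x00, 3, 0xFF, 0xD9]) of
    Left (_rest, off, err)  -> error (show (off, err))
    Right (rest, off, ecs) -> print (L.unpack ecs, off, L.unpack rest)
    -- prints ([1,2,255,0,3],5,[255,217]): offset 5 points at the EOI marker
```

Because the offset is maintained by `Get` itself, such a parser composes correctly with `runGetOrFail` error offsets and with incremental input, which is the point of replacing the `getRemainingLazyByteString`-based loop.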