denibertovic / docker-hs

A Haskell library for the Docker Engine API
BSD 3-Clause "New" or "Revised" License
77 stars 48 forks source link

Added log processing #75

Open ilyakooo0 opened 4 years ago

ilyakooo0 commented 4 years ago

closes #74

denibertovic commented 4 years ago

Hey @ilyakooo0 Thanks for the PR. :cake: I've only taken a cursory look as I'm short on time but I'll try and review it properly tomorrow. It seems the CI is erroring for unrelated reasons.

ilyakooo0 commented 4 years ago

Actually, some of the builds errored because I used the new ConduitT, which is not available in the older versions.

UPD: I now fixed it and builds are failing only for unrelated reasons

mbg commented 4 years ago

Hi everyone, I ran into the same problem yesterday, but found that the format is documented in https://docs.docker.com/engine/api/v1.40/#operation/ContainerAttach

Here's some (simplified) code to show how I deal with it at the moment (which seems to work fine):

import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

testSink :: ConduitT ByteString Void IO ()
testSink = do
    -- wait for data to become available
    mbs <- await

    -- the data might be `Nothing` if the conduit has been closed,
    -- so lets decide what to do...
    case mbs of
        Nothing -> pure ()
        -- otherwise, if there is data, send it to the output channel with
        -- appropriate metadata added to it
        Just bs  -> do
            -- the docker daemon attaches 8 byte headers to each line of
            -- output, see:
            -- https://docs.docker.com/engine/api/v1.40/#operation/ContainerAttach
            -- the first byte identifies the stream (1 - stdout, 2 - stderr)
            let streamType = BS.unpack (BS.take 1 bs)
            let message = BS.drop 8 bs

            -- do something with `message` according to `streamType`

            -- resume waiting for input
            testSink

-- put this somewhere else in your program to use it
getContainerLogsStream logOpts containerID $ CB.lines .| testSink

It is worth noting that it may not be sensible to deal with this directly in the implementation of getContainerLogsStream since the 8 byte headers are only present if tty: false for the container. If a user sets tty: true in the CreateOpts then the 8 byte header will not be present.

denibertovic commented 4 years ago

Hey @mbg thanks for the clarification and the example. It was always the intention of the library to leave the stream processing outside of the library ie. let the user define the sink that gets passed to the getContainerLogsSream function. So I'm not sure if it's wise to include this entire machinery in the library itself.

That said, there is too much abstration leaking outside of the library and the user has to know docker api internals to construct a proper sink....(ie. the message structure and what the first bytes mean and so on). So I think we need to address this somehow.

denibertovic commented 2 years ago

I'd love to get this one merged but I haven't had time to look into it in more detail. I still think we shouldn't handle the stream necessarily in the library but there's also too much abstraction leaking if we let the users implement it themselves.

Would an example stream that just prints stuff to stdout and a pointer to it in the docs be the best course of action? @ilyakooo0 @mbg what do you think?

ilyakooo0 commented 2 years ago

IMO the main problem is that the raw stream produces invalid characters (as some of the data is not text at all). So, the stream is mostly useless unless you do the exact processing that is done in this PR even if you want to get the binary output of the executable.

I can't imagine a reason why someone would use this library and want to get the raw unprocessed stream.

mbg commented 2 years ago

I agree with @ilyakooo0. I think that there should be an implementation of this in the library, at the very least as an opt-in (i.e. even if it was just something along the lines of the testSink from my comment that got exported). The two questions for me are:

Neither of these are particularly big issues, they are just design decisions that need to be made.

mbg commented 2 years ago

@denibertovic have you had any more thoughts on this / have you made a decision?

denibertovic commented 2 years ago

@mbg I agree with your previous comment. This should be implemented as a separate function and not directly in getContainerLogsStream so that it can handle the tty: true case. We export that function and provide docs/example on how to use it with getContainerLogStream. I haven't thought about your second point but introducing a new type for that seems fine to me.

I don't have time to work on this but I'd be happy to merge if the above changes were made.