Gabriella439 / Haskell-Pipes-Safe-Library

Safety for the pipes ecosystem
BSD 3-Clause "New" or "Revised" License

onException seems to use plenty of memory #46

Open picca opened 3 years ago

picca commented 3 years ago

Hello, I am writing a program which reads its data from an HDF5 file. For each image in this file it computes something and adds it to an IORef, which is in fact a C object managed via the FFI.

The interesting part of the code is here:

processHkl :: FramesHklP a => InputHkl a b -> IO ()
processHkl input@(InputHkl det _ h5d o res cen d r config' mask') = do
  pixels <- getPixelsCoordinates det cen d r

  jobs <- mkJobsHkl input
  r'<- mapConcurrently (\job -> withCubeAccumulator $ \s -> do
               runSafeT $ runEffect $
                 each job
                 >-> tee Pipes.Prelude.print
                 >-> framesHklP h5d det
                 >-> Pipes.Prelude.mapM (liftIO . spaceHkl config' det pixels res mask')
                 >-> mkCube'P det s
           ) jobs
  Prelude.print r'
  return ()
  -- saveCube o r'

Pipes Safe is used to manage the resources (opening and closing files). In my case I open the same file 10 times (as a test). My program uses a lot of memory, and it seems that most of this memory is in the onException function.

Here is part of the profiling information. All the hkl_xxx functions are FFI functions (time-consuming).

        Mon Oct 19 17:23 2020 Time and Allocation Profiling Report  (Final)

           binoculars-ng +RTS -N -s -pa -RTS process data/test/config_sixs_local.ini

        total time  =      289.35 secs   (730656 ticks @ 1000 us, 24 processors)
        total alloc = 3,469,792,304 bytes  (excludes profiling overheads)

COST CENTRE                                MODULE                                               SRC                                                                     %time %alloc  ticks     bytes

IDLE                                       IDLE                                                 <built-in>                                                               41.4    0.0  302154         0
hkl_binoculars_space_hkl                   Hkl.Binoculars.Projections                           src/Hkl/Binoculars/Projections.hs:188:53-146                             23.5    0.1  171493   2085248
hkl_binoculars_cube_new_merge'             Hkl.C.Binoculars                                     src/Hkl/C/Binoculars.hsc:85:61-96                                        13.8    0.0  100607         0
SYSTEM                                     SYSTEM                                               <built-in>                                                               10.4    0.0  76271   1472776
hkl_binoculars_cube_new'                   Hkl.Binoculars.Common                                src/Hkl/Binoculars/Common.hs:136:53-119                                   4.8    0.0  34870    794408
GC                                         GC                                                   <built-in>                                                                3.7    0.0  27336      1112
onException                                Pipes.Safe                                           src/Pipes/Safe.hs:(400,1)-(404,12)                                        0.9   52.9   6565 1834549720
h5e_try                                    Bindings.HDF5.Raw.H5E                                src/Bindings/HDF5/Raw/H5E.hsc:(387,1)-(391,17)                            0.7    9.3   5042 322654064
control                                    Control.Monad.Trans.Control                          Control/Monad/Trans/Control.hs:751:1-39                                   0.4   10.2   2569 352506304
for                                        Pipes                                                src/Pipes.hs:176:1-11                                                     0.1    3.4    911 118529496

So I would like your help in order to investigate this issue.

This program should use a constant amount of memory, independent of the number of images read. But this is not the case: the more images, the larger the onException entry becomes.

Thanks for considering.

Gabriella439 commented 3 years ago

@picca: I'd need to know more details about how onException is used. Do you have a link to a sample repository or a minimal code example?

However, note that a large number of allocations does not necessarily mean having a large heap footprint. The only way to know that for sure is to collect a heap profile by following the instructions here:

https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html#profiling-memory-usage
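To illustrate the difference between allocation and heap footprint, here is a minimal sketch (not taken from the program in this issue). It assumes the executable is built with -rtsopts and run with +RTS -T so that GHC.Stats is enabled; sumTo is a hypothetical helper. A strict fold over a lazily produced list typically allocates many bytes in total while keeping very little data live at any one time.

```haskell
import Data.List (foldl')
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Hypothetical helper: strict left fold over a lazily generated list.
-- Each list cell can become garbage as soon as it has been consumed.
sumTo :: Int -> Int
sumTo n = foldl' (+) 0 [1 .. n]

main :: IO ()
main = do
    print (sumTo 5000000)
    enabled <- getRTSStatsEnabled
    if enabled
        then do
            stats <- getRTSStats
            -- Total bytes allocated over the whole run (like "total alloc").
            putStrLn ("allocated_bytes = " ++ show (allocated_bytes stats))
            -- Largest live heap observed at a major GC (the footprint).
            putStrLn ("max_live_bytes  = " ++ show (max_live_bytes stats))
        else putStrLn "re-run with +RTS -T to collect RTS statistics"
```

The exact numbers depend on the GHC version and optimisation flags (with -O the list may be fused away entirely), so a heap profile remains the reliable way to check residency.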

picca commented 3 years ago

Hello,

Here is the repository of the code: https://repo.or.cz/hkl.git/tree/6c35373943fb58c92e28f60a53b819cca900304f:/contrib/haskell

The function I am speaking about is located in this file:

https://repo.or.cz/hkl.git/blob/6c35373943fb58c92e28f60a53b819cca900304f:/contrib/haskell/src/Hkl/Binoculars/Projections.hs

This also requires an HDF5 binding which is not distributed on Hackage, located here:

https://github.com/picca/hs-hdf5

There you can find the h5e_try function, which might interact with onException.

If you need more information, I can provide it.

Cheers

Gabriella439 commented 3 years ago

@picca: I probably won't have time to dig further into this, but my advice is to collect the heap profile by following the instructions I linked to earlier. That will give a clearer picture of whether the program is running in constant space, and if it is not, it will also highlight which parts of the code are holding onto memory.

picca commented 3 years ago

Hello, I discovered an error in my C code which greatly increased the memory usage.

I would like your advice about exception handling.

framesHklP h5d det

in fact contains a bunch of

bracket ...
   bracket ...
      bracket ....

The exception I want to deal with occurs during the acquisition of the resources. If something goes wrong, I would like to skip the current resource and process the next resource generated via each job.

I am not sure how to do this.

Gabriella439 commented 3 years ago

@picca: I believe one way to handle that is to wrap each resource acquisition with a catch statement that will ignore the exception if you don't want the exception to be fatal.

In other words, if you have something like:

example = do
    bracket acquire0 release0 $ \resource0 -> …
    bracket acquire1 release1 $ \resource1 -> …
    …

Then you can wrap each resource block in something like this:

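import Control.Exception (SomeException, catch)

-- Run an action and discard any exception it throws
ignore :: IO () -> IO ()
ignore io = io `catch` handler
  where
    handler :: SomeException -> IO ()
    handler _ = return ()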

example = do
    ignore $ bracket acquire0 release0 $ \resource0 -> …
    ignore $ bracket acquire1 release1 $ \resource1 -> …
    …
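As a self-contained illustration of the same idea, here is a minimal sketch in plain IO using only base. The names acquire, release and processAll are hypothetical stand-ins, and the rule that odd ids fail to open is an assumption made purely for the demonstration. A resource whose acquisition throws is skipped, and the loop carries on with the next one.

```haskell
import Control.Exception (SomeException, bracket, catch, throwIO)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Run an action and discard any exception it throws.
ignore :: IO () -> IO ()
ignore io = io `catch` handler
  where
    handler :: SomeException -> IO ()
    handler _ = return ()

-- Hypothetical acquisition: odd ids simulate a resource that fails to open.
acquire :: Int -> IO Int
acquire n
    | odd n = throwIO (userError ("cannot open resource " ++ show n))
    | otherwise = return n

-- Hypothetical release: only runs for resources that were acquired.
release :: Int -> IO ()
release n = putStrLn ("released " ++ show n)

-- Process every id, skipping those whose acquisition fails.
-- Returns the ids that were actually processed, in order.
processAll :: [Int] -> IO [Int]
processAll ids = do
    done <- newIORef []
    mapM_ (\n -> ignore $ bracket (acquire n) release $ \r ->
               modifyIORef done (r :)) ids
    reverse <$> readIORef done

main :: IO ()
main = processAll [0 .. 3] >>= print
```

Note that a SomeException handler also swallows exceptions thrown inside the body and asynchronous exceptions, so in real code it may be preferable to catch only the specific exception type raised by the acquisition.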

picca commented 3 years ago

Thanks a lot for your answer.

Gabriella439 commented 3 years ago

@picca: You're welcome! 🙂

picca commented 3 years ago

Hello, I have one more question. Suppose that I read a value from my file and, depending on this value, do not yield a value. Should I encode this logic with if then else, or is there a special value which can be sent downstream to say: skip this?

Gabriella439 commented 3 years ago

@picca: You can do something like:

x <- readSomeValue

if someFunctionOf x
    then yield x
    else return ()

This works because return () does nothing (i.e. it emits no value).
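The same pattern can be written more compactly with when from Control.Monad, or with Pipes.Prelude.filter when the decision depends only on the value itself. The following is a minimal sketch assuming the pipes package is installed; keepIf is a hypothetical name introduced for illustration.

```haskell
import Control.Monad (forever, when)
import Pipes
import qualified Pipes.Prelude as P

-- Forward only the values that satisfy the predicate.
-- A value that fails the test is simply not yielded.
keepIf :: Monad m => (a -> Bool) -> Pipe a a m r
keepIf p = forever $ do
    x <- await
    when (p x) (yield x)

main :: IO ()
main = do
    -- Hand-written conditional yield
    print (P.toList (each [1 .. 10 :: Int] >-> keepIf even))
    -- Equivalent library pipe
    print (P.toList (each [1 .. 10 :: Int] >-> P.filter even))
```

No special "skip" value is needed downstream: a stage that does not call yield for a given input just moves on to the next await.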