emsig / libdlf

Library of Digital Linear Filters
Creative Commons Attribution 4.0 International

Binary #4

Closed: prisae closed this issue 3 years ago

prisae commented 3 years ago

Currently, the library exists only as plain text, with deployed packages for selected languages.

Another idea would be to also deploy binaries, e.g., in HDF5.
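As a rough illustration of the binary idea (the file names and the group/dataset layout here are assumptions for illustration, not a decided format), writing one filter to HDF5 with h5py could look like this:

import h5py
import numpy as np

# Read one filter from its text file (assumed layout: '#' header lines,
# then whitespace-separated columns base/fcos/fsin).
base, fcos, fsin = np.loadtxt("key_201_2012.txt", comments="#").T

# Store it under a transform/filter hierarchy in a single HDF5 file.
with h5py.File("libdlf.h5", "w") as f:
    grp = f.create_group("Fourier/key_201_2012")
    grp.create_dataset("base", data=base)
    grp.create_dataset("fcos", data=fcos)
    grp.create_dataset("fsin", data=fsin)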

kerrykey commented 3 years ago

Dealing with absolute and relative file paths in a Fortran library could be tedious, even with binary files. Another option is hard-coding the filter data as constants in the source, which is what I did for Dipole1D. We could write a create_Fortran.{py or jl} routine that generates a libdlf Fortran module with all filters hard-coded, plus a Makefile showing how to compile that module into a library binary which can then be linked at compile time, the usual way C and Fortran libraries are linked.
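A minimal sketch of what such a generator could look like (the text-file layout, function name, and all identifiers are assumptions for illustration, not the actual libdlf format):

import numpy as np

def write_fortran_module(txtfile, names, modname, outfile):
    """Hard-code the columns of a filter text file as Fortran parameters.

    names -- one identifier per column, e.g. ("base", "fcos", "fsin")
    """
    data = np.loadtxt(txtfile, comments="#")  # skip '#' header lines
    with open(outfile, "w") as f:
        f.write(f"module {modname}\n  implicit none\n")
        for name, col in zip(names, data.T):
            # One value per continuation line; the '_8' kind suffix gives
            # double precision with common compilers such as gfortran.
            vals = ", &\n    ".join(f"{v:.18e}_8" for v in col)
            f.write(f"  real(8), parameter :: {name}({col.size}) = [ &\n"
                    f"    {vals}]\n")
        f.write(f"end module {modname}\n")

# Hypothetical usage:
# write_fortran_module("key_201_2012.txt", ("base", "fcos", "fsin"),
#                      "libdlf_key_201_2012", "libdlf_key_201_2012.f90")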

kerrykey commented 3 years ago

I saw the cache statements in your Python package, so I tried a memoization package in the Julia code so that it also caches the filters when they are first loaded. The subsequent load calls are indeed quite fast, see below. That said, the initial load is super slow and takes a crazy amount of memory given the filter size (maybe this has something to do with the particular Memoization.jl package).

Thinking about how a filter will be used in production code, though, it would typically be loaded only once anyway and then reused repeatedly, so I'm not sure how much caching will matter in practice. I also re-ran my package without caching, and it runs about 1000x faster than the initial load with caching enabled; the load time without caching for a 201-point filter seems negligible.

So I'm thinking the caching probably isn't needed and would actually be an impediment, at least given the timing numbers below. What do you see for the Python package? With caching:


julia> @time base, fcos, fsin = LibDLF.Fourier.key_201_2012();
  0.302730 seconds (268.36 k allocations: 15.798 MiB, 99.41% compilation time)

julia> @time base, fcos, fsin = LibDLF.Fourier.key_201_2012();
  0.000012 seconds (9 allocations: 528 bytes)

Without caching:


julia> @time base, fcos, fsin = LibDLF.Fourier.key_201_2012();
  0.000774 seconds (1.27 k allocations: 82.406 KiB)

julia> @time base, fcos, fsin = LibDLF.Fourier.key_201_2012();
  0.000747 seconds (1.27 k allocations: 82.406 KiB)

prisae commented 3 years ago

> We could write a create_Fortran.{py or jl} routine that generates a libdlf Fortran module with all filters hard-coded, plus a Makefile showing how to compile that module into a library binary which can then be linked at compile time, the usual way C and Fortran libraries are linked.

Good idea. And this could be generated and pushed by CI to dedicated branches, e.g., fortran and ccode or similar.

prisae commented 3 years ago

I just double-checked, and the time used for caching is negligible. I don't use a library or anything; my caching is quite primitive, I just add the loaded filter to a dictionary. By far the most time is spent reading the txt-file.
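For reference, a minimal sketch of that kind of dict-based caching (the function name and file layout are illustrative, not the actual libdlf code):

import numpy as np

_cache = {}  # filter name -> tuple of arrays

def load_filter(name):
    """Return the filter arrays, reading the text file only on first use."""
    if name not in _cache:
        data = np.loadtxt(f"{name}.txt", comments="#")
        _cache[name] = tuple(data.T)  # e.g. (base, fcos, fsin)
    return _cache[name]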

Regarding your times above, @kerrykey: does @time run the statement only once? With such small times the uncertainty might be quite big. I use %timeit, which runs the statement several thousand times to get a good estimate. To check the times I therefore created three package variants: one without caching, one that re-adds the result to the cache on every call, and one that caches only on the first call (i.e., proper caching). Adding to the cache comes basically for free.
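For comparison, a rough way to reproduce the %timeit-style measurement with the standard-library timeit module, building on the hypothetical load_filter sketch above (file name again illustrative):

import timeit
import numpy as np

load_filter("key_201_2012")  # warm the cache first

# Average over many runs, as %timeit does, to reduce the uncertainty.
cached = timeit.timeit('load_filter("key_201_2012")',
                       globals=globals(), number=10_000) / 10_000
fresh = timeit.timeit('np.loadtxt("key_201_2012.txt", comments="#")',
                      globals=globals(), number=1_000) / 1_000
print(f"cached: {cached*1e6:.2f} us/call; fresh read: {fresh*1e6:.2f} us/call")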

prisae commented 3 years ago

Also, in your timings, the first with-caching call is the only one that reports compilation time, so I think those numbers are skewed by something other than the caching itself.

kerrykey commented 3 years ago

Interesting. I wrote my own cache with a Dict(), like your code, and get the same first-call timing and memory allocation mentioned above, so this must be related to @time on the first call (which includes JIT compilation, as the 99.41% compilation time in the output suggests). I also tried @btime (the averaging benchmark macro from BenchmarkTools.jl) with file loading every time and get ~0.3 ms, so I'm going to ignore the @time results and move on to the next to-do list item.

prisae commented 3 years ago

To round up: does this mean we distribute language-specific packages rather than binaries? If so, we can close this issue.

kerrykey commented 3 years ago

Yeah, I think we can revisit the idea of binaries later if load time ever becomes an issue, but the Julia and Python packages seem fast enough for practical purposes using the text files plus caching.

prisae commented 3 years ago

Good. And the Fortran and C packages would use hard-coded filters. So closing (at least for now).