discourse / mini_mime

minimal mime type library
MIT License
95 stars 34 forks source link

C extension #5

Closed stereobooster closed 7 years ago

stereobooster commented 7 years ago

before

Memory stats for requiring mime/types/columnar
Total allocated: 10706175 bytes (114132 objects)
Total retained:  3472241 bytes (31905 objects)

Memory stats for requiring mini_mime
no micro_mime
Total allocated: 29572343 bytes (344531 objects)
Total retained:  3943884 bytes (60957 objects)
Warming up --------------------------------------
cached content_type lookup MiniMime
                        33.573k i/100ms
content_type lookup Mime::Types
                        22.444k i/100ms
Calculating -------------------------------------
cached content_type lookup MiniMime
                        411.406k (± 5.1%) i/s -      2.082M in   5.072606s
content_type lookup Mime::Types
                        260.357k (± 4.6%) i/s -      1.302M in   5.010742s
Warming up --------------------------------------
uncached content_type lookup MiniMime
                         1.818k i/100ms
content_type lookup Mime::Types
                        23.239k i/100ms
Calculating -------------------------------------
uncached content_type lookup MiniMime
                         17.224k (±13.3%) i/s -     85.446k in   5.079496s
content_type lookup Mime::Types
                        252.985k (± 5.1%) i/s -      1.278M in   5.067150s

after

Memory stats for requiring mime/types/columnar
Total allocated: 9686150 bytes (104029 objects)
Total retained:  3341587 bytes (30943 objects)

Memory stats for requiring mini_mime
Total allocated: 107068 bytes (648 objects)
Total retained:  35458 bytes (84 objects)
Warming up --------------------------------------
cached content_type lookup MiniMime
                        36.319k i/100ms
content_type lookup Mime::Types
                        20.906k i/100ms
Calculating -------------------------------------
cached content_type lookup MiniMime
                        449.882k (± 3.2%) i/s -      2.252M in   5.010620s
content_type lookup Mime::Types
                        237.783k (± 4.5%) i/s -      1.192M in   5.022733s
Warming up --------------------------------------
uncached content_type lookup MiniMime
                        37.974k i/100ms
content_type lookup Mime::Types
                        21.801k i/100ms
Calculating -------------------------------------
uncached content_type lookup MiniMime
                        447.517k (± 4.7%) i/s -      2.240M in   5.017812s
content_type lookup Mime::Types
                        239.405k (± 4.2%) i/s -      1.199M in   5.017784s

cc @ioquatix

ioquatix commented 7 years ago

Awesome, good work.

You need to include the MIT license and the copyright, ideally in each file as it was originally.

While I think this is interesting, I still assert that this is the wrong design - but it looks like we are getting somewhere.

It should be possible for someone to use the existing MiniMime, this C extension version, or MIME::Types interchangeably.

ioquatix commented 7 years ago

Also what was the motivation to rewrite the templates using ERB rather than Trenni? Trenni is 20x faster than ERB which makes a difference when dealing with large data sets, and it has a better API too :D

ioquatix commented 7 years ago

Finally, it would be awesome to see a comparison against the original mini_mime code path.

stereobooster commented 7 years ago

@SamSaffron Changed everything. Moved extension to separate gem. Used it as optional dependency. Can move ownership to discourse org.

@ioquatix added benchmarks

stereobooster commented 7 years ago

The only thing which seems to be a bit strange is it is always generates new object even for subsequent calls.

irb(main):004:0> MicroMime.lookup_by_filename("a.txt")
=> #<MicroMime::Type:0x0000010a7c8098 @content_type="text/plain", @encoding="quoted-printable", @extension="txt">
irb(main):005:0> MicroMime.lookup_by_filename("a.txt")
=> #<MicroMime::Type:0x0000010a7bb000 @content_type="text/plain", @encoding="quoted-printable", @extension="txt">
irb(main):006:0> MicroMime.lookup_by_filename("a.txt")
=> #<MicroMime::Type:0x0000010a7a9670 @content_type="text/plain", @encoding="quoted-printable", @extension="txt">

Which is overhead. It can use some kind of cache, to store 100 latest objects.

ioquatix commented 7 years ago

Using a cache is not thread safe.

SamSaffron commented 7 years ago

@jeremy what do you think, it does not add any dependencies and is clearly an opt in thing, only cost is the LoadError exception which I guess we can swallow.

@ioquatix I would like to see a test for "lookup" parity between the 2 db providers, basically make sure that current stable of micro mime is always exactly the same, then travis at least can ensure you have micromime updated to latest automatically (or that we do)

stereobooster commented 7 years ago

I would like to see a test for "lookup" parity between the 2 db providers, basically make sure that current stable of micro mime is always exactly the same

I copied your Rake task which you use to generate db files, but it generates C headers (gperf files to be precise). So databases are the same. In initial @ioquatix implementation they were a bit different.

SamSaffron commented 7 years ago

@stereobooster sure I believe you :) but would be nice to have a test for it as well that the mini_mime test suite runs

stereobooster commented 7 years ago

take a look at https://github.com/stereobooster/micro_mime it is mini_mime fork. It has mini_mime tests

SamSaffron commented 7 years ago

I follow, but our Gemfile here now depends on micro_mime post this change. If I go ahead and update the DB here I would like to see a test fail showing me that micro_mime is out of date or vica-versa

An alternative for the LoadError may be to expose an interface to inject the DB...

Then you set it up so when micro_mime is required it injects the new DB in.

I prefer this to the explicit load error cause it is cleaner and more flexible.

ioquatix commented 7 years ago

I think injecting global state, is again, the wrong design choice., but instead, you should write code like

def default_database
    begin
        require 'micro_mime'
        return MicroMime::Database.new
    else
        return MiniMime::Database.new
    end
end
ioquatix commented 7 years ago

Uh, it's 4am, but you get the idea :)

SamSaffron commented 7 years ago

think most users would prefer to have an extension bundled with the library itself rather than needing to know to add an additional optional dependency.

I am actually mixed on this, vast majority of users are not doing almost any lookups of mime types, when they happen they happen while processing mail and are very likely to end up in the cache. Taking in a native dependency here with all it entails (potential segfaults etc etc) seems overkill.

stereobooster commented 7 years ago

@SamSaffron

I am actually mixed on this, vast majority of users are not doing almost any lookups of mime types, when they happen they happen while processing mail and are very likely to end up in the cache.

Actual win of this extension is (I do not care that much about speed, because those lookups are indeed very rare):

Memory stats for requiring mini_mime
Total allocated: 107068 bytes (648 objects) # was 29572343 bytes (344531 objects)
Total retained:  35458 bytes (84 objects)   # was 3943884 bytes (60957 objects)

But this will work only if you do:

gem "micro_mime"
gem "mini_mime"

Irony: add more gems, to use less memory

SamSaffron commented 7 years ago

I am getting different numbers:

require 'memory_profiler'
MemoryProfiler.report do
  require 'mini_mime'
end.pretty_print

https://gist.github.com/SamSaffron/726a158f16890ba0903a87ce038a8cce

So when you look at allocations, as expected RubyGems is a pariah and allocating 218K, mini mime gets away just allocating 6K on boot

mini_mime allocates 33 objects, another 25 or so may be allocated indirectly, but rubygems and its crazy allocates 2142 objects.

When you look at retained memory, mini_mime retains 5.8K which is only 200 bytes more than "memory retention overhead" by ruby gems.

I think you should look at rubygems if you want to make progress here

stereobooster commented 7 years ago

@SamSaffron you right - my results. But it means that bench/bench.rb is misleading - that is where I took my first results.

ioquatix commented 7 years ago

Rather than making something that can fall out of sync, why not look at the design of mime-types and mime-types-data.

SamSaffron commented 7 years ago

closing this for now. not sure there is anything we need to do.

stereobooster commented 7 years ago

@SamSaffron I was wondering how to shave of memory in mail gem e.g. how to instruct mail gem not to load mini_mime and load micro_mime instead.