discourse / mini_mime

minimal mime type library
MIT License

mini_mime vs marcel #34

Open pjmartorell opened 3 years ago

pjmartorell commented 3 years ago

Hi, I don't know if this is the right place to post this, but I'm trying to compare mini_mime and marcel for lookup by extension, because I think both gems cover the same space. I wanted to compare the number of registered extensions, the performance, and the memory consumption of each gem.

gem         #extensions
mini_mime   File.open(MiniMime::Configuration.ext_db_path).readlines.count => 1196
marcel      Marcel::EXTENSIONS.count => 1243

Regarding memory handling, mini_mime keeps a hash cache of 200 rows and binary-searches misses in a file, while marcel loads all records into an in-memory hash. Isn't reading from a file less performant than loading everything into memory? Loading everything obviously consumes more memory, but in my opinion the gain in performance outweighs the extra memory.
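
To make the trade-off concrete, here is a minimal sketch of the "small cache in front of a binary search over a sorted file" idea. It is not mini_mime's actual code: the class name SketchLookup, the 32-byte row width, and the sample rows are all assumptions made up for illustration, and a StringIO stands in for the DB file.

```ruby
require "stringio"

# Hypothetical sketch, NOT mini_mime's implementation: a bounded cache in
# front of a binary search over fixed-width rows sorted by extension.
class SketchLookup
  attr_reader :reads

  def initialize(io, row_length:, cache_size: 200)
    @io = io
    @row_length = row_length
    @rows = io.size / row_length
    @cache_size = cache_size
    @cache = {} # Ruby hashes keep insertion order, so this doubles as a simple LRU
    @reads = 0  # number of rows read from the underlying IO
  end

  def lookup(ext)
    if @cache.key?(ext)
      # Cache hit: re-insert the key so it becomes the most recently used.
      value = @cache.delete(ext)
      return @cache[ext] = value
    end

    value = binary_search(ext)
    @cache[ext] = value
    @cache.shift if @cache.size > @cache_size # evict the oldest entry
    value
  end

  private

  def binary_search(ext)
    low = 0
    high = @rows - 1
    while low <= high
      mid = (low + high) / 2
      key, type = read_row(mid)
      case ext <=> key
      when 0 then return type
      when -1 then high = mid - 1
      else low = mid + 1
      end
    end
    nil
  end

  # Seek straight to the row; fixed-width rows make the offset trivial.
  def read_row(index)
    @reads += 1
    @io.seek(index * @row_length)
    @io.read(@row_length).split
  end
end

# Made-up sample rows, sorted by extension and padded to 32 bytes each.
ROW_LENGTH = 32
SAMPLE_ROWS = [
  %w[gif image/gif], %w[jpg image/jpeg], %w[pdf application/pdf],
  %w[png image/png], %w[txt text/plain]
].freeze
data = SAMPLE_ROWS.map { |ext, type| "#{ext} #{type}".ljust(ROW_LENGTH - 1) + "\n" }.join

db = SketchLookup.new(StringIO.new(data), row_length: ROW_LENGTH, cache_size: 2)
puts db.lookup("png")
puts db.lookup("png") # second call is served from the cache, no extra reads
```

In this sketch a miss costs roughly log2(rows) small reads, while a repeated lookup costs none, so how it compares with an all-in-memory hash depends on how often the same extensions recur.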

I also noticed that both DBs in mini_mime contain similar data. Is there any reason why the two DBs are not merged, with duplicates removed? When I merged both files, the number of rows/extensions came out as 1210, but I'm not completely sure whether that is due to an error in removing duplicates:

irb(main)> require "set"
irb(main)> s = Set.new  # assumed: the snippet never initialized s; a Set drops duplicate lines
irb(main)> File.readlines(MiniMime::Configuration.ext_db_path).each do |line|
irb(main)*     s << line.strip
irb(main)> end
irb(main)> File.readlines(MiniMime::Configuration.content_type_db_path).each do |line|
irb(main)*     s << line.strip
irb(main)> end
irb(main)> s.length
=> 1210

ahorek commented 3 years ago

Unlike mini_mime, which is just a simple table of extension -> content type, marcel and mime_magic also allow lookup by file signature (magic numbers, see https://en.wikipedia.org/wiki/List_of_file_signatures). This is considered a security feature, which is why Rails uses it.
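
As a rough illustration of why signature lookup matters, here is a minimal sketch that checks the leading bytes first and only falls back to the extension. It is not marcel's or mime_magic's implementation or API: the tables and the sniff/detect helpers are hypothetical, and only a handful of well-known signatures are included.

```ruby
# Hypothetical sketch of signature-first detection; the real gems use much
# larger tables and a different API.
SIGNATURES = {
  "image/png"       => "\x89PNG\r\n\x1A\n".b,
  "image/jpeg"      => "\xFF\xD8\xFF".b,
  "image/gif"       => "GIF8".b,
  "application/pdf" => "%PDF-".b
}.freeze

# Made-up extension table, standing in for an extension -> content type DB.
EXTENSION_TYPES = {
  "png" => "image/png",
  "pdf" => "application/pdf",
  "txt" => "text/plain"
}.freeze

# Returns the content type whose magic number prefixes the bytes, or nil.
def sniff(bytes)
  head = bytes.b # compare as raw bytes, whatever the input encoding
  SIGNATURES.each { |type, magic| return type if head.start_with?(magic) }
  nil
end

# The signature wins over the file name; the extension is only a fallback.
def detect(bytes, filename)
  sniff(bytes) || EXTENSION_TYPES[File.extname(filename).delete_prefix(".").downcase]
end

png_bytes = "\x89PNG\r\n\x1A\n".b + "rest of the file".b
puts detect(png_bytes, "report.pdf")  # the name says PDF, the bytes say PNG
puts detect("plain text", "notes.txt")
```

A pure extension table would trust the misleading file name in the first call, which is the gap that signature lookup closes.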

mime-types (https://github.com/mime-types/ruby-mime-types) has a much more complex API. mini_mime uses the same DB, but simplified for performance reasons (1 extension = 1 mime type).

btw Rack also has its own DB https://github.com/rack/rack/blob/master/lib/rack/mime.rb#L51

Sometimes it's hard to persuade maintainers to make a change (https://github.com/rest-client/rest-client/pull/557), and it would be even harder to make a much more breaking change in marcel just to save a few kB of memory. Yes, it would be nice and I'm 100% pro, but I also don't think it's realistic :)

pjmartorell commented 3 years ago

@ahorek thanks for the reference to https://github.com/rest-client/rest-client/pull/557, it's exactly what I wanted to know/understand.

SamSaffron commented 3 years ago

I started discussing @georgeclaghorn

My long term thinking here.

halostatue commented 2 years ago

@SamSaffron I am all in favour of adding more data to mime-types-data.