daurnimator / mmdblua

Maxmind database parser for lua
MIT License

Provide an option not to have the whole database in memory #4

Open pisto opened 9 years ago

pisto commented 9 years ago

Suggestions for the right way to do this? I think that file:seek() would work.

daurnimator commented 9 years ago

You need to have much of it in memory to find the data section separator; see https://github.com/daurnimator/mmdblua/blob/master/mmdb.lua#L20. Searching backwards is 'hard'.

After that, ipv6_find_ipv4_start will need to traverse a reasonable amount of the file...

I'm not sure if it's worth putting the work in for this?
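One way to limit how much has to be read for the marker search is to load only the tail of the file. The following is a minimal sketch, not mmdblua's code. It assumes the marker in question is the metadata start marker from the MaxMind DB format specification (the bytes `\xab\xcd\xef` followed by `MaxMind.com`) and that, as the specification states, the metadata section is at most 128 KiB. `find_metadata_start` is a hypothetical helper name.

```lua
-- Assumption: marker and size limit are taken from the MaxMind DB format spec.
local METADATA_MARKER = "\171\205\239MaxMind.com" -- 0xAB 0xCD 0xEF "MaxMind.com"
local MAX_METADATA_SIZE = 128 * 1024

-- Hypothetical helper (not part of mmdblua): returns the 0-based file offset
-- of the first byte after the last occurrence of the marker.
local function find_metadata_start(path)
    local fd, err = io.open(path, "rb")
    if not fd then return nil, err end
    local size = fd:seek("end")
    -- Only the last 128 KiB (or the whole file, if smaller) is read.
    local tail_start = math.max(0, size - MAX_METADATA_SIZE)
    fd:seek("set", tail_start)
    local tail = fd:read(size - tail_start) or ""
    fd:close()
    -- Scan forward with plain-text find, remembering the last match,
    -- which avoids having to search backwards.
    local last_end, init = nil, 1
    while true do
        local s, e = tail:find(METADATA_MARKER, init, true)
        if not s then break end
        last_end, init = e, s + 1
    end
    if not last_end then return nil, "metadata marker not found" end
    return tail_start + last_end
end
```

This only addresses the marker search; traversing the search tree would still need access to the rest of the file.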

pisto commented 9 years ago

No, it's not high priority, but it would be nice: right now my server takes roughly 3MB and the database 30MB. With the other geoip bindings there was an option to map the file into memory, so multiple instances of the same program would allocate the memory only once. I'll look around to see if there's a decent Lua library that can do this with arbitrary files and expose them as strings.

daurnimator commented 9 years ago

> With the other geoip bindings there was an option to map the file in memory

Yeah; they're probably using mmap(). You could do this in LuaJIT via the FFI, but I'd rather not bring that dependency in.

> I'll look around if there's a decent lua library that can do this with arbitrary files and expose them as strings.

That won't be possible; Lua strings are interned.

pisto commented 9 years ago

I know; I mean expose the mmapped region in some way that is semantically equivalent to a string.

daurnimator commented 9 years ago

Looking through again, there aren't that many string methods in use:

You should be able to replace most of these with a :seek and a :read. I'd be willing to accept a pull request that adds this.
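A rough sketch of what such a replacement could look like: a file-backed object that mimics `sub`, `byte` and `len` using `:seek` and `:read`. The names `new_file_view` and `file_view` are hypothetical, and the set of methods covered is an assumption, since the list of string methods actually in use is not reproduced here.

```lua
-- Hypothetical wrapper (not part of mmdblua): string-like access to a file.
local file_view = {}
file_view.__index = file_view

local function new_file_view(path)
    local fd, err = io.open(path, "rb")
    if not fd then return nil, err end
    local size = fd:seek("end") -- file length in bytes
    return setmetatable({ fd = fd, size = size }, file_view)
end

-- Like string.sub(s, i, j), assuming 1 <= i; negative indices are not handled.
function file_view:sub(i, j)
    if j > self.size then j = self.size end
    if i > j then return "" end
    self.fd:seek("set", i - 1) -- file offsets are 0-based
    return self.fd:read(j - i + 1) or ""
end

-- Like string.byte(s, i, j): returns the numeric values of bytes i..j.
function file_view:byte(i, j)
    return self:sub(i, j or i):byte(1, -1)
end

function file_view:len()
    return self.size
end

function file_view:close()
    return self.fd:close()
end
```

Every access here goes through a seek and a read, so hot paths may well be slower than with an in-memory string.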

daurnimator commented 7 years ago

To quote myself from https://github.com/daurnimator/mmdblua/pull/5#issuecomment-262545798

Sorry, but after reflecting on this for a while, I realised that seeking in a file is not the correct answer here: it will result in unnecessary slowness due to seek syscall overhead.

What I suggest instead is an ffi-only optimisation that uses an mmap call (perhaps via ljsyscall).