PaddeK / node-maxmind-db

This is the pure Node API for reading MaxMind DB files. MaxMind DB is a binary file format that stores data indexed by IP address subnets (IPv4 or IPv6).
GNU Lesser General Public License v2.1
88 stars 25 forks source link

getGeoData VS. getGeoDataSync #10

Closed connesc closed 9 years ago

connesc commented 9 years ago

I quickly read the code behind getGeoDataSync and I did not found any real synchronous call (unlike openSync, which calls fs.readFileSync). So, does getGeoDataSync blocks the process in some way? If no, is there any reason to distinguish getGeoData and getGeoDataSync?

the-eater commented 9 years ago

The difference between getGeoDataSync and getGeoData is more that getGeoData doesn't allow to look up more then 20 pointers per tick, it's a bit fake async.

You can see it here

connesc commented 9 years ago

Ok, thanks for the answer, that's what I guessed.

I think these functions are not properly named because the Sync suffix usually denotes a function that blocks the process (which should be avoided in most cases, except at startup). Here, since getGeoDataSync is not blocking, and getGeoData is a less performant variation, I think that the naming should instead induce a preference for the first one. For instance:

That said, I don't think it's a priority because it would change the API. But It would be a good idea for a next release.

the-eater commented 9 years ago

Well getGeoDataSync does block, f.x. run a http client that runs like 20+ getGeoDataSync's every request and try to refresh you will probably feel lag.

connesc commented 9 years ago

By blocking I mean that the Node.js process becomes idle while waiting for an external event (e.g. I/O).

I agree that getGeoDataSync is expensive and occupies the process for a long time, but this is not wasted time. If you wrap getGeoDataSync in a nextTick call (what getGeoData does), you are just deferring calculus, but the same time will be spent at the end of the day.

In your example, if you wait for the completion of all your asynchronous calls, you will feel the same lag effect (maybe more).

However, an asynchronous version can still be useful, especially when you don't need to wait for geolocation data in order to generate the HTTP response. But I tend to see getGeoData as an helper since one could write an equivalent version like this:

function getGeoDataAsync(ip, callback) {
    process.nextTick(function onNextTick() {
        var result = null, error = null;
        try {
            result = db.getGeoDataSync(ip);
        } catch (err) {
            error = err;
        }
        callback(error, result);
    });
}
the-eater commented 9 years ago

What you're saying is not totally true, because I don't defer calculus, I spread it over time to hold respect to other time consuming tasks. so your example wouldn't be working as it still puts all calculus in one tick. while my version spreads it over several ticks.

connesc commented 9 years ago

Ok, I actually missed the spread effect. It sounds like a good idea. However, I don't think it is what is currently implemented: nextTick actually defers tasks to the end of the current iteration (just before returning to the event loop). This means that the whole calculus will in fact be processed in one turn of the event loop. setImmediate would be more appropriated here.

the-eater commented 9 years ago

Yes, but this is only after node 0.10 so yeah should fix that but was scared of breaking it for some people, feel free to open a pull request

connesc commented 9 years ago

Yes, you're right, it's a bit dangerous. Thanks for all your explanations, I understand the distinction between getGeoData and getGeoDataSync now. I'm still convinced that the Sync suffix may wrongly scare people, but it's not fundamentally an issue.

the-eater commented 9 years ago

No problem! And yes is kinda does but its not too big of an issue

If you have more questions feel free to mail me!