Tieske / uuid

A pure Lua uuid generator (modified from a Rackspace module)
http://tieske.github.io/uuid/

UUIDs are predictable #9

Open · daurnimator opened this issue 7 years ago

daurnimator commented 7 years ago

You only seed once, and with a poor-quality source (time).

It's entirely possible to predict all your UUIDs, which makes them not universally unique.
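To make the seeding problem concrete, here is a minimal sketch in Python (not Lua). It uses Python's `random.Random` (a Mersenne Twister) as a stand-in for Lua's `math.random`, and the helper names and timestamp are hypothetical. If a generator is seeded once with a Unix timestamp, anyone who sees one UUID and roughly knows when the process started can try every second in that window, find the seed, and regenerate every UUID the process will issue.

```python
import random
import uuid

def weak_uuid4(rng: random.Random) -> uuid.UUID:
    # 128 bits from a non-cryptographic PRNG, with the v4 version/variant bits stamped in.
    return uuid.UUID(int=rng.getrandbits(128), version=4)

def generate(seed: int, count: int) -> list[uuid.UUID]:
    # Seed once with a timestamp, then keep drawing from the same generator.
    rng = random.Random(seed)
    return [weak_uuid4(rng) for _ in range(count)]

def recover_seed(first_uuid: uuid.UUID, earliest: int, latest: int):
    # Try every second in the window; return the seed that reproduces the first UUID.
    for candidate in range(earliest, latest + 1):
        if weak_uuid4(random.Random(candidate)) == first_uuid:
            return candidate
    return None

process_start = 1_500_000_000  # hypothetical Unix timestamp used as the seed
issued = generate(process_start, 5)

# The observer only knows the start time to within +/- 10 minutes.
seed = recover_seed(issued[0], process_start - 600, process_start + 600)
predicted = generate(seed, 5)
print(seed == process_start, predicted == issued)
```

A search over 1,201 candidate seeds is trivial, and widening the window to a whole day is still cheap.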

Tieske commented 7 years ago

see https://github.com/Tieske/uuid#notes

daurnimator commented 7 years ago

The issue is not only the seeding (and FWIW, any sort of time is a poor choice of seed).

C's rand() (which math.random uses) is predictable after observing only a few values: for UUIDs you should be using an unpredictable source of entropy.
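The "predictable after observing only a few values" point can be sketched against the sample `rand()` implementation printed in the C standard (a 32-bit LCG that returns 15 bits per call). Real libcs differ (glibc's `rand()` uses an additive-feedback generator, and Lua 5.4 replaced the libc call with xoshiro256**), but none of them is cryptographically secure. In this sketch the function names and the seed are hypothetical. Because each output exposes 15 bits of the state, three observed values are enough to brute-force the 16 hidden bits and predict the next output.

```python
# Constants of the sample rand() shown in the C standard.
MULT, INC = 1103515245, 12345
MASK31 = (1 << 31) - 1  # state bits above bit 30 never influence the 15 output bits

def sample_rand(state: int) -> tuple[int, int]:
    # One LCG step, returning (new_state, output) with output in 0..32767.
    state = (state * MULT + INC) & MASK31
    return state, state >> 16

def recover_states(outputs: list[int]) -> list[int]:
    # The first output exposes bits 16..30 of the state; brute-force the
    # 16 hidden low bits and keep only states that reproduce the later outputs.
    high = outputs[0] << 16
    survivors = []
    for low in range(1 << 16):
        state = high | low
        for expected in outputs[1:]:
            state, out = sample_rand(state)
            if out != expected:
                break
        else:
            survivors.append(state)  # state right after the last observed output
    return survivors

secret_seed = 424242  # hypothetical seed the observer never sees
state, stream = secret_seed, []
for _ in range(6):
    state, out = sample_rand(state)
    stream.append(out)

survivors = recover_states(stream[:3])           # observe three values...
predictions = {sample_rand(s)[1] for s in survivors}
print(stream[3] in predictions, len(survivors))  # ...and predict the fourth
```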

This could be from e.g. luaossl's rand module, or by reading from /dev/urandom, or some binding to getrandom.

Tieske commented 7 years ago

That is a valid point. Unfortunately, it's a pure Lua library, so it is limited in what it can do. As such, predictability isn't that bad, as long as the UUIDs are unique.

But obviously it would be nice if Lua had a better random generator. It would be nice if we could use something else in a nice small lib.

daurnimator commented 7 years ago

> That is a valid point. Unfortunately, it's a pure Lua library, so it is limited in what it can do.

I see it as totally irresponsible to have a UUID library with this limitation: it defeats the point of using UUIDs for most applications.

> predictability isn't that bad, as long as the UUIDs are unique.

If a UUID is predictable it's not unique.

> But obviously it would be nice if Lua had a better random generator.

On Unix-like systems you can io.open("/dev/urandom").
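As a sketch of what that looks like end to end, here is a Python version (in Lua the read itself would be io.open("/dev/urandom", "rb") followed by :read(16)). The function name is hypothetical. It reads 16 bytes from the kernel's CSPRNG and then sets the version 4 and RFC 4122 variant bits, which is all a random (v4) UUID needs.

```python
import uuid

def urandom_uuid4(path: str = "/dev/urandom") -> uuid.UUID:
    # Read 16 bytes from the kernel CSPRNG (Unix-like systems only).
    with open(path, "rb") as f:
        raw = bytearray(f.read(16))
    if len(raw) != 16:
        raise OSError("short read from " + path)
    # Stamp the version (0100 in the high nibble of byte 6)
    # and the RFC 4122 variant (10xx in the high bits of byte 8).
    raw[6] = (raw[6] & 0x0F) | 0x40
    raw[8] = (raw[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(raw))

print(urandom_uuid4())
```

Only 6 of the 128 bits are fixed, leaving 122 bits that come straight from the OS entropy pool rather than from a seeded PRNG.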

> It would be nice if we could use something else in a nice small lib.

If you don't want to depend on a C library, then you should make the random function pluggable. That way someone can go use luaossl, or some luajit ffi construction, or whatever else if they want.

I would go as far as to assert() that the plugged-in function is not math.random, as helping people avoid foot-shots is good (they could always get around it with a closure if they really want to).
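A pluggable random source with a guard against the default PRNG could look roughly like this Python sketch. The class and method names are hypothetical, `os.urandom` stands in for "luaossl or some FFI construction", and `random.randbytes` plays the role of `math.random`. It raises `ValueError` rather than using `assert`, because Python strips asserts under `-O`. As suggested above, wrapping the rejected function in a lambda still gets past the check.

```python
import os
import random
import uuid
from typing import Callable

class UuidGenerator:
    """Hypothetical v4 UUID generator with a pluggable source of random bytes."""

    def __init__(self, random_bytes: Callable[[int], bytes] = os.urandom):
        # Reject the stdlib Mersenne Twister outright, in the spirit of
        # asserting that the plugged-in function is not math.random.
        if random_bytes == random.randbytes:
            raise ValueError("random.randbytes is not a CSPRNG; pass e.g. os.urandom")
        self._random_bytes = random_bytes

    def uuid4(self) -> uuid.UUID:
        raw = bytearray(self._random_bytes(16))
        if len(raw) != 16:
            raise ValueError("random source must return exactly 16 bytes")
        raw[6] = (raw[6] & 0x0F) | 0x40  # version 4
        raw[8] = (raw[8] & 0x3F) | 0x80  # RFC 4122 variant
        return uuid.UUID(bytes=bytes(raw))

default = UuidGenerator()                  # uses os.urandom
stub = UuidGenerator(lambda n: bytes(n))   # deterministic source, e.g. for tests
print(default.uuid4())
print(stub.uuid4())  # 00000000-0000-4000-8000-000000000000
```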

Tieske commented 7 years ago

Actually, I was thinking about a small library that would fill in some blanks. The lfs and socket libraries do most of the platform-specific stuff that Lua cannot handle itself.

So something like luasystem, but adding reading/writing of environment variables, a decent random generator, platform constants, etc., all platform independent. Nothing too fancy, just the basics to fill the gaps where plain Lua cannot go due to C limitations.

daurnimator commented 7 years ago

> Actually, I was thinking about a small library that would fill in some blanks. The lfs and socket libraries do most of the platform-specific stuff that Lua cannot handle itself.
>
> So something like luasystem, but adding reading/writing of environment variables, a decent random generator, platform constants, etc., all platform independent. Nothing too fancy, just the basics to fill the gaps where plain Lua cannot go due to C limitations.

I've seen too many kitchen sink libraries come and go to believe that's useful. Off the top of my head:

I much prefer libraries that do one thing and do it well. That way I can just pick the bits I want. It also means the less useful or unmaintained bits can fall by the wayside, and the libraries don't need to lug along backwards compatibility.

Tieske commented 7 years ago

I agree, hence I don't want it to be a kitchen-sink thing. Only the bare-minimum, C-level, cross-platform (<- most important element here) stuff. Sockets and filesystems have specialised libs, as they are big enough areas to deserve them. I just don't like having one library only for getting the time, another for a sleep function, and yet another with only a randomiser or environment variable support. Those functionalities are so small that they'd be better off in a single library.

alexdowad commented 2 years ago

Hi, @Tieske!

The difficulties in getting a good random seed in pure Lua are understood. However, I would like to point out that for the Nginx Lua module, the way the sources of entropy (ngx.time() and ngx.worker.pid()) are mixed together (by adding them) is inadequate, and much better can be done even in pure Lua.

I have found that if I restart my instance of Nginx a couple of times within a relatively short period (perhaps a minute or so), I sometimes start noticing duplicate UUIDs. I believe this is because when the Nginx workers restart, they are assigned PIDs just slightly higher than the previous time, so the time + PID values may overlap with those used the previous time Nginx was restarted.

I feel that hashing the time, the PID, and any other source of entropy accessible from pure Lua together, then using that hash as the seed, would be much better than just adding them. Then, if either the time or the PID differs, the chance of getting a duplicate seed would be very low.
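The difference between adding and hashing can be shown with a short Python sketch. It uses `hashlib.sha256` for convenience, whereas a pure-Lua version would need its own hash implementation. The function names, time window, and PID range are hypothetical. Addition maps every pair with the same `time + pid` to the same seed, for example `(t, pid)` and `(t + 1, pid - 1)`. Hashing an unambiguous encoding of both values keeps distinct pairs apart. Hashing does not add entropy: someone who can guess the time and PID can still enumerate the seeds. It only removes these structured collisions.

```python
import hashlib
import struct

def additive_seed(t: int, pid: int) -> int:
    # Mixing by addition: distinct (time, pid) pairs can share a sum.
    return t + pid

def hashed_seed(t: int, pid: int) -> int:
    # Mixing by hashing: pack both values as fixed-width integers, hash, keep 64 bits.
    digest = hashlib.sha256(struct.pack(">QQ", t, pid)).digest()
    return int.from_bytes(digest[:8], "big")

times = range(1_700_000_000, 1_700_000_060)  # one minute of restarts
pids = range(4000, 4032)                     # 32 hypothetical worker PIDs
pairs = [(t, p) for t in times for p in pids]

additive = {additive_seed(t, p) for t, p in pairs}
hashed = {hashed_seed(t, p) for t, p in pairs}
print(len(pairs), len(additive), len(hashed))
```

Here the 1,920 (time, PID) pairs collapse to only 91 distinct additive seeds, because the sum can only range over 60 + 32 - 1 consecutive values. The hashed seeds stay distinct.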

Tieske commented 2 years ago

@alexdowad for OpenResty, please use https://github.com/thibaultcha/lua-resty-jit-uuid