endyjasmi / cuid

Collision resistant id (hashes) for php
MIT License
72 stars 9 forks source link

State file + locking required? #2

Open ericelliott opened 10 years ago

ericelliott commented 10 years ago

This works because node.js only run in a single process to serve multiple client. PHP on the other hand create a new process for each request. This means that PHP cannot store state across multiple request. In order to deal with this, a temporary file is used to store state across processes. The file will be locked for each read.

If PHP really creates a new process, does that new process have a different PID? If that's the case, you don't need to store the count state, because the PID will differentiate between the processes, and IDs should not collide...

.pid is the second state to be stored in the temporary storage. As discussed above, PHP does not have a consistent process id because it will create a new process for each request. In order to deal with this, the first time the library is used, the process id are stored and will be used for subsequent cuid generation until the state file is deleted.

Having new .pids for each process should be fine, as long as those .pids don't collide with each other. Do you have tests that indicate that it's really a problem?

endyjasmi commented 10 years ago

Reason for storing count Storing count information is necessary. If no count were stored, the count will always be 1 and that doesn't make it a counter.

Reason for storing process id Storing process id in a state file is purely an assumption on my part which based on the name fingerprint. Just like fingerprint that can be used to identify a person, shouldn't it be able to identify (sort of) a cuid generating client. Thus, shouldn't it be consistent for a period of time?

If it is just another random string which is generated by the client, I can remove them.

Reason for locking file Locking the file help prevents concurrent access to the same file hence messed up the data that the file retain.

ericelliott commented 10 years ago

Storing count

Storing count information is necessary. If no count were stored, the count will always be 1 and that doesn't make it a counter.

That doesn't make sense. The count is incremented every time you call cuid(). The count will only always be 1 if you only call cuid() once per process -- and that should be fine even if you have many connections, because each process should generate its own fingerprint. If the process fingerprint is not unique enough to prevent collisions, work on improving the fingerprint implementation for this port.

Failing to do so risks slowing down every connection do deal with the shared counter state. This was never the intended implementation of CUID. They should be fast, synchronous, and non-blocking.

Process id

Storing process id in a state file is purely an assumption on my part which based on the name fingerprint. Just like fingerprint that can be used to identify a person, shouldn't it be able to identify (sort of) a cuid generating client. Thus, shouldn't it be consistent for a period of time?

The OS manages pids for you. There is never more than one process sharing the same PID at the same time. Even using the least significant bits of the PID (NOT the MSB's which change slower), Between PID and random number, you should have enough entropy to avoid collisions.

You should write uniqueness tests that ensure that you can generate millions of sequential CUIDs without collisions.

I notice in your test directory, you're testing that they are generating strings made up of the appropriate characters, but you're not creating uniqueness tests. You should fix that, and make sure that you are forking lots of processes and testing that the generated IDs don't collide.

Locking file

The CUID instances (even across different PIDs) should not need any shared state, so no locking should be necessary. This is really the whole point of CUIDs, because you should be able to safely generate them not just on one machine, but on many different machines at the same time without worrying about collision.

You can't do that if you're relying on shared state.

endyjasmi commented 10 years ago

Ok, that's make sense. I'll work on it and reply this when I'm done.

Thanks for the clarification.

ericelliott commented 10 years ago

:thumbsup:

endyjasmi commented 10 years ago

Ok, I have removed the need to persist state in text file. It should now work the same as the original counterpart of Cuid. I have also added a uniqueness test for both normal cuid and slug generation. Currently I only test 50k succession generation because of the timeout limit at travis.

I did try to test it with 2 million successive generation with no collision detected.

*Update: Slug fails the uniqueness test In php 5.4, it fails in 47746/50000 iteration In php 5.5, if fails in 29167/50000 iteration

jimthedev commented 8 years ago

@endyjasmi So what was the end result. I'm assuming it is still persisting to disk?

ericelliott commented 8 years ago

@endyjasmi

  1. Check the current slug implementation in CUID, make sure they're in sync.
  2. Use a CSPRNG for PHP if you can!