Closed gustavodaquino closed 1 year ago
I don't consider it safe, because collisions can happen even if there are few nodes.
Actually, the problem is not with the random number generator itself (ThreadLocalRandom), but with the unavailability to provide unique node identifiers for each TSID generator.
Let's say today 10 machines are started in a cluster. Each TSID generator will get a random node ID. If all these TSID generators are given different random node numbers from each other, it can be said that no collision will occur. So it's safe for today.
But if those machines are restarted for some reason tomorrow, the node IDs will be randomized again. If at least two TSID generators receive the same node number, say 42, the IDs generated by those two TSID generators can collide during the day.
The class below has a test method that starts 10 threads, assigning a new TSID generator to each thread.
package com.example;
import static org.junit.Assert.assertNull;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadLocalRandom;
import java.util.logging.Logger;
import org.junit.Test;
import com.github.f4b6a3.tsid.TsidFactory;
public class ClashTest {
static Logger LOGGER = Logger.getLogger(ClashTest.class.getName());
private TsidFactory newFactory(int nodeBits) {
return TsidFactory.builder().withRandomFunction(() -> ThreadLocalRandom.current().nextInt())
.withNodeBits(nodeBits) // 8 bits: 256 nodes; 10 bits: 1024 nodes...
.build();
}
@Test
public void test() throws InterruptedException {
int nodeBits = 8;
int threadCount = 10;
int iterationCount = 100_000;
CountDownLatch endLatch = new CountDownLatch(threadCount);
ConcurrentMap<Long, Integer> tsidMap = new ConcurrentHashMap<>();
for (int i = 0; i < threadCount; i++) {
final int threadId = i;
// one generator PER THREAD
TsidFactory factory = newFactory(nodeBits);
new Thread(() -> {
for (int j = 0; j < iterationCount; j++) {
Long tsid = factory.create().toLong();
assertNull("TSID clash detected!", tsidMap.put(tsid, (threadId * iterationCount) + j));
}
endLatch.countDown();
}).start();
}
LOGGER.info("Started threads");
endLatch.await();
LOGGER.info("Done");
}
}
The code was adapted from the code implemented by Vlad Mihalcea in one of his excellent articles. You can change the nodeBits
, threadCount
, iterationCount
variables during the tests.
Thanks for the quick reply, @fabiolimace.
I have another doubt about node ids. At Factory builder function, javadoc says the function withNode(Integer node)
has a limit defined in 2^nodeBits-1.
Taking into consideration the limits of library, the max value allowed, considering a 20 bit length size is 1048575, but this value is not validated in runtime.
What happens to the Node Identifier used to generate the TSID if the parameter overflows this hard limit? Eg: 1048576
, or Int.MAX_VALUE (2.147.483.647)
, equivalent to 32 bit length? These extra bits are trimmed?
Regards.
Yes, the most significant extra bits are silently trimmed.
Here is the line that strips off the extra bits: TsidFactory(Builder builder)
.
In your example, 1048576
becomes 0
because:
1048576 & (2^20-1) == 0
It's the same as mod(1048576, 2^20)
:
1048576 % 2^20 == 0
Should it throw an exception or should the javadoc be fixed? Or both?
My best regards.
I think that a guard clause would avoid error prone by incorrect configurations and avoid sillently behavior to the client.
And there is a precedence, for example withNodeBits function that has it limit validation assuring the correct setup.
But one way or another it could be mentioned in README/JavaDoc.
Regards!!
Released v5.2.1. 🎉
Hello, first of all I'd like to congratulate the maintainer for this wonderful project. I'm wondering if, due an unavailability to provide a node identifier, the
ThreadLocalRandom
approach described on README is significantly safe to generate ids on a small cluster, < 10 nodes, assuring low probability of collision.My best regards.