Closed (austinjp closed 11 months ago)
Thank you for your kind words! The random number generator is indeed seeded with a constant at the beginning of the program's operation so that a3k results are repeatable. The workaround you suggest should help with your process. We could add an option to set the seed (or not set it at all) if you think this is an important feature.
Looking forward to reading about your results.
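For anyone landing here later, the effect of seeding with a constant can be reproduced in plain Python. This is only a minimal sketch: the function name `sample_with_fixed_seed` and the seed value `0` are made up for illustration and are not taken from the a3k source.

```python
import random


def sample_with_fixed_seed(n_records, probability, seed=0):
    """Return the indices selected when the generator starts from a fixed seed.

    The seed value is a placeholder; the constant a3k actually uses is not
    stated in this thread.
    """
    random.seed(seed)  # constant seed, set once before any sampling
    return [i for i in range(n_records) if random.random() < probability]


first_run = sample_with_fixed_seed(10_000, 0.01)
second_run = sample_with_fixed_seed(10_000, 0.01)
print(first_run == second_run)  # the two "runs" select the same records
```

Because both calls start from the same generator state, every `random.random()` draw is the same in both runs, so the same records pass the threshold.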
Hi again. No problem! :smiley: I just had a look through the code and yep, I spotted the deterministic seeding. I appreciate that it might be useful, I just wasn't anticipating it. Perhaps the docs could highlight the fact that the sampling is deterministic? I'll send a PR, feel free to use/ignore as you see fit.
My workaround 'works', although it's inefficient. I guess that's not really a problem in reality, since it's plenty fast enough for my needs. A CLI flag for setting a fresh seed every invocation might be good, though, since it would allow users to set the seed themselves and hence have more control. But this is more of a feature request than an issue, so I'm happy for this to be closed.
Hi there. Firstly, thanks for a3k, I'm finding it very useful.
I noticed a problem when using
--sample 'random.random() < 0.0001'
to randomly sample from the latest Crossref dataset. It seemed to produce identical samples each time, whereas I was expecting it to produce different samples each time. I've not yet looked through the code, but I wondered if it might be an issue with seeding the random generator? Perhaps this is expected behaviour, so apologies if I missed this in the docs.
An example:
Notice the identical results after deleting and recreating the database with a 'fresh' sample. Perhaps this is expected behaviour, but I was expecting a random sample, and hence different each time.
Some quick sanity checks:
Workaround
As a workaround, I use
--sample '( random.seed() ) or random.random() < 0.0001'
to re-seed the random generator at every sample decision. It's inefficient, but it gives the results I'd expected:
Best wishes.
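A note on why the workaround expression behaves this way: `random.seed()` with no argument re-seeds from system entropy and returns `None`, so `( random.seed() ) or ...` always falls through to the comparison on the right. The sketch below imitates that in plain Python. How a3k evaluates the `--sample` expression internally is an assumption here (`eval` is used as a stand-in), `run_once` is a hypothetical helper, and the probability is raised to 0.5 so the difference shows up on a small input.

```python
import random

# The workaround expression, with a larger probability for demonstration.
SAMPLE_EXPR = "( random.seed() ) or random.random() < 0.5"


def run_once(n_records, startup_seed=0):
    """Simulate one program run: constant seed at startup, then sampling.

    startup_seed is a placeholder for whatever constant the program uses.
    """
    random.seed(startup_seed)
    # Each evaluation re-seeds first (returning None), then draws a number.
    return [i for i in range(n_records) if eval(SAMPLE_EXPR, {"random": random})]


print(random.seed() is None)          # True, which is why `or` falls through
print(run_once(200) == run_once(200)) # almost certainly False
```

Re-seeding on every record costs an entropy read per decision, which is the inefficiency mentioned above.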