astral-sh / ruff

An extremely fast Python linter and code formatter, written in Rust.
https://docs.astral.sh/ruff
MIT License
32.85k stars 1.1k forks

Feature request: Lint rule to disallow unseeded RNGs (niche?) #14003

Open dmcc opened 2 weeks ago

dmcc commented 2 weeks ago

I'll preface this by saying that this lint rule would not be intended for many (maybe even most) use cases. I wouldn't expect it to be on by default or anything like that. I recognize this could be considered too niche, so I won't be offended if it isn't a good fit.

The use case

In some AI/ML training work, seeding your RNG is strongly encouraged to improve reproducibility. For cases where RNGs are used for ML and have no security implications, it would be nice to be able to enable a rule that flags unseeded RNGs.
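As a minimal illustration of the reproducibility point, using only the standard library: two generators built with the same seed produce the same sequence, while an unseeded generator is initialized from OS entropy (or the clock). The seed value 7 and the helper name `sample` are arbitrary choices for this sketch.

```python
import random


def sample(rng: random.Random, n: int = 3) -> list:
    """Draw n floats from the given generator."""
    return [rng.random() for _ in range(n)]


# Same seed, same sequence: an experiment can be re-run exactly.
seeded_a = sample(random.Random(7))
seeded_b = sample(random.Random(7))
print(seeded_a == seeded_b)  # True

# No seed: the initial state comes from the OS (or the clock),
# so the sequence will almost certainly differ from run to run.
unseeded = sample(random.Random())
```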

Examples

This would be an error:

import random
# [...]
rng = random.Random()

as would:

import random
# [...]
x = random.random()

while this would be fine:

import random
# [...]
rng = random.Random(x=7)

(This list is non-exhaustive; ideally the rule would cover uses in popular libraries too.)

dhruvmanila commented 1 week ago

> seeding your RNG is strongly encouraged to improve reproducibility

I'm not sure I understand this. Should the rule enforce seeding if there's any random value being used? I'm not experienced in this area, so I can't really say how useful this rule would be. Regardless, I don't think this is possible until #1774 is completed.

dmcc commented 1 week ago

> Should the rule enforce seeding if there's any random value being used?

Yes, exactly. Somewhat counterintuitively, in ML development (but not ML production), it's helpful for the numbers to not actually be random (i.e. to be the same on every run), which gives more repeatable experiments, easier debugging, etc. There are a good number of articles on ML development best practices suggesting that folks seed their RNGs. Here's one that is part of the official documentation for PyTorch, the leading deep learning framework.

> Regardless, I don't think this is possible until https://github.com/astral-sh/ruff/issues/1774 is completed.

Yeah, makes total sense. I think most folks would be annoyed/confused if they enabled this rule unintentionally.

(and yes, I've considered writing this as a Python linter, but prefer to have it running faster in ruff...)
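For reference, a rough sketch of what a pure-Python version of the check could look like, built on the standard `ast` module. This is only an illustration under stated assumptions, not a proposed Ruff implementation: the function name `find_unseeded_rng` and the `MODULE_LEVEL_FUNCS` set are made up for this example, it only recognizes the stdlib module when it is referenced by the literal name `random`, and it does not try to work out whether `random.seed(...)` was called somewhere else.

```python
import ast

# Hypothetical scope for this sketch: a handful of module-level functions
# from the stdlib `random` module that draw from the shared global RNG.
MODULE_LEVEL_FUNCS = {"random", "randint", "choice", "shuffle", "uniform"}


def find_unseeded_rng(source: str) -> list:
    """Return sorted (line, message) pairs for calls that look unseeded."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only match attribute calls of the form `random.<name>(...)`.
        if not (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "random"
        ):
            continue
        if func.attr == "Random" and not node.args and not node.keywords:
            findings.append((node.lineno, "random.Random() created without a seed"))
        elif func.attr in MODULE_LEVEL_FUNCS:
            findings.append((node.lineno, f"random.{func.attr}() uses the shared global RNG"))
    return sorted(findings)


example = (
    "import random\n"
    "rng = random.Random()\n"
    "x = random.random()\n"
    "ok = random.Random(x=7)\n"
)
for line, message in find_unseeded_rng(example):
    print(f"line {line}: {message}")
# line 2: random.Random() created without a seed
# line 3: random.random() uses the shared global RNG
```

A single-file check like this misses aliased imports (`import random as r`, `from random import Random`) and seeds set in other modules, which is part of why doing it properly inside the linter is more attractive.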