VirusTotal / yara

The pattern matching swiss knife
https://virustotal.github.io/yara/
BSD 3-Clause "New" or "Revised" License

Avoid rand() and std::rand() for generating pseudorandom numbers #1971

Closed philipjonsen closed 8 months ago

philipjonsen commented 11 months ago

DESCRIPTION

The C Standard pseudorandom number generator function rand() has limitations. Like other pseudorandom number generators, it uses a mathematical algorithm to produce a sequence of numbers with good statistical properties, but the numbers are not genuinely random. The sequence is determined entirely by a seed value: it appears random but is actually deterministic.
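A minimal sketch of that determinism, assuming a single-threaded program in which nothing else calls rand() between the two runs. The helper names firstValues and sameSeedRepeats are illustrative and do not come from YARA or from the analyzer report.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Returns the first `count` values produced by std::rand() after seeding
// with `seed`. (Hypothetical helper, for illustration only.)
std::vector<int> firstValues(unsigned seed, std::size_t count) {
  std::srand(seed);
  std::vector<int> values;
  values.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    values.push_back(std::rand());
  }
  return values;
}

// True when two runs seeded with the same value yield identical sequences.
bool sameSeedRepeats(unsigned seed, std::size_t count) {
  return firstValues(seed, count) == firstValues(seed, count);
}
```

Calling sameSeedRepeats(42u, 8) should return true on a conforming implementation, because the C Standard requires srand() to reproduce the same sequence for the same seed.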

The problem with using the C Standard function rand() is that it makes no guarantees about the quality of the random sequence produced. Implementations may have a comparatively short cycle, which means that the sequence of numbers generated will repeat sooner than an application expects.
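To make the idea of a cycle concrete, here is a toy linear congruential generator with a deliberately tiny modulus. It is an assumption made for illustration, not the algorithm behind any particular rand() implementation; ToyLcg and cycleLength are hypothetical names.

```cpp
#include <cstdint>

// Toy generator: state -> (5 * state + 3) mod 16.
// The tiny modulus is chosen so the cycle is easy to observe.
struct ToyLcg {
  std::uint32_t state;

  std::uint32_t next() {
    state = (5u * state + 3u) % 16u;
    return state;
  }
};

// Counts the steps until the generator returns to its starting state.
// The update rule is a permutation of 0..15, so the loop always terminates.
std::uint32_t cycleLength(std::uint32_t seed) {
  ToyLcg gen{seed % 16u};
  const std::uint32_t start = gen.state;
  std::uint32_t steps = 0;
  do {
    gen.next();
    ++steps;
  } while (gen.state != start);
  return steps;
}
```

With these constants the generator visits all 16 states before repeating, so no seed can yield more than 16 distinct values. Real generators use far larger state, but the same principle bounds their period.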

Moreover, the generated numbers may be predictable, making them unsuitable for applications that require high-quality pseudorandom numbers.

This also applies to std::rand() from the C++ standard library.

It is recommended to choose a generator that is sufficient for the specific needs of the application.

BAD PRACTICE:

#include <cstdlib>
#include <string>

std::string getNewIssueId() {
  std::string IssueId("ISSUE-");

  // Holds the ID, starting with the characters "ISSUE-" followed by a random
  // integer in the range [0, 99999].
  IssueId += std::to_string(std::rand() % 100000);

  return IssueId;
}
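One concrete weakness of std::rand() % 100000 is that the C Standard only guarantees RAND_MAX to be at least 32767. On a platform with that minimum, the expression can never yield an ID above 32767. The helper below is a hypothetical sketch that computes the largest reachable value for a given RAND_MAX and modulus.

```cpp
// Largest value that `rand() % modulus` can produce when rand() returns
// values in [0, randMax]. (Illustrative helper; assumes modulus > 0.)
constexpr long long largestReachable(long long randMax, long long modulus) {
  return randMax >= modulus - 1 ? modulus - 1 : randMax;
}

// Number of residues that occur one extra time (modulo bias).
// Zero means every residue is equally likely.
constexpr long long overrepresentedResidues(long long randMax,
                                            long long modulus) {
  return (randMax + 1) % modulus;
}
```

For example, largestReachable(32767, 100000) is 32767, while a platform whose RAND_MAX is 2147483647 reaches 99999 but still favors some residues slightly.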

RECOMMENDED:

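#include <random>
#include <string>

std::string getNewIssueId() {
  std::string IssueId("ISSUE-");

  // Seed a Mersenne Twister engine from the system's random device and draw
  // a uniformly distributed integer in the range [0, 100000].
  std::uniform_int_distribution<int> Dist(0, 100000);
  std::random_device RandDev;
  std::mt19937 Engine(RandDev());
  IssueId += std::to_string(Dist(Engine));

  return IssueId;
}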
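The recommended snippet constructs a std::random_device and a std::mt19937 on every call, which is comparatively expensive. A possible variant, sketched here under the assumption that one engine per thread is acceptable, seeds the engine once and reuses it. The function name getNewIssueIdReusingEngine is hypothetical.

```cpp
#include <random>
#include <string>

std::string getNewIssueIdReusingEngine() {
  // Seeded once per thread on first use; later calls reuse the same engine.
  thread_local std::mt19937 Engine{std::random_device{}()};

  // Uniform over the closed range [0, 100000], with no modulo bias.
  std::uniform_int_distribution<int> Dist(0, 100000);

  return "ISSUE-" + std::to_string(Dist(Engine));
}
```

Note that std::mt19937 is not a cryptographically secure generator, and IDs drawn this way can still collide, so this sketch only addresses statistical quality, not unpredictability or uniqueness.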

Found a use of the function rand(), which has limited randomness, here: yara/blob/master/libyara/scanner.c#L224-L224

Reference: https://www.pcg-random.org/

plusvic commented 8 months ago

The rand() function is enough for the use-case we have in YARA. Please do not copy and paste the output produced by C code analyzers here. These tools are certainly useful, but not all the suggestions make sense in all situations.