sloorush closed this issue 2 months ago
Information about a lot of data is present in: http://wiki.dyalog.bramley/index.php/Performance_Measurement/pqa#Expression_labels
I am planning to follow the same pattern (if not the same code) for the tests on the masses of data.
One more excerpt from Peter's email:
I really think we need to have more random data. We talked a bit about this at the internal meetings as well (and I think I have mentioned it a couple of times before when reviewing earlier tests). You do provide test cases for what is mostly hand-picked data, which covers all the non-nested data types. But since the data itself is hand-picked, that means there are certain inputs we will never test. Having the data itself be random, but still guaranteed to be of the correct datatype, will help us test many more cases over time. Perhaps after running the tests for a month we happen to run the test with some random input which does not produce the expected result (which also means there should be a way to reproduce the failed run by setting ⎕RL manually). For example, in your tests all the 4-byte integers that are tested are always in the ranges 100000-100100 and ¯100000-¯100100, which are only 200 numbers out of the 4 billion possible values. So, I think some time should be spent working on producing a set of functions to generate this random data, as I feel it is important, and it is something that will be useful in pretty much all the tests you write.
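To make the idea concrete, here is a minimal sketch of a seeded generator, written in Python purely for illustration (the real tests would be in APL and would set ⎕RL instead). The function name `random_ints`, the `INT_RANGES` table, and the byte ranges in it are my assumptions, not existing test code. It draws from the full range of the requested type and forces one element outside the next narrower range, so the data cannot fit in a smaller type.

```python
import random

# Assumed inclusive ranges for the signed integer types discussed above.
INT_RANGES = {
    1: (-128, 127),
    2: (-32768, 32767),
    4: (-2147483648, 2147483647),
}

# Assumed range of the next narrower type (Boolean sits below 1-byte).
NARROWER = {1: (0, 1), 2: INT_RANGES[1], 4: INT_RANGES[2]}


def random_ints(width, count, seed):
    """Return `count` (>= 1) random integers that need exactly `width` bytes.

    Hypothetical helper: the same seed always produces the same data,
    which is what makes a failed run reproducible.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    lo, hi = INT_RANGES[width]
    data = [rng.randint(lo, hi) for _ in range(count)]

    # Overwrite one random position with a value outside the narrower range,
    # so the whole vector is guaranteed not to fit in a smaller type.
    n_lo, n_hi = NARROWER[width]
    wide = rng.choice([rng.randint(lo, n_lo - 1), rng.randint(n_hi + 1, hi)])
    data[rng.randrange(count)] = wide
    return data


if __name__ == "__main__":
    seed = 20240101  # log this value with every test run
    print(seed, random_ints(4, 5, seed))
```

If a run fails, logging the seed alongside the failure is enough to regenerate exactly the same input later.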
Something which isn't so important for divide but will be important when testing other scalar primitives such as + or ×: Looping over the combinations of datatypes in the input is not enough. The result type should also be considered for each combination of the input types. For example, adding two vectors of 2-byte integers could produce a Boolean vector, a 1-byte integer vector, a 2-byte integer vector, or a 4-byte integer vector. While I think we should test all these combinations (by choosing the data carefully, but still at random within the ranges that are needed), the 4-byte and 2-byte results are the most interesting. In the 4-byte integer vector result case, the addition "overflows" the input data type, and that is handled differently in the interpreter.
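One possible way to choose data "at random within the ranges that are needed" is to pick the sums first and then split each sum into two 2-byte operands. The sketch below is in Python only to show the arithmetic; `narrowest_type`, `int2_pair_with_sum_type`, and the type names are hypothetical, and the byte ranges are my assumptions. It retries until both operands really are 2-byte vectors and the sum has exactly the requested type.

```python
import random

I2_LO, I2_HI = -32768, 32767  # assumed 2-byte signed integer range

# Assumed inclusive ranges, narrowest first. "int4" is limited here to the
# values that a sum of two 2-byte integers can actually reach.
RESULT_RANGES = {
    "bool": (0, 1),
    "int1": (-128, 127),
    "int2": (I2_LO, I2_HI),
    "int4": (2 * I2_LO, 2 * I2_HI),
}


def narrowest_type(values):
    """Name of the narrowest type in RESULT_RANGES that holds every value."""
    for name, (lo, hi) in RESULT_RANGES.items():
        if all(lo <= v <= hi for v in values):
            return name
    raise ValueError("value outside the ranges considered here")


def int2_pair_with_sum_type(target, count, seed):
    """Return two 2-byte vectors (a, b) whose elementwise sum has type `target`."""
    rng = random.Random(seed)
    lo, hi = RESULT_RANGES[target]
    while True:  # rejection sampling: retry until all type constraints hold
        sums = [rng.randint(lo, hi) for _ in range(count)]
        # For each sum s, pick a so that both a and b = s - a stay 2-byte.
        a = [rng.randint(max(I2_LO, s - I2_HI), min(I2_HI, s - I2_LO))
             for s in sums]
        b = [s - x for s, x in zip(sums, a)]
        if (narrowest_type(a) == "int2" and narrowest_type(b) == "int2"
                and narrowest_type(sums) == target):
            return a, b


if __name__ == "__main__":
    for target in RESULT_RANGES:
        a, b = int2_pair_with_sum_type(target, 8, seed=1)
        print(target, narrowest_type([x + y for x, y in zip(a, b)]))
```

The "int4" target is the overflow case from the email: every operand fits in 2 bytes, but at least one sum does not.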
Information dump: E & J are interesting characters to look for. 11×1, 1×11, and 11×11 matrices are interesting.
Have a standardized module so that data generation is consistent across all the tests for the primitives.
Thought dump on this issue: