Discussion of evaluation methods

OLDDDDDDD commented 1 year ago

Could we automate the evaluation? For each 'stdnum' module, a script could automatically generate test samples and run them as a stress test. This would be an improvement over the way the evaluation is currently conducted.
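To make the idea concrete, here is a minimal sketch of such a generator script. It assumes a Luhn-style check digit as a stand-in format and implements the checksum locally, so it does not call python-stdnum itself. The helper names (`generate_valid`, `corrupt_one_digit`, `run_stress_test`) are hypothetical and only for illustration.

```python
import random

DIGITS = '0123456789'


def luhn_checksum(number):
    """Return the Luhn checksum of a digit string (0 means valid)."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10


def calc_check_digit(payload):
    """Return the digit that makes payload + digit pass the Luhn check."""
    return str((10 - luhn_checksum(payload + '0')) % 10)


def is_valid(number):
    """Stand-in validator; a real script would call the module under test."""
    return number.isdigit() and luhn_checksum(number) == 0


def generate_valid(rng, length=16):
    """Generate a random number of the given length with a correct check digit."""
    payload = ''.join(rng.choice(DIGITS) for _ in range(length - 1))
    return payload + calc_check_digit(payload)


def corrupt_one_digit(rng, number):
    """Replace one randomly chosen digit with a different digit."""
    pos = rng.randrange(len(number))
    replacement = rng.choice([d for d in DIGITS if d != number[pos]])
    return number[:pos] + replacement + number[pos + 1:]


def run_stress_test(validator, samples=1000, seed=0):
    """Return the generated samples on which the validator gave a wrong answer."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    failures = []
    for _ in range(samples):
        good = generate_valid(rng)
        bad = corrupt_one_digit(rng, good)
        if not validator(good):
            failures.append(('rejected valid', good))
        if validator(bad):  # Luhn detects every single-digit substitution
            failures.append(('accepted invalid', bad))
    return failures


if __name__ == '__main__':
    print(len(run_stress_test(is_valid)), 'failures')
```

Samples generated from the same checksum logic as the validator only check self-consistency, so numbers from real sources would still be needed to confirm that the algorithm itself matches the official format.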

For verification, I also have the idea of asking the official issuing bodies for a database of valid numbers (although they are unlikely to provide one), so that we can also check our test program against it.

arthurdejong commented 1 year ago

Hi @zhangJXBH,

I don't think I 100% understand what you mean. There are some generic tests for robustness in tests/test_robustness.doctest. A stress-test is more in the area of performance while the current tests are more focused on correctness. Some performance testing has been done on some smaller parts (mostly to compare implementation solutions and then pick one).

In general, it is unlikely that we can get any kind of database with all (or some subset) of valid numbers. For most number formats there should already be around 100 test numbers (if they are available) in the hope that this covers most corner cases. Since python-stdnum already supports a large number of formats (more than 200 at the moment), making the test sets significantly larger will slow down the tests even further, so there is a trade-off. When adding a format I generally test with as many numbers as I have available, while only 100 or so are kept for regression testing.

arthurdejong commented 1 year ago

Closing this due to lack of progress. Feel free to add more information and/or re-open the issue.

arthurdejong / python-stdnum

Discussion of evaluation methods #345