dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev

Should we verify the contents of tests whose purpose is encoding or file formats? #53398

Open eernstg opened 1 year ago

eernstg commented 1 year ago

The syntax of Dart is specified such that lines can end in \n, \r, or \r\n, which implies that any sequence of \n and \r characters, in any order, is accepted as whitespace and then ignored (except in one situation where the presence of whitespace is used to disambiguate an otherwise ambiguous term; even that doesn't depend on which kind of whitespace it is).

We have a few tests whose purpose is to verify that the tools do indeed accept line endings in source code as specified.
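As a sketch (the file name and contents here are made up, not an actual test), such a test file might be produced with deliberately mixed line endings:

```dart
// Sketch only: write a source file whose lines end in a mix of \r\n, \r, and \n.
import 'dart:io';

void main() {
  const source = 'void main() {\r\n'
      "  print('one');\r"
      "  print('two');\n"
      '}\n';
  // Hypothetical file name; a real test would live in the regular test suites.
  File('mixed_line_endings_test.dart').writeAsStringSync(source);
}
```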

In addition, we also have some tests whose purpose is to test the treatment of string literals (containing UTF-8 text).
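A minimal example of the shape such a test could take (the literal and the expectation are invented for illustration):

```dart
// Sketch only: a string-literal test that breaks if a tool re-encodes or
// replaces the non-ASCII character in the literal.
void main() {
  const s = 'smörgåsbord';
  // 'ö' is U+00F6; if the file is silently transcoded or "repaired",
  // this code unit changes and the check fails.
  if (s.codeUnitAt(2) != 0x00F6) {
    throw StateError('Unexpected code unit: ${s.codeUnitAt(2)}');
  }
}
```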

With respect to line endings, the tests can be destroyed by running dos2unix or a similar utility, because they won't test anything of interest after being converted. A manual inspection may very well fail to notice the difference, because line endings (of any kind) are normally not shown. The situation is similar for other tests containing specialized characters.

Would it be worth the effort to take a sha1sum of such tests, store those sha1 values somewhere (perhaps foo_A01_t01.dart could have its sha1 stored in foo_A01_t01.sha1), and run a presubmit script that checks the sha1 if the corresponding library has been modified?
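A rough sketch of what such a presubmit check could look like (assuming package:crypto for SHA-1; the `.sha1` file naming is just the convention suggested above):

```dart
// Sketch only: compare each modified test file's SHA-1 against a stored .sha1 file.
import 'dart:io';

import 'package:crypto/crypto.dart';

bool checksumMatches(String testPath) {
  final actual = sha1.convert(File(testPath).readAsBytesSync()).toString();
  final expectedFile = File(testPath.replaceAll(RegExp(r'\.dart$'), '.sha1'));
  return actual == expectedFile.readAsStringSync().trim();
}

void main(List<String> modifiedTests) {
  for (final path in modifiedTests) {
    if (!checksumMatches(path)) {
      stderr.writeln(
          'Encoding-sensitive test changed: $path (update its .sha1 if intended).');
      exitCode = 1;
    }
  }
}
```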

The approach could also make sense for non-source-code artifacts (images etc.) which are part of a test.

eernstg commented 1 year ago

Some examples of tests with specialized characters are given here: https://github.com/dart-lang/co19/issues/2242#issuecomment-1700621605.

eernstg commented 1 year ago

@athomas, WDYT? Do we already have a mechanism which would ensure that certain (encoding-sensitive) tests aren't modified significantly by accident? (Some editor or some revision management system might well try to "clean up" whitespace and similar elements of a source file.) Otherwise, would it be worth the trouble to do it?

athomas commented 1 year ago

If the file is marked as binary in .gitattributes, git at least shouldn't touch it. I don't know about IDEs. As long as the test fails when it is cleaned up, that should provide enough signal?
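For example, something like this in .gitattributes (the paths are made up), where `binary` turns off git's text/eol handling for the matched files:

```
# Hypothetical paths: keep git from normalizing line endings in these tests.
tests/language/line_endings/*.dart binary
```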

eernstg commented 1 year ago

But it wouldn't necessarily fail. For instance, suppose we wish to ensure that the parser won't crash when it encounters a library whose file representation uses non-standard line endings (for example, some lines ending in \r\n and some ending in \n). If that file is changed to a standard \n encoding, then it surely won't fail after the "normalization"; it just no longer tests what it was written to test.

athomas commented 1 year ago

Maybe the easiest approach would be to have a second test that reads the file and checks that the bytes match an expected pattern? It wouldn't work on the analyzer/cfe configurations, but the VM would run the second test and fail if the bytes don't match.
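Roughly like this (the path and the expected byte pattern are placeholders):

```dart
// Sketch only: a companion test that reads the raw bytes of the
// encoding-sensitive test and fails if the expected \r\n sequences are gone.
import 'dart:io';

void main() {
  // Hypothetical path; a real test would resolve it relative to the suite.
  final bytes = File('foo_A01_t01.dart').readAsBytesSync();
  var hasCrLf = false;
  for (var i = 0; i + 1 < bytes.length; i++) {
    if (bytes[i] == 0x0D && bytes[i + 1] == 0x0A) {
      hasCrLf = true;
      break;
    }
  }
  if (!hasCrLf) {
    throw StateError('The \\r\\n line endings have been normalized away.');
  }
}
```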

athomas commented 1 year ago

I'm a bit reluctant to add a special case to the infrastructure for something we'll rarely need.