georust / geographiclib-rs

A port of geographiclib in Rust.

#13 partial. Added geomath::ang_diff test and benchmark #14

Closed. stonylohr closed this 3 years ago

stonylohr commented 3 years ago

Currently uses a static truncated dat file. I hope to improve on that in later pull requests.

The current draft makes geomath public. This is required for testing its operations using integration tests, but wouldn't be required if I created the test as a unit test instead. I think we'll want it to be public eventually, but some of my non-CI geomath suggestions (e.g. #10) include breaking changes, so it might be preferable to avoid making geomath public in master until after those breaking changes are either applied or rejected.

stonylohr commented 3 years ago

The dat file format is of my own making. The character escaping is something I could leave out for now. The reason it's there is that some later geographiclib classes use string inputs and outputs that can contain spaces, colons, single and double quotes, etc. But yes, I can drop that logic for now and wait to restore it until we get to things that need it.

I can arrange to drop most dat file comments, and some content, without much issue. Specifically, I now think we'll mostly be relying on the criterion framework for time comparisons, so there's less value in the SYSTEMTAG, LIBRARYTAG, and TESTSTATUS lines than I'd originally been thinking. The same is true for the time section on individual operation lines, and the statistics section. I'll plan to drop all of those.

There should be no issue removing section comments. I'll plan to drop those as well.

This still leaves the operation code line (e.g. Math_AngDiff_x_y_e), and the format comment. While we could infer the operation from the file name, or have the knowledge hard-coded in the tests, I think there's value in keeping it. Similarly, because the line interpretation varies from one operation to another, and there are so many operations, I think there's value in including it in the dat file explicitly, rather than only having it in the test code. I propose that I combine these two items into a single line, which would now always be the first line of the file.

Would that be an acceptable compromise?

stonylohr commented 3 years ago

I've also given some more thought to the sections within lines, and I think I'm satisfied that we can drop them. As I mentioned in my previous comment, I think we can drop the time section.

I originally separated the error section from outputs in part to retain the option to report ending reference or state values even in error cases, and in part to avoid ambiguity for functions that return non-numeric outputs. In practice, I'm not reporting any outputs in error cases, and it's probably fine to assume that no non-error case will result in an actual output value of "ERROR". So I think dropping that separation should be ok.

In terms of removing the separation between inputs and outputs, I think that in order to do this safely while also retaining support for a variable number of inputs (as will come up in some later array-based cases), I may need to make the outputs required rather than optional. That shouldn't have any effect on the rust side of things, where those outputs are of central interest, but it may require some more work on the C++ side in my case creation logic. I currently use a 2-stage approach where some special value cases are initially created as input-only lines, but I think I can avoid that.

Anyway, I'll proceed with trying to remove all of the line-level section separators.

As I mentioned in my previous comment, I still favor including a single header line in each data file that specifies the operation code and (as an implicit comment) the line format.

In my current pass, I also don't plan to change formatting of NaN and infinity values to something that rust's f64::parse will accept. The current values use the default C++ standard library formatting. I think it makes the most sense to stick with the C++ convention, since I think the change would be pretty coding-time intensive, and no single convention will match every language's preferred approach. We can revisit that topic after this pass if you wish.
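If we later do want to consume those values from Rust, here's a minimal sketch of the kind of shim that could bridge the two conventions (the helper name is made up, and depending on the Rust version some of these spellings may already parse):

```rust
use std::num::ParseFloatError;

// Hypothetical helper: map the default C++ stream spellings of non-finite
// values onto f64 before falling back to Rust's standard parser.
fn parse_cpp_f64(s: &str) -> Result<f64, ParseFloatError> {
    match s.trim() {
        "nan" | "-nan" | "nan(ind)" | "-nan(ind)" => Ok(f64::NAN),
        "inf" => Ok(f64::INFINITY),
        "-inf" => Ok(f64::NEG_INFINITY),
        other => other.parse::<f64>(),
    }
}
```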

I'm attaching a couple of heavily truncated sample files illustrating a format I think I can get us to at the end of my current pass. I've renamed the files so GitHub will let me attach them unzipped.

Geodesic_C3f.dat.txt Math_AngDiff_x_y_e.dat.txt
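As a rough illustration only (not the parser in this pull request), a test could load a file in the shape described above, one header line naming the operation and line format followed by one whitespace-separated line of values per case, with something like the following; the helper name is made up:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Illustrative only: return the header line plus each remaining non-empty
// line split into whitespace-separated fields.
fn read_dat_cases(path: &Path) -> io::Result<(String, Vec<Vec<String>>)> {
    let text = fs::read_to_string(path)?;
    let mut lines = text.lines();
    let header = lines.next().unwrap_or_default().to_string();
    let cases = lines
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.split_whitespace().map(str::to_string).collect())
        .collect();
    Ok((header, cases))
}
```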

michaelkirk commented 3 years ago

Thanks for your response!

The dat file format is of my own making. The character escaping is something I could leave out for now. The reason it's there is that some later geographiclib classes use string inputs and outputs that can contain spaces, colons, single and double quotes, etc. But yes, I can drop that logic for now and wait to restore it until we get to things that need it.

Yes let's keep it as simple as possible for now.

I can arrange to drop most dat file comments, and some content, without much issue. Specifically, I now think we'll mostly be relying on the criterion framework for time comparisons, so there's less value in the SYSTEMTAG, LIBRARYTAG, and TESTSTATUS lines than I'd originally been thinking. The same is true for the time section on individual operation lines, and the statistics section. I'll plan to drop all of those.

Great - yep I think criterion is a good way to go.

There should be no issue removing section comments. I'll plan to drop those as well.

This still leaves the operation code line (e.g. Math_AngDiff_x_y_e), and the format comment. While we could infer the operation from the file name, or have the knowledge hard-coded in the tests, I think there's value in keeping it. Similarly, because the line interpretation varies from one operation to another, and there are so many operations, I think there's value in including it in the dat file explicitly, rather than only having it in the test code. I propose that I combine these two items into a single line, which would now always be the first line of the file.

I was most nervous about lines like this:

 #STATS-SECTION
 #COUNT_TESTS 300
 #COUNT_USED 300
 #TOTAL_TEST_SECONDS 2.4075000000000033e-05
 #MIN_TEST_SECONDS 6.5999999999999995e-08
 #MAX_TEST_SECONDS 1.1119999999999999e-06
 #MEDIAN_TEST_SECONDS 6.8e-08
 #MEAN_TEST_SECONDS 8.0250000000000113e-08

Which to me looked less like "comments" and more like structured data written and parseable by a machine.

I'm ok with a single comment at the top.

Another thought: if you're feeling bold, we could just do csv/tsv, where a top row with field names is a common convention.

michaelkirk commented 3 years ago

I originally separated the error section from outputs in part to retain the option to report ending reference or state values even in error cases

I'm not sure what this means - what are "error"s in this context? Are we trying to match geographiclib crash for crash? 😆

stonylohr commented 3 years ago

I originally separated the error section from outputs in part to retain the option to report ending reference or state values even in error cases

I'm not sure what this means - what are "error"s in this context? Are we trying to match geographiclib crash for crash? 😆

I know it sounds funny, but that's pretty close to the truth. When a geographiclib call errors out, it's not usually because of a coding error -- it's usually because the inputs fall into a category that's explicitly rejected by the geographiclib code. In fact, many of Karney's geographiclib test cases are specifically confirming that a function rejects certain inputs. By default, I think the rust code should use the same behaviors. For both this and other variations, we're likely to encounter some cases where the rust version has a legitimate reason to behave differently from the C++, but I think it will be the exception rather than the rule. We can decide how to address those cases when we encounter them.

michaelkirk commented 3 years ago

That makes sense. Thanks for explaining.


stonylohr commented 3 years ago

I'll give the csv/tsv option some more thought. If nothing else, tabs might be less likely to show up in string values than spaces are, when we eventually get to that point. Unfortunately, I don't see how to make the header line a proper csv/tsv header line, in part because the operation code is something that doesn't correspond to any particular item on the later lines, and in part because some items in the header line correspond to a variable number of items on the later lines. More frequently, they refer to a fixed number of items in an abbreviated form, but I'm not excited to bend over backwards for a format change that I know is going to fall apart eventually anyway.

michaelkirk commented 3 years ago

For now I'd be fine sticking with the .dat as you have it if that's your preference - especially since, as I understand it, you're saying there are other examples where something like a tsv won't work well.

Later if we can find a way to easily use a format which doesn't require reading parser code to understand, that could be a future improvement.

stonylohr commented 3 years ago

@michaelkirk I've committed a new set of changes that I think should address your concerns with my previous draft. It also adds a fairly complete set of tests and benchmarks for public Geodesic and geomath functions.

I also tried to simplify the process of getting logged C++ data: after cloning, run "git submodule update --remote". The test and benchmark code should handle everything else.

Let me know if you'd like further adjustments or discussion.

stonylohr commented 3 years ago

@michaelkirk Since my earlier draft, I've also started leaning toward switching my tests from being integration tests to unit tests. As I noted previously, their mechanics aren't ideal for being categorized as unit tests, but I'm coming to think that the increased access available to unit tests probably outweighs the "not ideal unit tests" consideration. I'd be curious to hear whether you have any thoughts on the topic in either direction.

michaelkirk commented 3 years ago

I've also started leaning toward switching my tests from being integration tests to unit tests. As I noted previously, their mechanics aren't ideal for being categorized as unit tests, but I'm coming to think that the increased access available to unit tests probably outweighs the "not ideal unit tests" consideration. I'd be curious to hear whether you have any thoughts on the topic in either direction.

re: unit vs. integration - I'm not really sure what you mean in this context.

I highly value the testing of the higher-level public methods and think there's less value in, e.g., testing each individual method in geomath.

The overall "flow" of data in your branch makes sense to me. My biggest concern right now is that you're relying on a instrumented version of geoagraphic lib to generate the test input. I'd much prefer to use Karney's published (and maintained) test data so we're not in a situation down the road where we have to remember how this works and pray the patches still apply. Conversely it'd be relatively easy to download the updated test data from Karney.

stonylohr commented 3 years ago

I've also started leaning toward switching my tests from being integration tests to unit tests. As I noted previously, their mechanics aren't ideal for being categorized as unit tests, but I'm coming to think that the increased access available to unit tests probably outweighs the "not ideal unit tests" consideration. I'd be curious to hear whether you have any thoughts on the topic in either direction.

re: unit vs. integration - I'm not really sure what you mean in this context.

I highly value the testing of the higher-level public methods and think there's less value in, e.g., testing each individual method in geomath.

The overall "flow" of data in your branch makes sense to me. My biggest concern right now is that you're relying on a instrumented version of geoagraphic lib to generate the test input. I'd much prefer to use Karney's published (and maintained) test data so we're not in a situation down the road where we have to remember how this works and pray the patches still apply. Conversely it'd be relatively easy to download the updated test data from Karney.

For a general discussion of unit testing vs integration testing in Rust, this may be helpful: https://doc.rust-lang.org/book/ch11-03-test-organization.html
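For reference, the distinction at play, in a generic Rust illustration (not code from this branch): a unit test lives alongside the code and can exercise private items, while an integration test under tests/ only sees the crate's public API.

```rust
// src/geomath.rs (illustrative sketch; not the actual implementation)
fn ang_normalize(x: f64) -> f64 {
    // Reduce an angle in degrees to the range (-180, 180].
    let y = x % 360.0;
    if y <= -180.0 {
        y + 360.0
    } else if y > 180.0 {
        y - 360.0
    } else {
        y
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // A unit test can call ang_normalize even though it is private;
    // an integration test in the tests/ directory could not.
    #[test]
    fn normalize_wraps() {
        assert_eq!(ang_normalize(540.0), 180.0);
    }
}
```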

I agree that Karney's dataset is a useful testing tool, but the results in that dataset are specific to WGS84, so can't be relied on to catch bugs in the handling of other ellipsoids. Also, I don't see the GeodTest-100.dat file on the geographiclib data page (https://sourceforge.net/projects/geographiclib/files/testdata/). Depending on how it's created and how the original files are structured, I'm not sure that it can be relied on to test all the categories of cases that the original GeodTest files are designed for. If it's just the first 100 lines, that's fine if Karney's case categories are evenly distributed, but less so if they're clumped.

I also agree that it may eventually make sense to abandon my instrumentation-based tests entirely, once the effort of maintaining them exceeds the value they provide. I think they're likely to be helpful for a while yet, though.

While geomath operations are currently non-public, and likely to see less outside use even if that changes, their C++ counterparts are publicly accessible, and I see them as something that is likely to make sense to expose at some point. I think there is also value in testing these lower level functions, in terms of helping to pinpoint where issues may be originating in the higher level functions. This is value that is likely to fade as geographiclib-rs becomes more stable, and could be dropped at some point in the future, but I think there's a bit of work before we get to that point.

Anyway, I propose the following as a broad strokes version of next steps:

  1. I'll replace my current integration tests with corresponding unit tests, and enable additional checks in them that need access to non-public items. However, I'll mark these tests with #[ignore], so they'll only run if specifically requested (see the sketch after this list).
  2. I'll add unit tests based on Karney's C++ tests, and not mark these as ignored.
  3. I'll add unit tests that use GeodTest-short as their basis, initially marked as ignored.
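As a reminder of the mechanics (generic Rust, hypothetical test name): an ignored test is skipped by a plain cargo test and only runs when ignored tests are requested explicitly.

```rust
#[test]
#[ignore] // slow and/or needs external data; run with `cargo test -- --ignored`
fn geodesic_direct_vs_cpp_data() {
    // Hypothetical body: load the logged C++ cases and compare the results
    // from geographiclib-rs against them within some tolerance.
}
```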

stonylohr commented 3 years ago

I just committed changes addressing many of the items we've discussed, but not all:

  1. Commented out integration test code. Assuming you approve of the direction I'm taking things, I expect to remove those files entirely soon.
  2. Added unit tests corresponding to the now-disabled integration tests. These tests are flagged as ignored, so they only run if specifically requested. With that in place, I've tightened their expectations, so many of them will fail if run. I expect those failures to eventually be resolved through a combination of bug fixes and re-loosening of test criteria, though not as part of this PR.
  3. Added geodesic unit tests corresponding to Karney's test cases found in various versions of geographiclib. Some of these also fail, so I've marked the failing ones as ignore for now as well. Some parts are also commented out because they test operations that don't appear to be present or public in geographiclib-rs (mostly things related to GeodesicLine). I'd prefer to leave sorting that out as a task for after the current PR as well.
  4. Made geomath non-public again. This breaks the geomath benchmarks, so I've commented those out for now. I figured I'd check your preferences before removing those benchmarks entirely.
  5. Moderately expanded comments in my code.
  6. Made some changes to "utilities" code. It compiles, but I haven't tested it yet.
  7. Probably a few other things I'm forgetting...

stonylohr commented 3 years ago

I've been thinking about test naming conventions and would be interested in your thoughts on the topic. There are several pieces I think might have value, and using a consistent ordering for those pieces would be preferable. Consistent test (and benchmark) naming will make it easier to run just a subset of test or bench routines. I'll list the pieces in the order I currently favor, which doesn't match the order used in my commit (a hypothetical combined name is sketched after the list):

  1. "test": This is kind of redundant with the test tag, but may still have some value. I could go either way on including it at all.
  2. The test's group: Examples in my most recent commit include "vs_python" (which I applied to the pre-existing tests, since that's what they appeared to be), "std" (applied to Karney's unit tests), and "vs_cpp" (applied to my tests against instrumented C++). Later this weekend, I hope to add some "geodtest" tests that use a GeodTest*.dat file as the basis.
  3. The name of the test's parent module (or something equivalent for integration tests): For example, all tests in geodesic.rs would use "geodesic".
  4. An identifier for the specific test within the categories above: For example, each of the "vs_cpp" tests uses an operation name for this piece, and each of the "std" tests uses an all-lower-case version of the name of the test it's based on.
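To make that concrete, here are purely hypothetical names combining the pieces above in the proposed order (prefix, group, module, specific identifier); none of these appear in the commit:

```rust
#[test]
fn test_std_geodesic_inverse_check() { /* based on one of Karney's unit tests */ }

#[test]
#[ignore]
fn test_vs_cpp_geomath_ang_diff() { /* compares against instrumented C++ output */ }
```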

stonylohr commented 3 years ago

I've now added tests based on GeodTest.dat, tagged "ignore". As with other tests, it will take me some time to pin down a good level of tolerance to allow.

stonylohr commented 3 years ago

@michaelkirk I just made a commit that I think is a good candidate for fully resolving #13. I would be interested in your feedback.

A few highlights include:

  1. I standardized naming for all my new tests and benches, but restored all pre-existing tests and benches to their original names.
  2. I added benches that use the full GeodTest.dat file.
  3. I marked all slow and/or failing tests as ignored. Currently failing tests should later be un-ignored as the underlying bugs are resolved, but that doesn't fall under #13.
  4. I enabled a previously-commented-out test from java geographiclib for arc-mode direct, but set it ignored because it fails.
  5. I improved my testing infrastructure in the utilities mod and used it to improve the tolerances used in some tests.
  6. I removed the tests folder. I've concluded that I'm happier with all current tests in their current "unit test" formulation in the src folder than with my previous "integration test" formulation in the tests folder.

stonylohr commented 3 years ago

I did think of a few remaining rough edges that I'll work on polishing, but the changes should be more contained than other recent commits. I hope to make the changes over this weekend.

stonylohr commented 3 years ago

I've now committed the changes I alluded to yesterday.

I think the main point of interest is that I've removed the use of a submodule and zip logic to handle the C++ instrumented data files. Instead, I created an approach more similar to the one used for GeodTest.dat: I've made zip files available for download on my sourceforge wiki page (https://sourceforge.net/u/alephknot/wiki/Home/), and I leave it to the user to download and unzip the files to the appropriate location. If a file isn't found, the error message tries to include both the expected data file location and instructions on how to find the zipped files, much like the approach for GeodTest.dat. Recall that all of the affected tests are marked as ignored, so they do not run by default.
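A rough sketch of that "helpful failure" idea (the helper name and exact wording are illustrative, not the committed code):

```rust
use std::fs::File;
use std::path::Path;

// Illustrative only: open a test data file, and on failure panic with a
// message that includes both the expected location and download instructions.
fn open_test_data(path: &Path) -> File {
    File::open(path).unwrap_or_else(|_| {
        panic!(
            "Test data file not found at {}. Download the matching zip from \
             https://sourceforge.net/u/alephknot/wiki/Home/ and unzip it to that location.",
            path.display()
        )
    })
}
```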

Note that I had hoped to use a single zip file for all of my data files using a kind of fakey release, but sourceforge isn't letting me create a release (perhaps because of the project's forked status). The full data file zip is too large to do as a wiki attachment. Instead, I've broken it into a separate zip file for each of the 3 largest class outputs (Geodesic, GeodesicLine, and Math), and a fourth zip file for all others (none of which matter yet to geographiclib-rs).

stonylohr commented 3 years ago

While looking around validate-geographiclib and the core geographiclib some more, I realized that geographiclib has a few different categories of tests in different places, and my tests so far missed some. I'll look into adding those as well.

stonylohr commented 3 years ago

After a bit of offline discussion, it's become clear that my branch is way more involved than @michaelkirk is looking for. I'll create a new branch and pull request with the much more limited change I now think he'd like. Other items from this branch may be resubmitted at points in the future under separate issues.