stonylohr closed this issue 3 years ago.
The dat file format is of my own making. The character escaping is something I could leave out for now. The reason it's there is that some later geographiclib classes use string inputs and outputs that can contain spaces, colons, single and double quotes, etc. But yes, I can drop that logic for now and wait to restore it until we get to things that need it.
I can arrange to drop most dat file comments, and some content, without much issue. Specifically, I now think we'll mostly be relying on the criterion framework for time comparisons, so there's less value in the SYSTEMTAG, LIBRARYTAG, and TESTSTATUS lines than I'd originally been thinking. The same is true for the time section on individual operation lines, and the statistics section. I'll plan to drop all of those.
There should be no issue removing section comments. I'll plan to drop those as well.
This still leaves the operation code line (e.g. Math_AngDiff_x_y_e), and the format comment. While we could infer the operation from the file name, or have the knowledge hard-coded in the tests, I think there's value in keeping it. Similarly, because the line interpretation varies from one operation to another, and there are so many operations, I think there's value in including it in the dat file explicitly, rather than only having it in the test code. I propose that I combine these two items into a single line, which would now always be the first line of the file.
Would that be an acceptable compromise?
I've also given some more thought to the sections within lines, and I think I'm satisfied that we can drop them. As I mentioned in my previous comment, I think we can drop the time section.
I originally separated the error section from outputs in part to retain the option to report ending reference or state values even in error cases, and in part to avoid ambiguity for functions that return non-numeric outputs. In practice, I'm not reporting any outputs in error cases, and it's probably fine to assume that no non-error case will result in an actual output value of "ERROR". So I think dropping that separation should be ok.
In terms of removing the separation between inputs and outputs, I think that in order to do this safely while also retaining support for a variable number of inputs (as will come up in some later array-based cases), I may need to make the outputs required rather than optional. That shouldn't have any effect on the rust side of things, where those outputs are of central interest, but it may require some more work on the C++ side in my case creation logic. I currently use a 2-stage approach where some special value cases are initially created as input-only lines, but I think I can avoid that.
Anyway, I'll proceed with trying to remove all of the line-level section separators.
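To make the intended shape concrete, here's a minimal sketch of how a simplified line might be parsed once the section separators are gone, assuming whitespace-delimited numeric fields with a fixed number of trailing outputs (the operation name, field counts, and function name are illustrative, not the actual format):

```rust
// Hypothetical parser for the simplified dat lines: the first line of a
// file would name the operation (e.g. "Math_AngDiff_x_y_e"); every later
// line holds whitespace-delimited values. With outputs required, a fixed
// output count anchors the split from the right even when the input
// count varies.
fn split_inputs_outputs(line: &str, num_outputs: usize) -> (Vec<f64>, Vec<f64>) {
    let vals: Vec<f64> = line
        .split_whitespace()
        .map(|s| s.parse().expect("numeric field"))
        .collect();
    assert!(vals.len() >= num_outputs, "outputs are required on every line");
    let (inputs, outputs) = vals.split_at(vals.len() - num_outputs);
    (inputs.to_vec(), outputs.to_vec())
}
```

For example, `split_inputs_outputs("1 2 3 4 5", 2)` would yield inputs `[1.0, 2.0, 3.0]` and outputs `[4.0, 5.0]`.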
As I mentioned in my previous comment, I still favor including a single header line in each data file that specifies the operation code and (as an implicit comment) the line format.
In my current pass, I also don't plan to change formatting of NaN and infinity values to something that rust's f64::parse will accept. The current values use the default C++ standard library formatting. I think it makes the most sense to stick with the C++ convention, since I think the change would be pretty coding-time intensive, and no single convention will match every language's preferred approach. We can revisit that topic after this pass if you wish.
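If we do revisit that topic later, a small normalization shim on the Rust side might be enough. This sketch assumes MSVC-style spellings like "nan(ind)" are the problematic cases, which would need checking against the actual logged output; Rust's `f64` parser already accepts "inf", "-inf", and "nan":

```rust
// Hedged sketch: normalize special values from C++ output before parsing.
// The "nan(ind)" spellings are an assumption about what some C++
// toolchains emit; adjust to match the real logged data.
fn parse_cpp_f64(s: &str) -> f64 {
    match s {
        "nan(ind)" | "-nan(ind)" => f64::NAN,
        // Rust's parser handles "inf", "-inf", "nan", and ordinary decimals.
        _ => s.parse::<f64>().expect("unrecognized numeric value"),
    }
}
```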
I'm attaching a couple of heavily truncated sample files illustrating a format I think I can get us to at the end of my current pass. I've renamed the files so that GitHub will let me attach them unzipped.
Thanks for your response!
> The dat file format is of my own making. The character escaping is something I could leave out for now. The reason it's there is that some later geographiclib classes use string inputs and outputs that can contain spaces, colons, single and double quotes, etc. But yes, I can drop that logic for now and wait to restore it until we get to things that need it.
Yes let's keep it as simple as possible for now.
> I can arrange to drop most dat file comments, and some content, without much issue. Specifically, I now think we'll mostly be relying on the criterion framework for time comparisons, so there's less value in the SYSTEMTAG, LIBRARYTAG, and TESTSTATUS lines than I'd originally been thinking. The same is true for the time section on individual operation lines, and the statistics section. I'll plan to drop all of those.
Great - yep I think criterion is a good way to go.
> There should be no issue removing section comments. I'll plan to drop those as well.
> This still leaves the operation code line (e.g. Math_AngDiff_x_y_e), and the format comment. While we could infer the operation from the file name, or have the knowledge hard-coded in the tests, I think there's value in keeping it. Similarly, because the line interpretation varies from one operation to another, and there are so many operations, I think there's value in including it in the dat file explicitly, rather than only having it in the test code. I propose that I combine these two items into a single line, which would now always be the first line of the file.
I was most nervous about lines like this:
```
#STATS-SECTION
#COUNT_TESTS 300
#COUNT_USED 300
#TOTAL_TEST_SECONDS 2.4075000000000033e-05
#MIN_TEST_SECONDS 6.5999999999999995e-08
#MAX_TEST_SECONDS 1.1119999999999999e-06
#MEDIAN_TEST_SECONDS 6.8e-08
#MEAN_TEST_SECONDS 8.0250000000000113e-08
```
To me, those looked less like "comments" and more like structured data written and parsed by a machine.
I'm ok with a single comment at the top.
Another thought, if you're feeling bold, we could just do csv/tsv where a top row with field names is a common convention.
> I originally separated the error section from outputs in part to retain the option to report ending reference or state values even in error cases
I'm not sure what this means - what are "errors" in this context? Are we trying to match geographiclib crash for crash? 😆
I know it sounds funny, but that's pretty close to the truth. When a geographiclib call errors out, it's not usually because of a coding error -- it's usually because the inputs fall into a category that's explicitly rejected by the geographiclib code. In fact, many of Karney's geographiclib test cases are specifically confirming that a function rejects certain inputs. By default, I think the rust code should use the same behaviors. For both this and other variations, we're likely to encounter some cases where the rust version has a legitimate reason to behave differently from the C++, but I think it will be the exception rather than the rule. We can decide how to address those cases when we encounter them.
That makes sense. Thanks for explaining.
I'll give the csv/tsv option some more thought. If nothing else, tabs might be less likely to show up in string values than spaces are, when we eventually get to that point. Unfortunately, I don't see how to make the header line a proper csv/tsv header line, in part because the operation code is something that doesn't correspond to any particular item on the later lines, and in part because some items in the header line correspond to a variable number of items on the later lines. More frequently, they refer to a fixed number of items in an abbreviated form, but I'm not excited to bend over backwards for a format change that I know is going to fall apart eventually anyway.
For now I'd be fine sticking with the .dat as you have it if that's your preference - especially since, as I understand it, you're saying there are other examples where something like a tsv won't work well.
If we can later find a way to easily use a format that doesn't require reading parser code to understand, that could be a future improvement.
@michaelkirk I've committed a new set of changes that I think should address your concerns with my previous draft. It also adds a fairly complete set of tests and benchmarks for public Geodesic and geomath functions.
I also tried to simplify the process of getting logged C++ data: after cloning, run `git submodule update --remote`. The test and benchmark code should handle everything else.
Let me know if you'd like further adjustments or discussion.
@michaelkirk Since my earlier draft, I've also started leaning toward switching my tests from being integration tests to unit tests. As I noted previously, their mechanics aren't ideal for being categorized as unit tests, but I'm coming to think that the increased access available to unit tests probably outweighs the "not ideal unit tests" consideration. I'd be curious to hear whether you have any thoughts on the topic in either direction.
> I've also started leaning toward switching my tests from being integration tests to unit tests. As I noted previously, their mechanics aren't ideal for being categorized as unit tests, but I'm coming to think that the increased access available to unit tests probably outweighs the "not ideal unit tests" consideration. I'd be curious to hear whether you have any thoughts on the topic in either direction.
re: unit vs. integration - I'm not really sure what you mean in this context.
I highly value the testing of the higher level public methods and think there's less value in, e.g. testing each individual method in geomath.
The overall "flow" of data in your branch makes sense to me. My biggest concern right now is that you're relying on an instrumented version of geographiclib to generate the test input. I'd much prefer to use Karney's published (and maintained) test data, so we're not in a situation down the road where we have to remember how this works and pray the patches still apply. By contrast, it'd be relatively easy to download the updated test data from Karney.
For a general discussion of unit testing vs integration testing in Rust, this may be helpful: https://doc.rust-lang.org/book/ch11-03-test-organization.html
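For readers following along, the distinction is Rust's convention: unit tests live in the same file as the code under test, inside a `#[cfg(test)]` module, and can call private items; integration tests live in `tests/*.rs`, are compiled as separate crates, and can only reach the public API. A minimal illustration, with names that are placeholders rather than actual geographiclib-rs code:

```rust
// Stand-in for a private geomath helper. An integration test in tests/
// could not call this unless it were made public; a unit test can.
fn ang_round(x: f64) -> f64 {
    if x == 0.0 { 0.0 } else { x }
}

#[cfg(test)]
mod tests {
    use super::ang_round;

    // Unit test: same crate, so private items are in reach.
    #[test]
    fn unit_test_can_reach_private_items() {
        assert_eq!(ang_round(2.5), 2.5);
    }
}
```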
I agree that Karney's dataset is a useful testing tool, but the results in that dataset are specific to WGS84, so they can't be relied on to catch bugs in the handling of other ellipsoids. Also, I don't see the GeodTest-100.dat file on the geographiclib data page (https://sourceforge.net/projects/geographiclib/files/testdata/). Depending on how it's created and how the original files are structured, I'm not sure it can be relied on to test all the categories of cases that the original GeodTest files are designed for. If it's just the first 100 lines, that's fine if Karney's case categories are evenly distributed, but less so if they're clumped.
I also agree that it may eventually make sense to abandon my instrumentation-based tests entirely, once the effort of maintaining them exceeds the value they provide. I think they're likely to be helpful for a while yet, though.
While geomath operations are currently non-public, and likely to see less outside use even if that changes, their C++ counterparts are publicly accessible, and I see them as something that is likely to make sense to expose at some point. I think there is also value in testing these lower level functions, in terms of helping to pinpoint where issues may be originating in the higher level functions. This is value that is likely to fade as geographiclib-rs becomes more stable, and could be dropped at some point in the future, but I think there's a bit of work before we get to that point.
Anyway, I propose the following as a broad strokes version of next steps:
Tag the new tests #[ignore], so they'll only run if specifically requested.

I just committed changes addressing many of the items we've discussed, but not all.
I've been thinking about test naming conventions and would be interested in your thoughts on the topic. There are several different pieces I think might have value, and I think that using a consistent ordering to the pieces would be preferable. Being consistent in test (and benchmark) naming conventions will make it easier to run just a subset of test or bench routines. I'll list pieces in the order I would currently favor, which doesn't match the order used in my commit:
I've now added tests based on GeodTest.dat, tagged "ignore". As with other tests, it will take me some time to pin down a good level of tolerance to allow.
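As a sketch of what such an ignored test might look like (the test name, body, and tolerance helper are illustrative, not the committed code):

```rust
// Hedged sketch of a data-driven test tagged #[ignore]: `cargo test`
// skips it, and `cargo test -- --ignored` runs it, so the heavyweight
// comparison stays opt-in.
#[test]
#[ignore]
fn test_geodesic_direct_vs_geodtest_dat() {
    // The real test would read GeodTest.dat line by line and assert each
    // computed value falls within tolerance of the reference value.
    assert!(within_tolerance(1.0000000001, 1.0, 1e-9));
}

// Illustrative tolerance helper, not the project's actual comparison logic.
fn within_tolerance(actual: f64, expected: f64, tol: f64) -> bool {
    (actual - expected).abs() <= tol
}
```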
@michaelkirk I just made a commit that I think is a good candidate for fully resolving #13. I would be interested in your feedback.
A few highlights include:
I did think of a few remaining rough edges that I'll work on polishing, but the changes should be more contained than other recent commits. I hope to make the changes over this weekend.
I've now committed the changes I alluded to yesterday.
I think the main point of interest is that I've removed use of a submodule and zip logic to handle the C++ instrumented data files. Instead, I've created an approach more similar to the one used for GeodTest.dat: I've made zip files available for download on my sourceforge wiki page (https://sourceforge.net/u/alephknot/wiki/Home/), and leave it to the user to download and unzip the files to the appropriate location. If a file isn't found, the error message tries to include both the expected data file location and instructions on how to find the zipped files, much like the approach for GeodTest.dat. Recall that all of the affected tests are set ignored, so they do not run by default.
Note that I had hoped to use a single zip file for all of my data files using a kind of fakey release, but sourceforge isn't letting me create a release (perhaps because of the project's forked status). The full data file zip is too large to do as a wiki attachment. Instead, I've broken it into a separate zip file for each of the 3 largest class outputs (Geodesic, GeodesicLine, and Math), and a fourth zip file for all others (none of which matter yet to geographiclib-rs).
While looking through validate-geographiclib and the core geographiclib some more, I realized that geographiclib has a few different categories of tests in different places, and my tests so far missed some. I'll look into adding those as well.
After a bit of offline discussion, it's become clear that my branch is way more involved than @michaelkirk is looking for. I'll create a new branch and pull request with the much more limited change I now think he'd like. Other items from this branch may be resubmitted at points in the future under separate issues.
The new branch currently uses a static truncated dat file. I hope to improve on that in later pull requests.
The current draft makes geomath public. This is required for testing its operations using integration tests, but wouldn't be required if I created the test as a unit test instead. Although I think we'll want it to be public eventually, some of my non-CI geomath suggestions (e.g. #10) include breaking changes, so it might be preferable to avoid making geomath public in master until after those breaking changes are either applied or rejected.