I kind of like the name `assert` for writing tests. Obviously we need to distinguish it from the Debug `assert` macro. Rust has `assert` (always executed in both Debug and Release modes) and `debug_assert` (only in Debug mode): https://doc.rust-lang.org/std/macro.assert.html.
Also this is probably going to be part of a testing framework, so we should look at what other Fortran testing frameworks use:

* `assert_true`
* `assertTrue`
* `assert_true`
* `assert`
* `test%assert`
* `CHECK`
* `assertThat`

And for comparison, C++ testing frameworks:

* `CHECK`
* `REQUIRE`
* `ASSERT_TRUE`
* `BOOST_CHECK`
* `CPPUT_CHECK`
* `ASSERT`
I am thinking maybe it could be called `assert_true`, and later we can add others like `assert_eq`, etc.

`my_tester%assert_equal`, `my_tester%assert_positive`, `my_tester%assert_close` (arguably, those only cover numerical tests. I only realize this now :-) ).

Nice proposal. I would prefer the name `assert_true` (or maybe `test_true`) over `test_condition`. Anyway, it will be quite useful for debugging when multiple tests are present.
It has been on my list to make a public version of testing and debugging code including unit testing so I found this topic particularly intriguing.
Since something related is starting here, and I have reviewed most of the listed utilities before, I thought I would just list some functionality unique to what I use, as food for thought ...
Specifically focusing on tests comprising a unit test (versus parameter checking and other run-time testing functionality), a few differences between the referenced unit testers and what I like in my version are:
There are a number of global parameters that are set external to the main "assert" routine (I call mine UNIT_CHECK):
* A "level" value which is just an integer that all the tests can
access that the user is free to use to determine what level of
testing to perform and/or how verbose to be.
* As mentioned, an option to make a failure stop the test program
or to optionally continue (some of the references have this,
but as an option on "assert", not as a global mode).
* An option to skip producing messages when successful to make the output
more succinct (some of the references have this, but as an option on
"assert", not as a global mode).
For checks that I want to be part of the run-time production routines, I want options like this to be parameters on the call to the "assert" test routine, but in unit testing I want these to be run-time-selectable global options for the entire set of tests.
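For illustration only, here is a minimal sketch of what such run-time-selectable global modes could look like (the names are hypothetical, not the actual M_debug interface):

module test_modes
   implicit none
   ! Global options consulted by every check call, set once per test run
   ! rather than passed to each individual assertion.
   integer :: check_level = 0        ! user-defined testing level / verbosity
   logical :: check_stop  = .true.   ! stop the test program on the first failure?
   logical :: check_quiet = .false.  ! skip messages for successful checks?
end module test_modes

A test driver would then set, e.g., check_stop = .false. once to see every failure in a single run, instead of repeating that option on each call.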
Other things I found different
A name (almost always the procedure name being tested) is required as the first parameter in my "assert" routine.
One I am pretty sure is totally unique is an optional command name to call when starting and stopping a test set. This lets you arbitrarily enter test results into an SQLite database file, a CSV log, send mail messages, or whatever arbitrary action you want, versus just printing a simple log or exiting with a non-zero exit status -- you provide the command.
The "message field" is actually "message fields" on mine: up to nine polymorphic variables, so you can easily include numeric and logical values in the message, not just strings. Requires modern Fortran, of course -- but a major convenience.
Some unpolished parts of what I had in mind for what I was going to release have already been extracted and made available in the General Purpose Fortran repository, primarily in the M_debug module, as I needed some basic testing functionality just to start GPF. So I agree based on experience that the functionality of unit testing is needed pretty early on in stdlib.
I usually mix the unit tests in M_debug with generic routines similar to the dp_accdig() routine in M_math (to compare REAL values with a tolerance) and the ufpp(1) preprocessor $SYSTEM directive and the numdiff(1) program... so M_debug is intentionally simple and not complete by itself, but is more a framework than a full utility. I think that is a good model -- keep the unit test routines simple and use generic routines for comparing reals. These routines can be called in the expression field of "assert" instead of being provided as a set of routines. Powerful testing intrinsics like ANY() and ALL() already exist and I do not think they need to be re-invented.
Note that if anyone builds the GPF repository it tries to run at least a few thousand tests at the end, so there might not be a lot of documentation but there are a lot of examples, at least if you use gfortran on a *nix system (which is the default expected environment).
So lots of other things are possible, but normally you call very simple routines like ...
unit_check_start(...)
unit_check(name, expression, message(s)...)
unit_check(name, expression, message(s)...)
unit_check(name, expression, message(s)...)
unit_check_done(...)
Generally for quick confidence tests to make sure things did not accidentally change, I find (in practice) that I just let it produce basic messages, resulting in a simple log like:
STARTED test_suite_m_sort
unit_check: sort_shell SUCCESS : sort string array, ascending
unit_check: sort_shell SUCCESS : sort string array, descending
unit_check: sort_shell SUCCESS : sort integer, ascending array
unit_check: sort_shell SUCCESS : sort integer, descending array
unit_check: sort_shell SUCCESS : sort real, ascending
unit_check: sort_shell SUCCESS : sort real, descending
unit_check: sort_shell SUCCESS : sort doubleprecision, ascending
unit_check: sort_shell SUCCESS : sort doubleprecision, descending
unit_check: sort_shell SUCCESS : sort complex by real component, ascending
unit_check: sort_shell SUCCESS : sort complex by real component, descending
unit_check: sort_shell SUCCESS : sort complex by imaginary component, ascending
unit_check: sort_shell SUCCESS : sort complex by imaginary component, descending
unit_check: sort_shell SUCCESS : sort complex array by magnitude, ascending
unit_check: sort_shell SUCCESS : sort complex array by magnitude, descending
unit_check: sort_shell SUCCESS : sort double complex by real component, ascending
unit_check: sort_shell SUCCESS : sort double complex by real component, descending
unit_check: sort_shell SUCCESS : sort double complex by imaginary component, ascending
unit_check: sort_shell SUCCESS : sort double complex by imaginary component, descending
unit_check: sort_shell SUCCESS : sort double complex by magnitude, ascending
unit_check: sort_shell SUCCESS : sort double complex by magnitude, descending
unit_check_done: sort_shell PASSED : GOOD:20 BAD:0
:
:
unit_check: cosd SUCCESS : cosd 1.00000000 1.00000000 value= 0.00000000 accuracy= 1.35631564E-19 asked for 5 digits
unit_check: cosd SUCCESS : cosd 0.866025388 0.866025388 value= 30.0000000 accuracy= 1.35631564E-19 asked for 5 digits
unit_check: cosd SUCCESS : cosd 0.707106769 0.707106769 value= 45.0000000 accuracy= 1.35631564E-19 asked for 5 digits
unit_check: cosd SUCCESS : cosd 0.500000000 0.499999970 value= 60.0000000 accuracy= 1.35631564E-19 asked for 5 digits
unit_check: cosd SUCCESS : cosd -3.48994955E-02 -3.48994620E-02 value= 92.0000000 accuracy= 6.01741600 asked for 5 digits
I have mine set up so that a "test_suite" routine is searched for and automatically called at the end of the build, using a bash script that calls nm to find the test routine names and then builds a test program to call them -- but that is a whole other discussion, I think.
Based on minimal feedback on this so far, there seems to be some support for this proposal, and no objections. I'd like to get more support here.
Regarding the naming, there now seems to be more favor for `assert_true`. However, @nncarlson has warned earlier not to associate the word "assert" with testing. I'm fine with staying away from the word assert for testing; however, the library review by @certik shows that this word is quite widely used for testing (and such is my experience with, say, Python). It seems to me that Neil's interpretation of assert is thus narrower than how the word is broadly used.
Ultimately, we should take the road that most of community seems to agree on. We need more feedback here, on the naming and the whole of the proposal alike.
@zbeekman @jacobwilliams @ivan-pi @leonfoks @everythingfunctional @arjenmarkus @scivision mind chiming in (and please upvote or downvote the original post)?
In C++ I have been using doctest and catch2, which use `CHECK` and `REQUIRE` respectively. Here is an example: https://gitlab.com/lfortran/lfortran/blob/57d3b8077d884f0ff3945ad3a86b2da920e4b6b3/src/lfortran/tests/test_parse.cpp#L1014. I think `CHECK` is perfectly readable also. The only issue is that in C++ it is a macro that somehow analyses the expression and can extract the left-hand side and right-hand side and print useful info. In Fortran, if this were to be just a subroutine, then it would probably be called `check`.
I think run time checking and unit testing functionality should be kept completely separate. And in fact I've found that run time checking that stops execution actually makes unit testing incredibly difficult.
I'm not sure having what amounts to a fancy `stop` statement is a good idea. If I'm going to perform unit testing, I don't want any chance the program just stops. I want errors reported back to me. In fact I wrote a library to make this easier and more user friendly, so I could avoid having the data flow for errors be different from the normal data and control flow.
I would not support the addition of a run time checking facility because I think it encourages poor software design.
Thank you for your feedback @everythingfunctional. I'm confused about some terminology so I need to ask.
I think run time checking and unit testing functionality should be kept completely separate. And in fact I've found that run time checking that stops execution actually makes unit testing incredibly difficult.
By "run time checking", do you universally mean "run time checking that stops the program"? If yes, I agree. For my own testing (any kind really: unit, integration...), I use procedures very similar to the `test_condition` proposed here, that don't stop the program but only report on what succeeded or failed. In some cases I have an accumulator of successes or failures.

Would you then suggest removing the option to stop the program here altogether (`warn=.true.` here would then be removed as an argument and used as the default)? This would cleanly separate it from the existing `assert` in `stdlib_experimental_error`.
I'm not sure having what amounts to a fancy stop statement is a good idea.
In the case of a `test_condition` that stops, it's a convenience wrapper, very much like `optval` is a wrapper for `present`.
If I'm going to perform unit testing, I don't want any chance the program just stops.
Me too, but think beyond unit testing :). Some programs (yes, even in production) need to stop gracefully on some conditions. Sure, we can use an if-branch and a `stop` statement, and `test_condition` makes it easier to do so. Do you think this is a bad idea? If yes, why?
I would not support the addition of a run time checking facility because I think it encourages poor software design.
Can you explain why? I don't see it. So many languages and libraries do it successfully IMO.
By run time checking in that statement I meant anything that causes the procedure to do something beyond its normal operation and sends that result anywhere other than back through the argument list or function result. Whether it's stopping the program, printing to screen, writing to an output file or launching the missiles, that is something that my tests (or any calling procedure) no longer have control over or access to. I don't want my tests to stop, and I don't want them cluttered up with output from the production code (or to inadvertently launch the missiles).
I don't think it's a good idea to add features to a language that enable a design/behavior we should be discouraging. Especially if it's something that could be accomplished with a library anyway. Maybe I'm being naive and idealistic though, because I could see a code base using a pattern like the following extensively, which would still enable deterministic testing.
if (.not. some_condition) then
  if (present(error)) then
    ! caller asked for the error to be returned: set the code and keep running
    error = code
    return
  else
    ! no error argument: report and halt, as production code typically would
    print *, message
    stop code
  end if
end if
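As a self-contained illustration of that pattern (the routine and error-code names here are made up), a production routine might look like:

subroutine invert(x, y, stat)
   ! Returns y = 1/x. Under test, the caller passes stat and receives an
   ! error code; in production the caller omits stat and a failure stops
   ! the program.
   real, intent(in) :: x
   real, intent(out) :: y
   integer, intent(out), optional :: stat
   integer, parameter :: ERR_ZERO_INPUT = 1

   if (x == 0.0) then
      if (present(stat)) then
         stat = ERR_ZERO_INPUT
         return
      else
         print *, "invert: x must be nonzero"
         stop 1
      end if
   end if
   if (present(stat)) stat = 0
   y = 1.0 / x
end subroutine invert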
Maybe if you could get the feature to mimic that pattern and actually cause the procedure to return or stop depending on the presence of a variable and the value of a logical, i.e.

assert(some_condition, message, code, error, fatal)

So, if `error` is not present, or `fatal` is present and true, then it prints the given message to `std_err` and stops the program; otherwise `error` becomes defined with the value of `code` and the procedure returns without printing anything. It would really be cool if `code` and `error` only had to be of the same type, and not some intrinsic type. That would be a reasonable compromise I might support (or at least not speak against).
@everythingfunctional Okay, great, I understand better now.
By run time checking in that statement I meant, anything that causes the procedure to do something beyond it's normal operation and sends that result anywhere other than back through the argument list or function result. Whether it's stopping the program, printing to screen, writing to an output file or launching the missiles, that is something that my tests (or any calling procedure) no longer has control over or access to. I don't want my tests to stop, and I don't want them cluttered up with output from the production code (or to inadvertently launch the missiles).
Great, I agree, this indeed is not what `test_condition` (or however we name it) is for! :) I wouldn't put this in a function, which I tend to write as `pure` anyway. However, I would very much like to use this in top-level programs to check the results of functions and subroutines.
I don't think it's a good idea to add features to a language that enable a design/behavior we should be discouraging. Especially if it's something that could be accomplished with a library anyway.
This is a proposal for stdlib, not the language (different repo). But otherwise, regarding discouraging certain style or behavior, it's not black and white. Any powerful feature can be used for good or harm.
Maybe if you could get the feature to mimic that pattern and actually cause the procedure to return or stop depending on the presence of a variable and the value of a logical. I.e.
assert(some_condition, message, code, error, fatal)
I think we're moving in a good direction. Considering the naming feedback from @certik (I like it also), and your feedback here, let's consider this API:
subroutine check(condition, msg, fatal, code)
  logical, intent(in) :: condition
  character(*), intent(in) :: msg
  logical, intent(in), optional :: fatal
  integer, intent(in), optional :: code
This subroutine emits `msg` to stderr if `condition` is `.false.`. Optionally, if `fatal` is `.true.` (default `.false.`), it stops the program, with the exit code if provided. I like this even better than the original.

I changed the order of `fatal` and `code` (`code` is only meaningful if `fatal` is `.true.`). `msg` is also now required because warn-only is now the default behavior.
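For illustration, calls under these semantics might look like (variable names are made up):

! Warn only (default): print msg to stderr if the check fails, keep running.
call check(all(x > 0.0), 'some elements of x are not positive')

! Fatal check: stop the program with exit code 2 if the file was not read.
call check(io_status == 0, 'could not read the input file', fatal=.true., code=2)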
Would you at least not object to this variant?
I have a tweak to make the usage alluded to a bit more convenient.
function check(condition, msg, fatal, code, stat)
  logical, intent(in) :: condition
  character(len=*), intent(in) :: msg
  logical, intent(in), optional :: fatal
  integer, intent(in), optional :: code
  integer, intent(out), optional :: stat
  logical :: check
so that one could write a line like
if (check(condition, msg, fatal, code, stat)) return
I would still lean towards not having the function emit `msg` at all if it doesn't stop the program, where the conditions that cause the program to stop are:

* `fatal` is provided and is `.true.`
* `stat` is not provided
* `stat` is provided but `code` is not (this is essentially invalid usage)

One thing that I am still confused about is what the intended audience of this `check` function is. Is it to write tests? If so, then how do you communicate to `ctest` that the test failed? Shouldn't `fatal` be `.true.` by default?

Also, a good test suite framework (such as doctest in C++) collects the tests and shows all kinds of statistics. I think some of the Fortran test suites do the same. But this `check` function would be a very limited implementation of such a test suite framework?

Wouldn't it make sense to simply leave such `check` functions to test frameworks?
The `check` function would not have anything to do with testing. As you recognized, it would have no way of communicating to a testing framework.

Additionally, I don't think test code should be mixed in with production code. Test code should call production code, not the other way around.

You may be right that `fatal` should default to `.true.`, which would make that bullet:

* `fatal` is not provided or is `.true.`
The basic use case for this is to be able to write code that, in production, stops execution, but under test does not. You could have a procedure with optional arguments that could be used to return error conditions under test, but would not be utilized in production code, i.e.
subroutine foo(arg, stat)
  integer, intent(in) :: arg
  integer, intent(out), optional :: stat
  ...
  if (check(condition, "Error occurred", .false., SPECIFIC_ERR_CODE, stat)) return
  ...
so that in production code you would just
call foo(arg)
but in test code you could
call foo(CAUSE_FAILURE, stat)
! Make some assertion about stat
I think this provides a reasonable way to improve code that simply uses `stop` statements, and enables and encourages making that code testable. But it does not directly talk to any sort of testing framework.
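To make that last step concrete, a hypothetical test of foo might read like the following (assuming foo and SPECIFIC_ERR_CODE are exported from a module, here given the made-up name foo_module):

program test_foo
   use foo_module, only: foo, SPECIFIC_ERR_CODE   ! hypothetical module name
   implicit none
   integer :: stat
   integer, parameter :: CAUSE_FAILURE = -1   ! an input known to trip the check in foo

   ! Passing stat makes the error path return instead of stopping the program.
   call foo(CAUSE_FAILURE, stat)
   if (stat == SPECIFIC_ERR_CODE) then
      print *, 'PASS: foo reported the expected error'
   else
      print *, 'FAIL: foo did not report the expected error'
   end if
end program test_foo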
@certik Here are some use cases:
As you can see from the use cases, two essential elements of this subroutine are:
So rather than testing, perhaps think about this as a simple warn subroutine, that can optionally also stop. I agree with Brad that this is not meant to make a testing framework. I think I understand Brad's use case, but I personally wouldn't use it like that.
I do object to this being a function and not a subroutine. The key part of this procedure is that it causes a side effect (print to stderr and/or stop the program). That's pretty much all it does. One of the few things that I think Fortran got right over other languages is that it has both functions and subroutines. Functions can be used only in expressions. Subroutines can't be used in expressions. Expressions shouldn't cause side effects. This is especially troublesome if the expression is a logical condition being tested (Brad's use case), or if it's passed as an input argument to another procedure. If I understand correctly, `integer, intent(out) :: stat` was added exactly for causing an additional side effect. I wouldn't object to this if we keep `check` a subroutine, although I don't personally care much for this functionality.

Regarding Brad's wish that `check` shouldn't emit a message if it doesn't stop the program, we could do this if we made the `msg` argument optional.
subroutine check(condition, msg, fatal, code)
  logical, intent(in) :: condition
  character(*), intent(in), optional :: msg
  logical, intent(in), optional :: fatal
  integer, intent(in), optional :: code
Now you can use it like this:
call check(x < 0, 'Warning: x is negative, solution may not converge') ! warn only
or:
call check(fatal_condition, fatal=.true.) ! stops the program without printing to stderr
What do you think?
The discussion is already pretty lengthy - I will read it later today.
Being of an older generation and using an optical device to enhance my vision, I have to ask: is there any easy way to get this thread on paper? I really prefer to read from that ancient medium ;).
If `check` is a subroutine, then my pattern becomes:
call check(condition, "Error occurred", .false., SPECIFIC_ERR_CODE, stat)
if (present(stat)) then
  if (stat /= 0) return
end if
I agree that in most cases side effects should be performed by subroutines, but in this case I think the general use case warrants it.
I understand the use cases you have shown are prevalent in many code bases, but I don't think they are a good design. If the following is inside a procedure that gets called inside a loop, you'll see this warning tons of times, without necessarily any context as to what's causing it, and you may only really want to see it once. Not to mention that if I hit this condition in a test, it may clutter up the output from my tests, and it makes it much more difficult to test that this condition does emit the message.
call check(x < 0, 'Warning: x is negative, solution may not converge') ! warn only
If the following is in a procedure, now I can't test that procedure to see that inputs causing `fatal_condition` do in fact result in the appropriate error.
call check(fatal_condition, fatal=.true.) ! stops the program without printing anything
My thought is that we could use this as a way to demonstrate that the use case I've shown is a relatively straightforward change from patterns that are in common use today, but that makes code more testable. Once that's been demonstrated, we can request a `check` statement be added to the language to make it possible to remove the `if (...) return` portion from around those statements, but still retain the exact same behavior.

I think this is a good use of an (experimental) standard library: demonstrate a use case for a feature that should be added to the language, with a clear upgrade path from existing code to the stdlib feature, and a clear upgrade path from the stdlib to the new language feature.
@arjenmarkus unfortunately I can't quickly figure out how to print a GitHub issue. However, if this is something that would allow you to participate here, we can use GitHub API to get the Markdown contents of the issue and all the comments, and then we can convert to pdf that can be printed. It's some work, so I can't do it right away. Let me know if this would be a high priority for you.
@certik Thanks for the offer - it is mostly a matter of convenience: I could read the thread from paper while commuting. Since there does not seem to be a convenient way to do it, I will just read it from the screen instead.
Having read the discussion, I would like to add the following:
call checker%set_handler( stop_on_error )
...
call checker%check( x > 0, "Variable x should be positive" )
`stop_on_error` is then a routine that gets invoked if an error occurs (with some prescribed interface) -- an idea I adopted from @urbanjost. And sundry other methods that control the behaviour globally. It is not even necessary to have only a single "checker" object, so that you can easily customise different parts of the program in different ways.
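A minimal sketch of how such a checker object could be put together (hypothetical names and a deliberately simple handler interface, not an existing library):

module checker_module
   implicit none

   abstract interface
      subroutine error_handler(msg)
         character(len=*), intent(in) :: msg
      end subroutine error_handler
   end interface

   type :: checker_t
      procedure(error_handler), pointer, nopass :: handler => null()
   contains
      procedure :: set_handler
      procedure :: check
   end type checker_t

contains

   subroutine set_handler(self, handler)
      class(checker_t), intent(inout) :: self
      procedure(error_handler) :: handler
      self%handler => handler
   end subroutine set_handler

   subroutine check(self, condition, msg)
      class(checker_t), intent(in) :: self
      logical, intent(in) :: condition
      character(len=*), intent(in) :: msg
      ! Delegate the reaction to whatever handler the user installed
      ! (stop, log, accumulate statistics, ...).
      if (.not. condition .and. associated(self%handler)) call self%handler(msg)
   end subroutine check

end module checker_module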
Using an object in this way, rather than a single routine or a set of routines, is customary in languages like C++ and Java where object-orientation is the preferred programming model.
The thing we should avoid is over-designing. It is easy to make a collection of routines (or methods) that allow us to change the behaviour in a myriad of ways, but let us stick to those aspects that are evidently useful at first. Extensions can be made in a later stage. Having said that, I know it is hard to answer the question "what is evidently useful?" ;).
To be honest I am also struggling to understand the motivation for this. I understand the motivation for a Debug-time assert, which is #72. I also understand the motivation for a testing framework, which I feel might be beyond the scope of stdlib (yes, we have to write tests in stdlib somehow; for now we simply call `error_stop`, later on we can switch to some more sophisticated testing framework); either way, that's a separate issue also.

That leaves the use cases 1., 2., and 3. from https://github.com/fortran-lang/stdlib/issues/121#issuecomment-580258256. Of those I usually just write an `if` statement by hand, and if an error occurs, then I call `error_stop()`, which is already in `stdlib`.
So I personally do not see the motivation for this yet, but if others find this useful, I am not against.
That leaves the use cases 1., 2., and 3. from #121 (comment). Of those I usually just write an if statement by hand, and if an error occurs, then I call error_stop(), which is already in stdlib.
Exactly, this is the crux. It's a convenience wrapper over `if / then / write(stderr, *)` or `if / then / error_stop()`, nothing more.
So I personally do not see the motivation for this yet
I can understand this. However, how do you motivate the need for the current implementation of `assert` in `stdlib_experimental_error`? It's literally this:

if (.not. condition) call error_stop("Assert failed.", code)

I'd argue that `check` is more valuable than the current `assert` because it abstracts away more boilerplate than `assert` does.

What motivated me to propose this is the idea that stdlib would also provide convenience routines like this that simplify repetitive boilerplate. Do you think that stdlib shouldn't provide utilities like this?
I see. Yes, the `assert` was added so that we can write tests today. Then we realized that it really should be split into #72 and into a test framework.

If you see it as a better "assert", then I am for it. Essentially something that we can use today to write tests. And if we want more, then we would switch to a full-blown test framework. Is that your idea?
This started as a backward-compatible improvement to `assert` in #116. The winds of community feedback brought me here. In #116, @nncarlson stressed (reasonably so) that `assert` should stay minimal in scope.
If the current implementation of `assert` is indeed to eventually retire in favor of a preprocessor macro, then I don't think it hurts to "upgrade" the current implementation of `assert` to a more appropriately named `check`, with two benefits:

* meaningful error messages, so you know which check failed;
* the option to only warn, while the default behavior still stops the program (and so still reports failure to `ctest`, like we do now).

I do think the ideas @everythingfunctional brought up here are important and we should discuss them. They do seem outside the scope of this proposal and seem to be more akin to #95.
If the goal is to retire our current `assert` and replace it with `check` as discussed here, then I am for it. I think that's a good idea.

Let's pick this discussion up. It seems like there is a satisfactory level of interest in retiring `assert` in favor of `check` from developers (myself, @ivan-pi, @certik, @jvdp1). This would make stdlib development easier according to #136 and some earlier PRs that ran into the same issue (it was difficult to identify which assert failed).
Let's look at the API again:
subroutine check(condition, msg, code, warn)
  logical, intent(in) :: condition
  character(*), intent(in), optional :: msg
  integer, intent(in), optional :: code
  logical, intent(in), optional :: warn
This API is compatible with the current `assert`, meaning you could rename all `assert` calls to `check` and the code would build and run with the same behavior. In addition, people like me who want this to warn only in some scenarios could do it.
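For example (illustrative variable names):

! Drop-in replacement for a current assert call: stops with a default message.
call check(size(a) == size(b))

! Warn-only variant with a meaningful message, handy while developing tests.
call check(abs(diff) < tol, msg='difference exceeds tolerance', warn=.true.)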
Implementation could be incremental:

1. Implement `check` as described;
2. Replace `assert` calls with `check`;
3. Add informative messages where `check` is used in current stdlib tests;
4. Remove `assert` from `stdlib_experimental_error`.

Are there any objections to this API before I proceed with a PR?
It sounds good to me. To avoid issues, it would maybe be good to first merge the remaining PRs, if they are OK.
Also, I would suggest performing your steps 1, 2, and 4 in the same PR. I am not sure it is a good idea to have both `assert` and `check` in stdlib at the same time. That would allow people to still use `assert` while `check` is being implemented, which I guess we would like to avoid.
Just to be sure, the default behavior is `warn = .false.`, so that a false condition will halt execution?
I think it is sensible to not have `assert` and `check` in the library at the same time.

After this PR, I suppose the developers who wrote the modules should perform step 3 and add some informative messages to the current tests? Or should we wait until we have a more complete solution for testing?
Just to be sure, the default behavior is warn = .false., so that a false condition will halt execution?
Yes, I think so. It would preserve the behavior of `assert`, and make its intent a bit different from a testing framework, where you want your program to fail tests and keep running.
After this PR I suppose the developers who wrote the modules should perform step 3, and add some informative messages to the current tests?
This is quite easy and I don't mind doing it. We don't have that many modules and functions. All 4 steps above can be in a single PR.
Or should we wait until we have a more complete solution for testing?
No. We're already hitting problems in development with the current `assert`. I think we should do an easy fix now and carefully design anything more complete, if desired at all.
Is there any consensus/progress in this issue? Could I be of any help? Such a subroutine could have been useful today ;) @milancurcic
I'm glad you asked! I meant to tackle this next, sometime this coming week. I have a new baby and today is the last day of my 3-week parental leave, and thus lower activity on projects.
If you're eager to move it forward, please! Otherwise, I'd tackle it in the next few days.
I think this is the implementation we're looking for:
subroutine check(condition, msg, code, warn)
  ! Assumes stderr, optval, and error_stop are available in the host module
  ! (stderr being iso_fortran_env's error_unit; optval and error_stop from stdlib).
  logical, intent(in) :: condition
  character(*), intent(in), optional :: msg
  integer, intent(in), optional :: code
  logical, intent(in), optional :: warn
  character(*), parameter :: msg_default = 'Test failed.'
  if (.not. condition) then
    if (optval(warn, .false.)) then
      write(stderr, *) optval(msg, msg_default)
    else
      call error_stop(optval(msg, msg_default), code)
    end if
  end if
end subroutine check
Important points are:

* It's backward compatible with `assert`; that is, you can replace current `assert` calls with `check` and the behavior won't change. Which implies:
  * it only warns (instead of stopping) if `warn` is `.true.`;
  * a default message is printed unless `msg` is provided.

How does that sound?
I still don't understand how this provides functionality different from the current `assert`. And if it's backwards compatible with the current `assert`, then why call it something different?
I would like to see the following implementation
function check(condition, code, msg, fatal, stat)
  logical, intent(in) :: condition
  integer, intent(in) :: code
  character(len=*), intent(in), optional :: msg
  logical, intent(in), optional :: fatal
  integer, intent(out), optional :: stat
  logical :: check
  character(len=*), parameter :: DEFAULT_MESSAGE = 'Check Failed.'
  if (.not. condition) then
    if (optval(fatal, .false.)) then
      call error_stop(optval(msg, DEFAULT_MESSAGE), code)
    else
      if (present(stat)) then
        stat = code
        check = .true.
      else
        call error_stop(optval(msg, DEFAULT_MESSAGE), code)
      end if
    end if
  else
    check = .false.
  end if
end function check
I think @milancurcic's use case could be accomplished by simply adding an optional `warn` argument to `assert`. Then we can just leave both in the stdlib, as we appear to have sufficiently distinct use cases.
@everythingfunctional #116 shows how we got to here.
I like Brad's proposed implementation also, if we either:

* make it a subroutine, or
* drop `integer, intent(out), optional :: stat`. Functions shouldn't do this -- it would be very difficult for a casual reader to notice that `stat` is being updated in an expression as a side effect.

To my eyes, Brad's `check` is closer to my `check` than my `check` is to `assert`. Plus, there have been (I think) convincing arguments to either leave `assert` alone or to drop it. I'm fine with either option.
As for why make `check` backward compatible with `assert`, I think the choice is pragmatic: 1) we can easily replace current calls to `assert` with `check` and be done with it; 2) people who still want the existing `assert` functionality can have it.
I am not opposed to having multiple functions and subroutines with somewhat overlapping but different capabilities.
@jvdp1 Let's keep this at the drawing board until we have a majority agreement.
@everythingfunctional #116 shows how we got to here.
Ok, I think I see what we've got now. Multiple use cases/ideas trying to get in on one feature. Here's what I think the use cases are:
I think checking in the context of testing should not be in the scope of the standard library. That is the responsibility of a testing library/framework. Besides, if I can tell which testing framework you're using by looking at your production code, then you're doing it wrong.
I don't think static/compile time checking would fall under the purview of the standard library either. It would probably need to be implemented by some sort of macro/preprocessor anyway.
In order for something to be optimized away under different compiler options, it would need to be a language feature (or maybe make use of macros or preprocessors), which would again put it outside the purview of the standard library.
That leaves run time checking of user inputs and run time checking of intermediate results. I think the solution to these problems needs to be designed with respect to the kind of usage patterns we would like to see encouraged, and possibly discouraged, not simply the patterns currently in common usage.
Sometimes it's better to remove capabilities from a language than it is to add new ones. Fortran's history is littered with bad design decisions. Like computed goto and alternate entry points. With great power comes great spaghetti code.
Thanks for the summary of use cases. I agree that the problem is exactly "Multiple use cases/ideas trying to get in on one feature".
I think checking in the context of testing should not be in the scope a the standard library. That is the responsibility of a testing library/framework.
I don't see it as black and white. Recent and popular languages like Python and Go do it successfully. I bet many others too. A small project will often roll its own testing functions rather than relying on a heavy testing framework.
Though I tend to agree that a general and powerful testing framework may be out of scope for stdlib, you still need a way to test stdlib functions internally. Adding an external framework as a dependency is not feasible (not even vegetables, my favorite of the bunch), and the current `assert` is not adequate. How would you do it?
We can safely ignore here the preprocessor macro and language developments.
As you say, let's focus on run time checking of user inputs and run time checking of intermediate results.
I have a new baby and today is the last day of my 3-week parental leave, and thus lower activity on projects.
Congratulations @milancurcic !
My point is that stdlib functions shouldn't be tested internally. A testing framework should be used to test the stdlib from the outside. That way a user of the stdlib doesn't have to rely on the testing code.
I understand that some languages make use of doctests and inline tests, but they make use of capabilities of those languages that Fortran doesn't have: introspection and standardized docstrings in the case of Python, and fully AST-aware macros and a builtin `panic` function in the case of Rust. I'm also still not convinced the way they do it is actually a good idea. I'm still of the opinion that the tests should not be mixed in with the production code, as then you require your users to also depend on your testing framework.
A couple of questions to try to understand pros/cons in this thread: @everythingfunctional what do you mean with:
That way a user of the stdlib doesn't have to rely on the testing code.
@milancurcic
Adding an external framework as a dependency is not feasible (not even vegetables, my favorite of the bunch),....
Why is it not feasible?
My main issue in this thread is that the use of multiple `assert` calls in a test file is painful when a problem occurs (e.g., a too strict threshold when moving to another compiler).

I would be happy with both approaches (internal or external testing), if they facilitate the development of stdlib and do not rely on the current `assert` (which does not seem appropriate for the stdlib testing approach).
@everythingfunctional
My point is that stdlib functions shouldn't be tested internally. A testing framework should be used to test the stdlib from the outside.
Okay, your point is now clear to me. However, I still don't understand why (they shouldn't be tested internally). Are there practical reasons or is it more of a dogma?
That way a user of the stdlib doesn't have to rely on the testing code.
I don't think this is an answer to my question above because if you take care of testing internally, then this problem goes away.
Moreover, I want as many users to run stdlib tests and as seamlessly as possible. This will help us discover bugs and cross-platform issues. As a user myself, I want to run tests for most libraries I use, and I don't want to have to install Perl, Ruby, or Haskell to do so.
@jvdp1
Why is it not feasible?
I think it would add too much friction to development (it would for me), and it would discourage users from running tests.
My main issue in this thread is that the use of multiple assert in a test file is painfull when a problem occurs (e.g., a too strict threshold when moving to another compiler).
Same. There's an easy solution that works, now.
@everythingfunctional Recall also that we are now working only with experimental API, where we're supposed to try out and play with various approaches, and drop ones that don't work as we go. Nothing is set in stone here. This kind of analysis paralysis prevents work from getting done.
Perfect is the killer of good enough.
I still don't understand why (they shouldn't be tested internally). Are there practical reasons or is it more of a dogma?
With Fortran, if your tests are inside your procedures, they'll be run every time a user calls that procedure. That adds overhead to your library.
I don't think this is an answer to my question above because if you take care of testing internally, then this problem goes away.
I don't think the problem goes away, I just think it's no longer apparent to the user.
Moreover, I want as many users to run stdlib tests and as seamlessly as possible. This will help us discover bugs and cross-platform issues.
I'm not sure I understand this statement. How does having more people run the same tests help to find bugs? To me, the best way to find bugs is to have more people use the code in various ways. Otherwise your tests would have already found the bug.
I'm not saying the tests don't have value to outside users. In fact they are a great example of how your code is intended to be used, and done well describe the requirements the code is intended to fulfill. But your users should be able to see that your CI system is running the tests and trust that they pass. They shouldn't really need to run them themselves.
@milancurcic I understand your point of view.
An extension of the current `assert()`, combining @milancurcic's and @everythingfunctional's ideas proposed in this thread, could already be useful for the experimental development.
Meanwhile, a testing framework could be thought/set up for when we will move procedures from experimental part to the non-experimental part. Of course, this will not encourage the users to run tests.
With Fortran, if your tests are inside your procedures, they'll be run every time a user calls that procedure. That adds overhead to your library.
I agree with that. However, I think we discussed different things.
Testing is outside the procedures, but well inside the directory/project of stdlib. E.g., now we test the `mean` function in a test file that contains multiple calls to `assert` and `mean`, like:

call assert( (mean(x) - 1) < sptol )

As implemented now, it is a bit as it is in BLAS/LAPACK: the project includes tests that the user may or may not run. There are no tests inside the procedures of BLAS/LAPACK (except for validation of the values passed to procedures, of course).

Do I miss something?
With Fortran, if your tests are inside your procedures, they'll be run every time a user calls that procedure. That adds overhead to your library.
Okay, I agree, and I didn't have that in mind at all. I think of tests as separate programs that call library functions, and these programs may or may not ship alongside the library. What we're doing now with `assert` is a rudimentary version of that. The only things I would change about current stdlib testing are to allow showing a meaningful message (so the developer knows what failed), and to not have failures stop the program (so the developer can see multiple failures by running the suite once). At this time I see no need to change or add anything else.
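Concretely, the mean test quoted above might then read something like this (a sketch against the proposed check):

call check((mean(x) - 1) < sptol, msg='mean(x) should be 1 within sptol', warn=.true.)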
I'm not sure I understand this statement. How does having more people run the same tests help to find bugs? To me, the best way to find bugs is to have more people use the code in various ways. Otherwise your tests would have already found the bug.
You're right, it doesn't really help if a user runs tests in the same environment as developers, and using the code in various ways is indeed more beneficial. However the output of tests will not always be the same across compilers, compiler versions, and operating systems.
What we're doing now with assert is a rudimentary version of that. The only things I would change about current stdlib testing is to allow showing a meaningful message (developer knows what failed), and failures don't stop the program (developer can see multiple failure by running the suite once).
Ok, so you're using `assert` as a rudimentary testing framework, but it wasn't really meant for that, and so it's not quite serving that purpose very effectively. A testing framework is intended to make writing those "separate programs that call library functions" easier and more useful for debugging.
I wrote the current `assert` in stdlib, and in my codes I use it both as a testing framework and as Debug-time testing (which unfortunately also gets executed in Release mode). In the short run it allows me (us) to get the work done and move on.

In the long run, I agree it's not a good solution for either use case.
For testing, we should use a testing framework that is integrated with fpm, so that `fpm test` will execute the tests, just like `cargo test` does, and if a test fails, report which one failed, what the LHS and RHS values are, etc.

For Debug-time tests, probably the only way currently in Fortran is to use a macro (until https://github.com/j3-fortran/fortran_proposals/issues/70 is accepted). However, I think this macro could be maintained in stdlib, as it would be very useful.
Per my previous comment, here is my proposal:
1) Get rid of the `assert` function, and introduce an `ASSERT` macro (a sketch follows after this list). Use that for Debug-time checking of array bounds and other conditions that our code assumes, which we do not want to be checking in Release mode for performance reasons. See #72 for that.
2) Either depend on a testing framework (see #162), or create our own "mini testing framework", and port our tests to use that
3) There is a third use case, where you want to be checking conditions in Release mode also. An example of that is this: https://github.com/certik/fortran-utils/blob/b43bd24cd421509a5bc6d3b9c3eeae8ce856ed88/src/linalg.f90#L110. I am currently undecided whether this use case is different from 1), i.e., whether this test should still be executed in Release mode. But if it should, then that's a third use case.
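For point 1, a sketch of what such a macro could look like using the C preprocessor (the exact form belongs to #72; this is only illustrative):

! Compiled with -cpp (gfortran) or equivalent; assumes error_stop is available.
#ifdef NDEBUG
#define ASSERT(cond)
#else
#define ASSERT(cond) if (.not. (cond)) call error_stop("Assertion failed")
#endif

! Debug build: the condition is checked; Release build (-DNDEBUG): compiled away.
ASSERT(i >= 1 .and. i <= size(a))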
Thank you @certik. I like the long-term plan and also support introducing an `ASSERT` macro.

In the meantime, do you object to extending the existing `assert` subroutine to ease development, so we can at least keep moving forward? What do we do for internal testing before we have the `ASSERT` macro and `fpm test`?
I will take some time to tally up current support/objections and elicit further support for this.
@everythingfunctional Can you please open an issue to propose an external testing framework for stdlib testing?
sure thing
Evolved from discussion in #116.
Summary
This is a proposal to add a subroutine `test_condition` to `stdlib_experimental_error` that will:

* test a logical condition and, by default, stop the program if the condition is false, as the current `assert` in `stdlib_experimental_error` does;
* optionally print a custom message, set the exit code, or only warn instead of stopping, unlike the current `assert`.

This is useful for testing both programmer and user errors. It can be used regardless of how the program is built (debug or release mode).

It could also be used as a basic building block toward higher-level testing or exception handling proposed in #95.
Description
`test_condition` tests an input logical condition. If the condition evaluates to `.false.`, the default behavior is to stop the program and print the error message. Optionally, the user can pass an exit code to stop the program with. Optionally, the user can choose to not stop the program, but only print the warning.

API
Arguments
* `condition`: Logical condition to test.
* `msg`: (optional) Character string to print to stderr. Default
* `code`: (optional) Integer exit code to set when stopping the program. (What should be the default value? The current implementation of assert doesn't set any default code as far as I can tell.)
* `warn`: (optional) Logical. If `.false.` (default), `test_condition` will stop the program if `condition` is also false. If `.true.`, it will print the error message to the screen and resume.

Example
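A sketch of possible calls consistent with the arguments above (illustrative, not the original example from the issue):

call test_condition(n > 0, 'n must be positive')              ! stop with a message
call test_condition(n > 0, 'n must be positive', code=2)      ! stop with exit code 2
call test_condition(x < xmax, 'x exceeds xmax', warn=.true.)  ! warn and continue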
Implementation
This implementation depends on optval and error_stop.
Requesting feedback from @certik @nncarlson @zbeekman @jvdp1 @ivan-pi.
For anybody reading, please explicitly thumbs up or down, and write a comment if thumbs down. This way we can get a clear idea on whether this is supported or not and how much.