I kind of like the name `assert` for writing tests. Obviously we need to distinguish it from the Debug `assert` macro. Rust has `assert` (always executed in both Debug and Release modes) and `debug_assert` (only in Debug mode): https://doc.rust-lang.org/std/macro.assert.html.
Also this is probably going to be part of a testing framework, so we should look at what other Fortran testing frameworks use:

* `assert_true`
* `assertTrue`
* `assert_true`
* `assert`
* `test%assert`
* `CHECK`
* `assertThat`

And for comparison, C++ testing frameworks:

* `CHECK`
* `REQUIRE`
* `ASSERT_TRUE`
* `BOOST_CHECK`
* `CPPUT_CHECK`
* `ASSERT`
I am thinking maybe it could be called `assert_true`, and later we can add others like `assert_eq`, etc.

`my_tester%assert_equal`, `my_tester%assert_positive`, `my_tester%assert_close` (arguably, those only cover numerical tests. I only realize this now :-) ).

Nice proposal. I would prefer the name `assert_true` (or maybe `test_true`) over `test_condition`. Anyway, it will be quite useful for debugging when multiple tests are present.
It has been on my list to make a public version of testing and debugging code including unit testing so I found this topic particularly intriguing.
Since something related is starting here, and I have reviewed most of the listed utilities before, I thought I would just list some functionality unique to what I use, as food for thought ...
Specifically focusing on tests comprising a unit test (versus parameter checking and other run-time testing functionality), a few differences between the referenced unit testers and what I like in my version are:
There are a number of global parameters that are set external to the main "assert" routine (I call mine UNIT_CHECK):
* A "level" value which is just an integer that all the tests can
access that the user is free to use to determine what level of
testing to perform and/or how verbose to be.
* As mentioned, an option to make a failure stop the test program
or to optionally continue (some of the references have this,
but as an option on "assert", not as a global mode).
* An option to skip producing messages when successful to make the output
more succinct (some of the references have this, but as an option on
"assert", not as a global mode).
For checks that I want to be part of the run-time production routines, I want options like this to be parameters on the call to the "assert" test routine, but in unit testing I want these to be run-time-selectable global options for the entire set of tests.
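For illustration only, here is a minimal sketch of what such run-time-selectable global modes could look like (the names are hypothetical, not the actual M_debug interface):

module test_modes
   implicit none
   ! Global options consulted by every check call, set once per test run
   ! rather than passed to each individual assertion.
   integer :: check_level = 0        ! user-defined testing level / verbosity
   logical :: check_stop  = .true.   ! stop the test program on the first failure?
   logical :: check_quiet = .false.  ! skip messages for successful checks?
end module test_modes

A test driver would then set, e.g., check_stop = .false. once to see every failure in a single run, instead of repeating that option on each call.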
Other things I found different
A name (almost always the procedure name being tested) is required as the first parameter in my "assert" routine.
One I am pretty sure is totally unique is an optional command name to call when starting and stopping a test set. This lets you arbitrarily enter test results into an SQLite database file, a CSV log, send mail messages, or whatever arbitrary action you want, versus just printing a simple log or exiting with a non-zero exit status -- you provide the command.
The "message field" is actually "message fields" on mine: up to nine polymorphic variables, so you can easily include numeric and logical values in the message, not just strings. Requires modern Fortran, of course -- but a major convenience.
Some unpolished parts of what I had in mind for what I was going to release have already been extracted and made available in the General Purpose Fortran repository, primarily in the M_debug module, as I needed some basic testing functionality just to start GPF. So I agree based on experience that the functionality of unit testing is needed pretty early on in stdlib.
I usually mix the unit tests in M_debug with generic routines similar to the dp_accdig() routine in M_math (to compare REAL values with a tolerance) and the ufpp(1) preprocessor $SYSTEM directive and the numdiff(1) program... so M_debug is intentionally simple and not complete by itself, but is more a framework than a full utility. I think that is a good model -- keep the unit test routines simple and use generic routines for comparing reals. These routines can be called in the expression field of "assert" instead of being provided as a set of routines. Powerful testing intrinsics like ANY() and ALL() already exist and I do not think they need to be re-invented.
Note that if anyone builds the GPF repository it tries to run at least a few thousand tests at the end, so there might not be a lot of documentation but there are a lot of examples, at least if you use gfortran on a *nix system (which is the default expected environment).
So lots of other things are possible, but normally you call very simple routines like ...
unit_check_start(...)
unit_check(name, expression, message(s)...)
unit_check(name, expression, message(s)...)
unit_check(name, expression, message(s)...)
unit_check_done(...)
Generally for quick confidence tests to make sure things did not accidentally change, I find (in practice) that I just let it produce basic messages, resulting in a simple log like:
STARTED test_suite_m_sort
unit_check: sort_shell SUCCESS : sort string array, ascending
unit_check: sort_shell SUCCESS : sort string array, descending
unit_check: sort_shell SUCCESS : sort integer, ascending array
unit_check: sort_shell SUCCESS : sort integer, descending array
unit_check: sort_shell SUCCESS : sort real, ascending
unit_check: sort_shell SUCCESS : sort real, descending
unit_check: sort_shell SUCCESS : sort doubleprecision, ascending
unit_check: sort_shell SUCCESS : sort doubleprecision, descending
unit_check: sort_shell SUCCESS : sort complex by real component, ascending
unit_check: sort_shell SUCCESS : sort complex by real component, descending
unit_check: sort_shell SUCCESS : sort complex by imaginary component, ascending
unit_check: sort_shell SUCCESS : sort complex by imaginary component, descending
unit_check: sort_shell SUCCESS : sort complex array by magnitude, ascending
unit_check: sort_shell SUCCESS : sort complex array by magnitude, descending
unit_check: sort_shell SUCCESS : sort double complex by real component, ascending
unit_check: sort_shell SUCCESS : sort double complex by real component, descending
unit_check: sort_shell SUCCESS : sort double complex by imaginary component, ascending
unit_check: sort_shell SUCCESS : sort double complex by imaginary component, descending
unit_check: sort_shell SUCCESS : sort double complex by magnitude, ascending
unit_check: sort_shell SUCCESS : sort double complex by magnitude, descending
unit_check_done: sort_shell PASSED : GOOD:20 BAD:0
:
:
unit_check: cosd SUCCESS : cosd 1.00000000 1.00000000 value= 0.00000000 accuracy= 1.35631564E-19 asked for 5 digits
unit_check: cosd SUCCESS : cosd 0.866025388 0.866025388 value= 30.0000000 accuracy= 1.35631564E-19 asked for 5 digits
unit_check: cosd SUCCESS : cosd 0.707106769 0.707106769 value= 45.0000000 accuracy= 1.35631564E-19 asked for 5 digits
unit_check: cosd SUCCESS : cosd 0.500000000 0.499999970 value= 60.0000000 accuracy= 1.35631564E-19 asked for 5 digits
unit_check: cosd SUCCESS : cosd -3.48994955E-02 -3.48994620E-02 value= 92.0000000 accuracy= 6.01741600 asked for 5 digits
I have mine set up so that a "test_suite" routine is searched for and automatically called at the end of the build, using a bash script that calls nm to find the test routine names and then builds a test program to call them -- but that is a whole other discussion, I think.
Based on minimal feedback on this so far, there seems to be some support for this proposal, and no objections. I'd like to get more support here.
Regarding the naming, there now seems to be more favor for `assert_true`. However, @nncarlson has warned earlier not to associate the word "assert" with testing. I'm fine with staying away from the word assert for testing; however, the library review by @certik shows that this word is quite widely used for testing (and such is my experience with, say, Python). It seems to me that Neil's interpretation of assert is thus narrower than how the word is broadly used.
Ultimately, we should take the road that most of community seems to agree on. We need more feedback here, on the naming and the whole of the proposal alike.
@zbeekman @jacobwilliams @ivan-pi @leonfoks @everythingfunctional @arjenmarkus @scivision mind chiming in (and please upvote or downvote the original post)?
In C++ I have been using doctest and catch2, which use `CHECK` and `REQUIRE` respectively. Here is an example: https://gitlab.com/lfortran/lfortran/blob/57d3b8077d884f0ff3945ad3a86b2da920e4b6b3/src/lfortran/tests/test_parse.cpp#L1014. I think `CHECK` is perfectly readable also. The only issue is that in C++ it is a macro that somehow analyses the expression and can extract the left-hand side and right-hand side and print useful info. In Fortran, if this were to be just a subroutine, then it would probably be called `check`.
I think run time checking and unit testing functionality should be kept completely separate. And in fact I've found that run time checking that stops execution actually makes unit testing incredibly difficult.
I'm not sure having what amounts to a fancy `stop` statement is a good idea. If I'm going to perform unit testing, I don't want any chance the program just stops. I want errors reported back to me. In fact I wrote a library to make this easier and more user friendly, so I could avoid having the data flow for errors be different from the normal data and control flow.
I would not support the addition of a run time checking facility because I think it encourages poor software design.
Thank you for your feedback @everythingfunctional. I'm confused about some terminology so I need to ask.
I think run time checking and unit testing functionality should be kept completely separate. And in fact I've found that run time checking that stops execution actually makes unit testing incredibly difficult.
By "run time checking", do you universally mean "run time checking that stops the program"? If yes, I agree. For my own testing (any kind really: unit, integration...), I use procedures very similar to the `test_condition` proposed here, that don't stop the program but only report on what succeeded or failed. In some cases I have an accumulator of successes or failures.

Would you then suggest removing the option to stop the program here altogether (`warn=.true.` here would then be removed as an argument and used as the default)? This would cleanly separate it from the existing `assert` in `stdlib_experimental_error`.
I'm not sure having what amounts to a fancy stop statement is a good idea.
In the case of a `test_condition` that stops, it's a convenience wrapper, very much like `optval` is a wrapper for `present`.
If I'm going to perform unit testing, I don't want any chance the program just stops.
Me too, but think beyond unit testing :). Some programs (yes, even in production) need to stop gracefully on some conditions. Sure, we can use an if-branch and a `stop` statement, and `test_condition` makes it easier to do so. Do you think this is a bad idea? If yes, why?
I would not support the addition of a run time checking facility because I think it encourages poor software design.
Can you explain why? I don't see it. So many languages and libraries do it successfully IMO.
By run time checking in that statement I meant anything that causes the procedure to do something beyond its normal operation and sends that result anywhere other than back through the argument list or function result. Whether it's stopping the program, printing to screen, writing to an output file or launching the missiles, that is something that my tests (or any calling procedure) no longer have control over or access to. I don't want my tests to stop, and I don't want them cluttered up with output from the production code (or to inadvertently launch the missiles).
I don't think it's a good idea to add features to a language that enable a design/behavior we should be discouraging. Especially if it's something that could be accomplished with a library anyway. Maybe I'm being naive and idealistic though, because I could see a code base using a pattern like the following extensively, which would still enable deterministic testing.
if (.not. some_condition) then
  if (present(error)) then
    ! caller asked for the error to be returned: set the code and keep running
    error = code
    return
  else
    ! no error argument: report and halt, as production code typically would
    print *, message
    stop code
  end if
end if
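As a self-contained illustration of that pattern (the routine and error-code names here are made up), a production routine might look like:

subroutine invert(x, y, stat)
   ! Returns y = 1/x. Under test, the caller passes stat and receives an
   ! error code; in production the caller omits stat and a failure stops
   ! the program.
   real, intent(in) :: x
   real, intent(out) :: y
   integer, intent(out), optional :: stat
   integer, parameter :: ERR_ZERO_INPUT = 1

   if (x == 0.0) then
      if (present(stat)) then
         stat = ERR_ZERO_INPUT
         return
      else
         print *, "invert: x must be nonzero"
         stop 1
      end if
   end if
   if (present(stat)) stat = 0
   y = 1.0 / x
end subroutine invert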
Maybe if you could get the feature to mimic that pattern and actually cause the procedure to return or stop depending on the presence of a variable and the value of a logical, i.e.

assert(some_condition, message, code, error, fatal)

So, if `error` is not present, or `fatal` is present and true, then it prints the given message to `std_err` and stops the program; otherwise `error` becomes defined with the value of `code` and the procedure returns without printing anything. It would really be cool if `code` and `error` only had to be of the same type, and not some intrinsic type. That would be a reasonable compromise I might support (or at least not speak against).
@everythingfunctional Okay, great, I understand better now.
By run time checking in that statement I meant, anything that causes the procedure to do something beyond it's normal operation and sends that result anywhere other than back through the argument list or function result. Whether it's stopping the program, printing to screen, writing to an output file or launching the missiles, that is something that my tests (or any calling procedure) no longer has control over or access to. I don't want my tests to stop, and I don't want them cluttered up with output from the production code (or to inadvertently launch the missiles).
Great, I agree, this indeed is not what `test_condition` (or however we name it) is for! :) I wouldn't put this in a function, which I tend to write as `pure` anyway. However, I would very much like to use this in top-level programs to check the results of functions and subroutines.
I don't think it's a good idea to add features to a language that enable a design/behavior we should be discouraging. Especially if it's something that could be accomplished with a library anyway.
This is a proposal for stdlib, not the language (different repo). But otherwise, regarding discouraging certain style or behavior, it's not black and white. Any powerful feature can be used for good or harm.
Maybe if you could get the feature to mimic that pattern and actually cause the procedure to return or stop depending on the presence of a variable and the value of a logical. I.e.
assert(some_condition, message, code, error, fatal)
I think we're moving in a good direction. Considering the naming feedback from @certik (I like it also), and your feedback here, let's consider this API:
subroutine check(condition, msg, fatal, code)
  logical, intent(in) :: condition
  character(*), intent(in) :: msg
  logical, intent(in), optional :: fatal
  integer, intent(in), optional :: code
This subroutine emits `msg` to stderr if `condition` is `.false.`. Optionally, if `fatal` is `.true.` (default `.false.`), it stops the program, with the exit code if provided. I like this even better than the original.

I changed the order of `fatal` and `code` (`code` is only meaningful if `fatal` is `.true.`). `msg` is also now required because warn-only is now the default behavior.
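For illustration, calls under these semantics might look like (variable names are made up):

! Warn only (default): print msg to stderr if the check fails, keep running.
call check(all(x > 0.0), 'some elements of x are not positive')

! Fatal check: stop the program with exit code 2 if the file was not read.
call check(io_status == 0, 'could not read the input file', fatal=.true., code=2)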
Would you at least not object to this variant?
I have a tweak to make the usage alluded to a bit more convenient.
function check(condition, msg, fatal, code, stat)
  logical, intent(in) :: condition
  character(len=*), intent(in) :: msg
  logical, intent(in), optional :: fatal
  integer, intent(in), optional :: code
  integer, intent(out), optional :: stat
  logical :: check
so that one could write a line like
if (check(condition, msg, fatal, code, stat)) return
I would still lean towards not having the function emit `msg` at all if it doesn't stop the program, where the conditions that cause the program to stop are:

* `fatal` is provided and is `.true.`
* `stat` is not provided
* `stat` is provided but `code` is not (this is essentially invalid usage)

One thing that I am still confused about is what the intended audience of this `check` function is. Is it to write tests? If so, then how do you communicate to `ctest` that the test failed? Shouldn't `fatal` be `.true.` by default?

Also, a good test suite framework (such as doctest in C++) collects the tests and shows all kinds of statistics. I think some of the Fortran test suites do the same. But this `check` function would be a very limited implementation of such a test suite framework?

Wouldn't it make sense to simply leave such `check` functions to test frameworks?
The `check` function would not have anything to do with testing. As you recognized, it would have no way of communicating to a testing framework.

Additionally, I don't think test code should be mixed in with production code. Test code should call production code, not the other way around.

You may be right that `fatal` should default to `.true.`, which would make that bullet:

* `fatal` is not provided or is `.true.`
The basic use case for this is to be able to write code that, in production, stops execution, but under test does not. You could have a procedure with optional arguments that could be used to return error conditions under test, but would not be utilized in production code, i.e.
subroutine foo(arg, stat)
  integer, intent(in) :: arg
  integer, intent(out), optional :: stat
  ...
  if (check(condition, "Error occurred", .false., SPECIFIC_ERR_CODE, stat)) return
  ...
so that in production code you would just
call foo(arg)
but in test code you could
call foo(CAUSE_FAILURE, stat)
! Make some assertion about stat
I think this provides a reasonable way to improve code that simply uses `stop` statements, and enables and encourages making that code testable. But it does not directly talk to any sort of testing framework.
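To make that last step concrete, a hypothetical test of foo might read like the following (assuming foo and SPECIFIC_ERR_CODE are exported from a module, here given the made-up name foo_module):

program test_foo
   use foo_module, only: foo, SPECIFIC_ERR_CODE   ! hypothetical module name
   implicit none
   integer :: stat
   integer, parameter :: CAUSE_FAILURE = -1   ! an input known to trip the check in foo

   ! Passing stat makes the error path return instead of stopping the program.
   call foo(CAUSE_FAILURE, stat)
   if (stat == SPECIFIC_ERR_CODE) then
      print *, 'PASS: foo reported the expected error'
   else
      print *, 'FAIL: foo did not report the expected error'
   end if
end program test_foo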
@certik Here are some use cases:
As you can see from the use cases, two essential elements of this subroutine are:
So rather than testing, perhaps think about this as a simple warn subroutine, that can optionally also stop. I agree with Brad that this is not meant to make a testing framework. I think I understand Brad's use case, but I personally wouldn't use it like that.
I do object to this being a function and not a subroutine. The key part of this procedure is that it causes a side effect (print to stderr and/or stop the program). That's pretty much all it does. One of the few things that I think Fortran got right over other languages is that it has both functions and subroutines. Functions can be used only in expressions. Subroutines can't be used in expressions. Expressions shouldn't cause side effects. This is especially troublesome if the expression is a logical condition being tested (Brad's use case), or if it's passed as an input argument to another procedure. If I understand correctly, `integer, intent(out) :: stat` was added exactly for causing an additional side effect. I wouldn't object to this if we keep `check` a subroutine, although I don't personally care much for this functionality.

Regarding Brad's wish that `check` shouldn't emit a message if it doesn't stop the program, we could do this if we made the `msg` argument optional.
subroutine check(condition, msg, fatal, code)
  logical, intent(in) :: condition
  character(*), intent(in), optional :: msg
  logical, intent(in), optional :: fatal
  integer, intent(in), optional :: code
Now you can use it like this:
call check(x < 0, 'Warning: x is negative, solution may not converge') ! warn only
or:
call check(fatal_condition, fatal=.true.) ! stops the program without printing to stderr
What do you think?
The discussion is already pretty lengthy - I will read it later today.
Being of an older generation and using an optical device to enhance my vision, I have to ask: is there any easy way to get this thread on paper? I really prefer to read from that ancient medium ;).
If `check` is a subroutine, then my pattern becomes:
call check(condition, "Error occurred", .false., SPECIFIC_ERR_CODE, stat)
if (present(stat)) then
  if (stat /= 0) return
end if
I agree that in most cases side effects should be performed by subroutines, but in this case I think the general use case warrants it.
I understand the use cases you have shown are prevalent in many code bases, but I don't think they are a good design. If the following is inside a procedure that gets called inside a loop, you'll see this warning tons of times, without necessarily any context as to what's causing it, and you may only really want to see it once. Not to mention that if I hit this condition in a test, it may clutter up the output from my tests, and it makes it much more difficult to test that this condition does emit the message.
call check(x < 0, 'Warning: x is negative, solution may not converge') ! warn only
If the following is in a procedure, now I can't test that procedure to see that inputs causing `fatal_condition` do in fact result in the appropriate error.
call check(fatal_condition, fatal=.true.) ! stops the program without printing anything
My thought is that we could use this as a way to demonstrate that the use case I've shown is a relatively straightforward change from patterns that are in common use today, but that makes code more testable. Once that's been demonstrated, we can request a `check` statement be added to the language to make it possible to remove the `if (...) return` portion from around those statements, but still retain the exact same behavior.

I think this is a good use of an (experimental) standard library: demonstrate a use case for a feature that should be added to the language, with a clear upgrade path from existing code to the stdlib feature, and a clear upgrade path from the stdlib to the new language feature.
@arjenmarkus unfortunately I can't quickly figure out how to print a GitHub issue. However, if this is something that would allow you to participate here, we can use GitHub API to get the Markdown contents of the issue and all the comments, and then we can convert to pdf that can be printed. It's some work, so I can't do it right away. Let me know if this would be a high priority for you.
@certik Thanks for the offer - it is mostly a matter of convenience: I could read the thread from paper while commuting. Since there does not seem to be a convenient way to do it, I will just read it from the screen instead.
Having read the discussion, I would like to add the following:
call checker%set_handler( stop_on_error )
...
call checker%check( x > 0, "Variable x should be positive" )
`stop_on_error` is then a routine that gets invoked if an error occurs (with some prescribed interface) -- an idea I adopted from @urbanjost. And sundry other methods that control the behaviour globally. It is not even necessary to have only a single "checker" object, so that you can easily customise different parts of the program in different ways.
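A minimal sketch of how such a checker object could be put together (hypothetical names and a deliberately simple handler interface, not an existing library):

module checker_module
   implicit none

   abstract interface
      subroutine error_handler(msg)
         character(len=*), intent(in) :: msg
      end subroutine error_handler
   end interface

   type :: checker_t
      procedure(error_handler), pointer, nopass :: handler => null()
   contains
      procedure :: set_handler
      procedure :: check
   end type checker_t

contains

   subroutine set_handler(self, handler)
      class(checker_t), intent(inout) :: self
      procedure(error_handler) :: handler
      self%handler => handler
   end subroutine set_handler

   subroutine check(self, condition, msg)
      class(checker_t), intent(in) :: self
      logical, intent(in) :: condition
      character(len=*), intent(in) :: msg
      ! Delegate the reaction to whatever handler the user installed
      ! (stop, log, accumulate statistics, ...).
      if (.not. condition .and. associated(self%handler)) call self%handler(msg)
   end subroutine check

end module checker_module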
Using an object in this way, rather than a single routine or a set of routines, is customary in languages like C++ and Java where object-orientation is the preferred programming model.
The thing we should avoid is over-designing. It is easy to make a collection of routines (or methods) that allow us to change the behaviour in a myriad of ways, but let us stick to those aspects that are evidently useful at first. Extensions can be made in a later stage. Having said that, I know it is hard to answer the question "what is evidently useful?" ;).
To be honest I am also struggling to understand the motivation for this. I understand the motivation for a Debug-time assert, which is #72. I also understand the motivation for a testing framework, which I feel might be beyond the scope of stdlib (yes, we have to write tests in stdlib somehow; for now we simply call `error_stop`, later on we can switch to some more sophisticated testing framework); either way, that's a separate issue also.

That leaves the use cases 1., 2., and 3. from https://github.com/fortran-lang/stdlib/issues/121#issuecomment-580258256. Of those I usually just write an `if` statement by hand, and if an error occurs, then I call `error_stop()`, which is already in `stdlib`.
So I personally do not see the motivation for this yet, but if others find this useful, I am not against.
That leaves the use cases 1., 2., and 3. from #121 (comment). Of those I usually just write an if statement by hand, and if an error occurs, then I call error_stop(), which is already in stdlib.
Exactly, this is the crux. It's a convenience wrapper over `if / then / write(stderr, *)` or `if / then / error_stop()`, nothing more.
So I personally do not see the motivation for this yet
I can understand this. However, how do you motivate the need for the current implementation of `assert` in `stdlib_experimental_error`? It's literally this:

if (.not. condition) call error_stop("Assert failed.", code)

I'd argue that `check` is more valuable than the current `assert` because it abstracts away more boilerplate than `assert` does.

What motivated me to propose this is the idea that stdlib would also provide convenience routines like this that simplify repetitive boilerplate. Do you think that stdlib shouldn't provide utilities like this?
I see. Yes, the `assert` was added so that we can write tests today. Then we realized that it really should be split into #72 and into a test framework.

If you see it as a better "assert", then I am for it. Essentially something that we can use today to write tests. And if we want more, then we would switch to a full-blown test framework. Is that your idea?
This started as a backward-compatible improvement to `assert` in #116. The winds of community feedback brought me here. In #116, @nncarlson stressed (reasonably so) that `assert` should stay minimal in scope.
If the current implementation of `assert` is indeed to eventually retire in favor of a preprocessor macro, then I don't think it hurts to "upgrade" the current implementation of `assert` to a more appropriately named `check`, with two benefits:

* meaningful error messages, so you know which check failed;
* the option to only warn, while the default behavior still stops the program (and so still reports failure to `ctest`, like we do now).

I do think the ideas @everythingfunctional brought up here are important and we should discuss them. They do seem outside the scope of this proposal and seem to be more akin to #95.
If the goal is to retire our current `assert` and replace it with `check` as discussed here, then I am for it. I think that's a good idea.

Let's pick this discussion up. It seems like there is a satisfactory level of interest in retiring `assert` in favor of `check` from developers (myself, @ivan-pi, @certik, @jvdp1). This would make stdlib development easier according to #136 and some earlier PRs that ran into the same issue (it was difficult to identify which assert failed).
Let's look at the API again:
subroutine check(condition, msg, code, warn)
  logical, intent(in) :: condition
  character(*), intent(in), optional :: msg
  integer, intent(in), optional :: code
  logical, intent(in), optional :: warn
This API is compatible with the current `assert`, meaning you could rename all `assert` calls to `check` and the code would build and run with the same behavior. In addition, people like me who want this to warn only in some scenarios could do it.
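For example (illustrative variable names):

! Drop-in replacement for a current assert call: stops with a default message.
call check(size(a) == size(b))

! Warn-only variant with a meaningful message, handy while developing tests.
call check(abs(diff) < tol, msg='difference exceeds tolerance', warn=.true.)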
Implementation could be incremental:

1. Implement `check` as described;
2. Replace `assert` calls with `check`;
3. Add informative messages where `check` is used in current stdlib tests;
4. Remove `assert` from `stdlib_experimental_error`.

Are there any objections to this API before I proceed with a PR?
It sounds good to me. To avoid issues, it would maybe be good to first merge the remaining PRs, if they are OK.
Also, I would suggest performing your steps 1, 2, and 4 in the same PR. I am not sure it is a good idea to have both `assert` and `check` in stdlib at the same time. That would allow people to still use `assert` while `check` is being implemented, which I guess we would like to avoid.
Just to be sure, the default behavior is `warn = .false.`, so that a false condition will halt execution?
I think it is sensible to not have `assert` and `check` in the library at the same time.

After this PR, I suppose the developers who wrote the modules should perform step 3 and add some informative messages to the current tests? Or should we wait until we have a more complete solution for testing?
Just to be sure, the default behavior is warn = .false., so that a false condition will halt execution?
Yes, I think so. It would preserve the behavior of `assert`, and make its intent a bit different from a testing framework, where you want your program to fail tests and keep running.
After this PR I suppose the developers who wrote the modules should perform step 3, and add some informative messages to the current tests?
This is quite easy and I don't mind doing it. We don't have that many modules and functions. All 4 steps above can be in a single PR.
Or should we wait until we have a more complete solution for testing?
No. We're already hitting problems in development with the current `assert`. I think we should do an easy fix now and carefully design anything more complete, if desired at all.
Is there any consensus/progress in this issue? Could I be of any help? Such a subroutine could have been useful today ;) @milancurcic
I'm glad you asked! I meant to tackle this next, sometime this coming week. I have a new baby and today is the last day of my 3-week parental leave, and thus lower activity on projects.
If you're eager to move it forward, please! Otherwise, I'd tackle it in the next few days.
I think this is the implementation we're looking for:
subroutine check(condition, msg, code, warn)
  ! Assumes stderr, optval, and error_stop are available in the host module
  ! (stderr being iso_fortran_env's error_unit; optval and error_stop from stdlib).
  logical, intent(in) :: condition
  character(*), intent(in), optional :: msg
  integer, intent(in), optional :: code
  logical, intent(in), optional :: warn
  character(*), parameter :: msg_default = 'Test failed.'
  if (.not. condition) then
    if (optval(warn, .false.)) then
      write(stderr, *) optval(msg, msg_default)
    else
      call error_stop(optval(msg, msg_default), code)
    end if
  end if
end subroutine check
Important points are:

* It's backward compatible with `assert`; that is, you can replace current `assert` calls with `check` and the behavior won't change. Which implies:
  * it only warns (instead of stopping) if `warn` is `.true.`;
  * a default message is printed unless `msg` is provided.

How does that sound?
I still don't understand how this provides functionality different from the current `assert`. And if it's backwards compatible with the current `assert`, then why call it something different?
I would like to see the following implementation
function check(condition, code, msg, fatal, stat)
  logical, intent(in) :: condition
  integer, intent(in) :: code
  character(len=*), intent(in), optional :: msg
  logical, intent(in), optional :: fatal
  integer, intent(out), optional :: stat
  logical :: check
  character(len=*), parameter :: DEFAULT_MESSAGE = 'Check Failed.'
  if (.not. condition) then
    if (optval(fatal, .false.)) then
      call error_stop(optval(msg, DEFAULT_MESSAGE), code)
    else
      if (present(stat)) then
        stat = code
        check = .true.
      else
        call error_stop(optval(msg, DEFAULT_MESSAGE), code)
      end if
    end if
  else
    check = .false.
  end if
end function check
I think @milancurcic's use case could be accomplished by simply adding an optional `warn` argument to `assert`. Then we can just leave both in the stdlib, as we appear to have sufficiently distinct use cases.
@everythingfunctional #116 shows how we got to here.
I like Brad's proposed implementation also, if we either:

* make it a subroutine, or
* drop `integer, intent(out), optional :: stat`. Functions shouldn't do this -- it would be very difficult for a casual reader to notice that `stat` is being updated in an expression as a side effect.

To my eyes, Brad's `check` is closer to my `check` than my `check` is to `assert`. Plus, there have been (I think) convincing arguments to either leave `assert` alone or to drop it. I'm fine with either option.
As for why make `check` backward compatible with `assert`, I think the choice is pragmatic: 1) we can easily replace current calls to `assert` with `check` and be done with it; 2) people who still want the existing `assert` functionality can have it.
I am not opposed to having multiple functions and subroutines with somewhat overlapping but different capabilities.
@jvdp1 Let's keep this at the drawing board until we have a majority agreement.
@everythingfunctional #116 shows how we got to here.
Ok, I think I see what we've got now. Multiple use cases/ideas trying to get in on one feature. Here's what I think the use cases are:
I think checking in the context of testing should not be in the scope of the standard library. That is the responsibility of a testing library/framework. Besides, if I can tell which testing framework you're using by looking at your production code, then you're doing it wrong.
I don't think static/compile time checking would fall under the purview of the standard library either. It would probably need to be implemented by some sort of macro/preprocessor anyway.
In order for something to be optimized away under different compiler options, it would need to be a language feature (or maybe make use of macros or preprocessors), which would again put it outside the purview of the standard library.
That leaves run time checking of user inputs and run time checking of intermediate results. I think the solution to these problems needs to be designed with respect to the kind of usage patterns we would like to see encouraged, and possibly discouraged, not simply the patterns currently in common usage.
Sometimes it's better to remove capabilities from a language than it is to add new ones. Fortran's history is littered with bad design decisions. Like computed goto and alternate entry points. With great power comes great spaghetti code.
Thanks for the summary of use cases. I agree that the problem is exactly "Multiple use cases/ideas trying to get in on one feature".
I think checking in the context of testing should not be in the scope a the standard library. That is the responsibility of a testing library/framework.
I don't see it as black and white. Recent and popular languages like Python and Go do it successfully. I bet many others too. A small project will often roll its own testing functions rather than relying on a heavy testing framework.
Though I tend to agree that a general and powerful testing framework may be out of scope for stdlib, you still need a way to test stdlib functions internally. Adding an external framework as a dependency is not feasible (not even vegetables, my favorite of the bunch), and the current `assert` is not adequate. How would you do it?
We can safely ignore here the preprocessor macro and language developments.
As you say, let's focus on run time checking of user inputs and run time checking of intermediate results.
I have a new baby and today is the last day of my 3-week parental leave, and thus lower activity on projects.
Congratulations @milancurcic !
My point is that stdlib functions shouldn't be tested internally. A testing framework should be used to test the stdlib from the outside. That way a user of the stdlib doesn't have to rely on the testing code.
I understand that some languages make use of doctests and inline tests, but they make use of capabilities of those languages that Fortran doesn't have: introspection and standardized docstrings in the case of Python, and fully AST-aware macros and a builtin `panic` function in the case of Rust. I'm also still not convinced the way they do it is actually a good idea. I'm still of the opinion that the tests should not be mixed in with the production code, as then you require your users to also depend on your testing framework.
A couple of questions to try to understand pros/cons in this thread: @everythingfunctional what do you mean with:
That way a user of the stdlib doesn't have to rely on the testing code.
@milancurcic
Adding an external framework as a dependency is not feasible (not even vegetables, my favorite of the bunch),....
Why is it not feasible?
My main issue in this thread is that the use of multiple `assert` calls in a test file is painful when a problem occurs (e.g., a too strict threshold when moving to another compiler).

I would be happy with both approaches (internal or external testing), if they facilitate the development of stdlib and do not rely on the current `assert` (which does not seem appropriate for the stdlib testing approach).
@everythingfunctional
My point is that stdlib functions shouldn't be tested internally. A testing framework should be used to test the stdlib from the outside.
Okay, your point is now clear to me. However, I still don't understand why (they shouldn't be tested internally). Are there practical reasons or is it more of a dogma?
That way a user of the stdlib doesn't have to rely on the testing code.
I don't think this is an answer to my question above because if you take care of testing internally, then this problem goes away.
Moreover, I want as many users to run stdlib tests and as seamlessly as possible. This will help us discover bugs and cross-platform issues. As a user myself, I want to run tests for most libraries I use, and I don't want to have to install Perl, Ruby, or Haskell to do so.
@jvdp1
Why is it not feasible?
I think it would add too much friction to development (it would for me), and it would discourage users from running tests.
My main issue in this thread is that the use of multiple assert in a test file is painfull when a problem occurs (e.g., a too strict threshold when moving to another compiler).
Same. There's an easy solution that works, now.
@everythingfunctional Recall also that we are now working only with experimental API, where we're supposed to try out and play with various approaches, and drop ones that don't work as we go. Nothing is set in stone here. This kind of analysis paralysis prevents work from getting done.
Perfect is the killer of good enough.
I still don't understand why (they shouldn't be tested internally). Are there practical reasons or is it more of a dogma?
With Fortran, if your tests are inside your procedures, they'll be run every time a user calls that procedure. That adds overhead to your library.
I don't think this is an answer to my question above because if you take care of testing internally, then this problem goes away.
I don't think the problem goes away, I just think it's no longer apparent to the user.
Moreover, I want as many users to run stdlib tests and as seamlessly as possible. This will help us discover bugs and cross-platform issues.
I'm not sure I understand this statement. How does having more people run the same tests help to find bugs? To me, the best way to find bugs is to have more people use the code in various ways. Otherwise your tests would have already found the bug.
I'm not saying the tests don't have value to outside users. In fact they are a great example of how your code is intended to be used, and done well describe the requirements the code is intended to fulfill. But your users should be able to see that your CI system is running the tests and trust that they pass. They shouldn't really need to run them themselves.
@milancurcic I understand your point of view.
An extension of the current `assert()`, combining @milancurcic's and @everythingfunctional's ideas proposed in this thread, could already be useful for the experimental development.
Meanwhile, a testing framework could be thought/set up for when we will move procedures from experimental part to the non-experimental part. Of course, this will not encourage the users to run tests.
With Fortran, if your tests are inside your procedures, they'll be run every time a user calls that procedure. That adds overhead to your library.
I agree with that. However, I think we discussed different things.
Testing is outside the procedures, but well inside the directory/project of stdlib. E.g., now we test the `mean` function in a test file that contains multiple calls to `assert` and `mean`, like:

call assert( (mean(x) - 1) < sptol )

As implemented now, it is a bit as it is in BLAS/LAPACK: the project includes tests that the user may or may not run. There are no tests inside the procedures of BLAS/LAPACK (except for validation of the values passed to procedures, of course).

Do I miss something?
With Fortran, if your tests are inside your procedures, they'll be run every time a user calls that procedure. That adds overhead to your library.
Okay, I agree, and I didn't have that in mind at all. I think of tests as separate programs that call library functions, and these programs may or may not ship alongside the library. What we're doing now with `assert` is a rudimentary version of that. The only things I would change about current stdlib testing are to allow showing a meaningful message (so the developer knows what failed), and to not have failures stop the program (so the developer can see multiple failures by running the suite once). At this time I see no need to change or add anything else.
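Concretely, the mean test quoted above might then read something like this (a sketch against the proposed check):

call check((mean(x) - 1) < sptol, msg='mean(x) should be 1 within sptol', warn=.true.)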
I'm not sure I understand this statement. How does having more people run the same tests help to find bugs? To me, the best way to find bugs is to have more people use the code in various ways. Otherwise your tests would have already found the bug.
You're right, it doesn't really help if a user runs tests in the same environment as developers, and using the code in various ways is indeed more beneficial. However the output of tests will not always be the same across compilers, compiler versions, and operating systems.
What we're doing now with assert is a rudimentary version of that. The only things I would change about current stdlib testing is to allow showing a meaningful message (developer knows what failed), and failures don't stop the program (developer can see multiple failure by running the suite once).
Ok, so you're using `assert` as a rudimentary testing framework, but it wasn't really meant for that, and so it's not quite serving that purpose very effectively. A testing framework is intended to make writing those "separate programs that call library functions" easier and more useful for debugging.
I wrote the current `assert` in stdlib, and in my codes I use it both as a testing framework and as Debug-time testing (which unfortunately also gets executed in Release mode). In the short run it allows me (us) to get the work done and move on.

In the long run, I agree it's not a good solution for either use case.
For testing, we should use a testing framework that is integrated with fpm, so that `fpm test` will execute the tests, just like `cargo test` does, and if a test fails, report which one failed, what the LHS and RHS values are, etc.

For Debug-time tests, probably the only way currently in Fortran is to use a macro (until https://github.com/j3-fortran/fortran_proposals/issues/70 is accepted). However, I think this macro could be maintained in stdlib, as it would be very useful.
Per my previous comment, here is my proposal:
1) Get rid of the `assert` function, and introduce an `ASSERT` macro (a sketch follows after this list). Use that for Debug-time checking of array bounds and other conditions that our code assumes, which we do not want to be checking in Release mode for performance reasons. See #72 for that.
2) Either depend on a testing framework (see #162), or create our own "mini testing framework", and port our tests to use that
3) There is a third use case, where you want to be checking conditions in Release mode also. An example of that is this: https://github.com/certik/fortran-utils/blob/b43bd24cd421509a5bc6d3b9c3eeae8ce856ed88/src/linalg.f90#L110. I am currently undecided whether this use case is different from 1), i.e., whether this test should still be executed in Release mode. But if it should, then that's a third use case.
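For point 1, a sketch of what such a macro could look like using the C preprocessor (the exact form belongs to #72; this is only illustrative):

! Compiled with -cpp (gfortran) or equivalent; assumes error_stop is available.
#ifdef NDEBUG
#define ASSERT(cond)
#else
#define ASSERT(cond) if (.not. (cond)) call error_stop("Assertion failed")
#endif

! Debug build: the condition is checked; Release build (-DNDEBUG): compiled away.
ASSERT(i >= 1 .and. i <= size(a))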
Thank you @certik. I like the long-term plan and also support introducing an `ASSERT` macro.

In the meantime, do you object to extending the existing `assert` subroutine to ease development, so we can at least keep moving forward? What do we do for internal testing before we have the `ASSERT` macro and `fpm test`?
I will take some time to tally up current support/objections and elicit further support for this.
@everythingfunctional Can you please open an issue to propose an external testing framework for stdlib testing?
sure thing
Evolved from discussion in #116.
Summary
This is a proposal to add a subroutine `test_condition` to `stdlib_experimental_error` that will:

* test a logical condition and, by default, stop the program if the condition is false, as the current `assert` in `stdlib_experimental_error` does;
* optionally print a custom message, set the exit code, or only warn instead of stopping, unlike the current `assert`.

This is useful for testing both programmer and user errors. It can be used regardless of how the program is built (debug or release mode).

It could also be used as a basic building block toward higher-level testing or exception handling proposed in #95.
Description
`test_condition` tests an input logical condition. If the condition evaluates to `.false.`, the default behavior is to stop the program and print the error message. Optionally, the user can pass an exit code to stop the program with. Optionally, the user can choose to not stop the program, but only print the warning.

API
Arguments
* `condition`: Logical condition to test.
* `msg`: (optional) Character string to print to stderr. Default
* `code`: (optional) Integer exit code to set when stopping the program. (What should be the default value? The current implementation of assert doesn't set any default code as far as I can tell.)
* `warn`: (optional) Logical. If `.false.` (default), `test_condition` will stop the program if `condition` is also false. If `.true.`, it will print the error message to the screen and resume.

Example
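A sketch of possible calls consistent with the arguments above (illustrative, not the original example from the issue):

call test_condition(n > 0, 'n must be positive')              ! stop with a message
call test_condition(n > 0, 'n must be positive', code=2)      ! stop with exit code 2
call test_condition(x < xmax, 'x exceeds xmax', warn=.true.)  ! warn and continue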
Implementation
This implementation depends on optval and error_stop.
Requesting feedback from @certik @nncarlson @zbeekman @jvdp1 @ivan-pi.
For anybody reading, please explicitly thumbs up or down, and write a comment if thumbs down. This way we can get a clear idea on whether this is supported or not and how much.