Fortran-FOSS-Programmers / Best_Practices

the opinionated *best practices* of Fortran FOSS Programmers group
Creative Commons Attribution Share Alike 4.0 International
64 stars 10 forks source link

goto, is to be or not (used)? #15

Closed szaghi closed 5 years ago

szaghi commented 7 years ago

Dear all, I am very sorry for my long silence...

OT

I am preparing four (yes 4) applications to public open calls for a stable research position (2 written proofs and 1 oral defense for each call that are on different field... a lot to study) that are very important (give me a good luck even if I am not competitive at all, but miracles could happen...), then there were the official work for the payed salary, the family... forgive me, but you are always in mind too

/OT

Currently, I have hinged into a nice debate on google group about Fortran into which the discussion soon diverged (maybe due to my comments, my apologizes) from the original OP request and it focused on the goto usage (in particular computed goto, but for brevity I use simply goto hereafter). Perhaps some of you have already read this google discussion (even I do not see your comments there), but I try to summarize some points here. Aside some harsh comments I obtained, the discussion was interesting for me. All originates from a sharp (for me) sentence of a very experienced and distinguished Fortraner, his name is Terence (pay careful to put only one 'r' :smile:), that stated:

Mind you, I still prefer a computed GOTO over CASE anyway, since this is usually compiled as a faster selection.

To me sentence is too much sharp because it entails a vague general significance that could let newbies who are learning Fortran (like me) to prefer goto because it is supposed to be faster than other Fortran constructs. In particular, the case debated was about branching-flow approaches: the OP has a (possible) long list of cases like the following:

select case(keyword)
case(key1)
  call worker1(keyword)
case(key2)
  call worker2(keyword)
...
case(keyN)
  call workerN(keyword)
endselect

The OP was wondering if there was a more concise/easy way to handle the branching in particular concerning the situation where new keywords were added (he struggled with the stuff to add new cases).

Aside the peculiar OP question, after the Terence's sentence my attention was captured by the supposed goto better performance.

I did not thrust Terence advice, thus I created some benchmarks here.

In my opinion the following considerations hold:

  1. goto is now considered obsolescent for good reasons, among which I like to remember:
    • goto branching is not readable: goto can jump everywhere, no ensure is provided on the sequential bias, using goto you must rely on the discipline of the developer;
    • goto needs labels, a lot of labels in the OP scenario, that hinges again with readability and maintainability;
    • goto-based branching-flow is not expressive; citing the words of great Ian Harvey: goto (1,2,3,4,5,6,7,8,9,10), keyword tells me that execution is going somewhere, based on the value of keyword, but tells me nothing about why execution is branching, or the nature of the branch;
    • goto is error-prone more than if elseif or select case;
    • goto is spaghetti-code-prone;
  2. goto supposed higher performance is a myth for the architectures (meaning hardware+compiler) at our disposal nowadays:
    • my tests (that are not to be considered quantitative or accurate, but only indicative) show that goto-based branching has very similar speed compared to equivalent if elseif or select case branching-flow approaches; as a matter of fact, all the 3 branching mechanisms are often compiled into the same low level machine-code, e.g. jump-table or constants-array;
    • goto prevents further optimization like the exploitation of multi-threading, thus we actually say good bye to the supposed better performance.

Considering also that if elseif and select case are more general than goto (that works with only integer) allowing for not only integers comparison.

In the google thread I was almost alone (FortranFan is probably the only other reader thinking like me): essentially the common culture considers goto still the fastest in the west. Indeed, some of the readers supporting this concept are focused on the old-good-days where goto was really faster because select case was just born (or even the only available options, e.g. pre F77/F90 days), but I do not consider much more relevant those applications: we are leaving in 2016, hopefully pre F77 and F77/F90 should be considered old, at least F95+ should be the mainstream standard, although it has more 20 years now... Obviously, I am referring to the development of new codes (as in OP request), I am not saying to convert all old codes written by means of goto that are still in use.

In my opinion goto is bad practice and should be discouraged. However, I am often wrong, thus your opinions are much more than welcome. I'll use your opinions for another little step of this guidelines.

My best regards.

Guideline Proposal (WIP, feel free to directly amend my proposal)

Structured is better than unStRUcturED, avoid as much as possible goto

In some circumstances, the algorithm-flow can be simplified by means of jump statements, e.g. return, exit, cycle and goto. While the former set generates structured flow where the algorithm is allowed to flow in only in one direction, the goto statement allows to jump elsewhere and generates often unstructured flows that are difficult to read and to maintain. As a matter of fact, goto could be very error-prone often leading to the so called spaghetti-code.

  • It is recommended to avoid, as much as possible, goto statement:
    • in almost all cases goto can be replaced by means of structured-safe constructs without any performance penalty while greatly improving readability and maintainability;
  • but exceptions should be allowed for very corner cases that, anyhow, should be in the hands of experts.

put here examples and references, to be completed

giacrossi commented 7 years ago

I think that, some times, performances are overrated: this case, in my opinion, is one of that. The better (not yet demonstrated) performances of go to statement are obtained with a drammatically high cost in terms of readability and future development possibilities of the code: as you have already noticed, all the cons that come with the use of this OLD construct are very very worse compared (for me) also in the case that with go to based branching the code goes two or three times faster than the same code with if else if or select case based branching.

For me, go to construct is EVIL :-)

LadaF commented 7 years ago

I would have to do some tests, but if the select case is implemented as a jump table (and normally it is) I wouldn't expect any difference in performance. Obviously that depends on the compiler and the optimization level.

I don't use goto almost at all but I can see how a single goto to a fixed label can help to avoid some complicated nested if statements if an exceptional behaviour or error occurs. The computed goto is just ugly to me.

szaghi commented 7 years ago

@LadaF

Dear Vladimir, thank you very much for your insight. Your point about exceptions (or more general concept like state machine algorithm) has been also highlighted in the google thread.

My understanding for this kind of application is very limited due to the very few cases where I faced with something similar to it.

I agree that in very corner cases the ability to jump elsewhere is very flexible, but the cons coming with goto are not balanced by the pros, i.e. the gain in flexibility. For me, similar state machine or exceptions handling can be achieved with other constructs, e.g. block+exit, or implemented with a more complex state pattern if the case is worth to adopt such a more complex and costly approach.

However, to improve my skills, can you give me some references where I can learn more about exceptions handling by means of goto? I you have not references, do no worry, I do not want you waste your time trying to rolling down an example for me.

Thank you again.

Cheers.

jeffhammond commented 7 years ago

In Fortran, GOTO can be an essential workaround for certain control flow patterns, particularly when an older dialect is used. In NWChem, GOTO is used extensively in the input file parser. I'm sure I could rewrite the code without GOTO, but it's possible that this would reduce efficiency (not that this matters in an input file parser), or worse yet, reduce readability.

Obviously, as with its use in C, it must be done carefully to avoid producing spaghetti code, but it turns out that this is common:

An empirical study of goto in C code from GitHub repositories (PDF)

At least in C, goto is used primarily for exception handling. See MPICH for an example of this pattern.

szaghi commented 7 years ago

@jeffhammond Dear Jeff, thank you for your comments. I'll look at NWChem to study your parser.

Just to summarize: you want goto to be not tagged as bad practice, right? This issue is also for creating an advice into this guide.

Thank you again.

Cheers.

jeffhammond commented 7 years ago

@szaghi

It's not my parser - it was written approximately 20 years ago, long before I was programming.

GOTO should be tagged as a bad practice in pedagogical material, because the vast majority of novices will use it wrong. It should not be prohibited by a software style guide, because it is useful if not necessary in very specific cases, which will be written by experts.

szaghi commented 7 years ago

@jeffhammond

It's not my parser - it was written approximately 20 years ago, long before I was programming.

Jeff, I know (some of my ex colleagues used it), my refer to your was just because I think you are currently maintaining/improving it, right?

GOTO should be tagged as a bad practice in pedagogical material, because the vast majority of novices will use it wrong. It should not be prohibited by a software style guide, because it is useful if not necessary in very specific cases, which will be written by experts.

I got it for our guide, as always I'll to create a wording that is a trade-of of our feelings.

However, I (currently) disagree: to me also experts should avoid it. Indeed, I am just searching for examples of special/corner cases where goto is necessary as you alluded to.

Cheers.

P.S. I given a flight view to NWChem input parser, I found only 2 goto, but they are enough to make me lost when tried to follow flow-pattern :smile:

zbeekman commented 7 years ago

At least in C, goto is used primarily for exception handling.

Wasn't a botched goto statement the cause of heartbleed or some other major recent exploit/bug? I only point this out to say that even the experts can mess it up. (However, IIRC the real issue was a missing semi-colon or something mundane like that... gotta love C...) Anyway, my point is that I agree that it is not best practice to use goto, but that there can be exceptions.

szaghi commented 7 years ago

@zbeekman

Wasn't a botched goto statement the cause of heartbleed or some other major recent exploit/bug?

Arghhhh, your notions are too much huge :clap:

Anyway, my point is that I agree that it is not best practice to use goto, but that there can be exceptions.

Your opinion has registered, thank!

However, I am just searching for such exceptions, I would like very much to add some good examples.

Cheers.

jacobwilliams commented 7 years ago

Go To Statement Considered Harmful

jeffhammond commented 7 years ago

I sent this yesterday morning but Github was down and apparently their server discarded my email response rather than doing the right thing.

It's not my parser - it was written approximately 20 years ago, long before I was programming.

Jeff, I know (some of my ex colleagues used it), my refer to your was just because I think you are currently maintaining/improving it, right?

I still work on NWChem but have no intention to rewriting the code to use fewer GOTO statements :smile:

GOTO should be tagged as a bad practice in pedagogical material, because the vast majority of novices will use it wrong. It should not be prohibited by a software style guide, because it is useful if not necessary in very specific cases, which will be written by experts.

I got it for our guide, as always I'll to create a wording that is a trade-of of our feelings. However, I (currently) disagree: to me also experts should avoid it. Indeed, I am just searching for examples of special/corner cases where goto is necessary as you alluded to.

Almost nothing in programming languages is truly necessary. See BF for a set of necessary programming constructs. :smile:

By definition, I trust experts to use features of a programming language as much or as little as necessary. As I said before, prohibiting GOTO is inappropriate. Discouraging confusing control flow in general is a great recommendation. Using GOTO to jump in and out of 5000-line if-elif-else-endif construct - as NWChem used to do in src/tce/tce_energy.F - is evil, but GOTO is but a small part of that.

A recommendation to keep GOTO <n> within 50 lines of <n> is great, so long as no one ever adds code in between the two :smile:

Another possibility is to document GOTO with a "see also" along the lines of what GCC does on e.g. https://gcc.gnu.org/onlinedocs/libgomp/GOMP_005fSTACKSIZE.html#GOMP_005fSTACKSIZE, which helps programmers find portable substitutes for non-portable constructs. Similarly, Fortran programmers who want to learn about GOTO should be pointed to switch-case.

P.S. I given a flight view to NWChem input parser, I found only 2 goto, but they are enough to make me lost when tried to follow flow-pattern

There are 9100 instances in the current source tree :wink:

The following seems clear enough to me, but of course is not necessary. One could use e.g. switch-case instead.

!
! ----------
! Read input
! ----------
!
   10 if (.not. inp_read()) 
     1  call errquit('tce_input: failed reading input',0,
     2  RTDB_ERR)
      if (.not. inp_a(test)) 
     1  call errquit('tce_input: failed reading keyword',0,
     2  RTDB_ERR)

SNIP

!
!     END
!
      else if (inp_compare(.false.,test,'end')) then
        goto 20
      else
        call errquit('tce_input: unknown directive',0,INPUT_ERR)
      endif
      goto 10
!
! ------
! Return
! ------
!
   20 return
      end
jeffhammond commented 7 years ago

@jacobwilliams Every person on this thread is surely familiar with that paper. Please engage the intellectual of the substance of the issue rather than just post a link as if this is any substitute for a logical argument. It's borderline trolling to do such a thing in this context.

certik commented 7 years ago

I agree with @jeffhammond. I would mention that even BF has way too many language constructs, the simplest possible language is probably a binary lambda calculus and here is a BF interpreter written in it. So the point is, as Jeff said, that probably any Fortran feature can be useful in some circumstance.

However, for new Fortran programmers, I think it's good to be opinionated and encourage them in the right direction (in this case no goto statements), which works in the vast majority cases, and then a person can always decide to break the recommendation if he has good reasons. That's the spirit I've used behind http://www.fortran90.org/ --- I put there recommendations that seemed to me were good practice after talking to more experienced Fortran programmers, but it is meant in a sense "if you don't know what to do, then follow it", but if you know what you are doing and are aware of the recommendation, then you can of course break it.

certik commented 7 years ago

@jacobwilliams, @jeffhammond regarding Dijkstra, he wrote another famous paper regarding if one should start counting from 0 or 1, and I (as well as many other Fortran programmers that I talked to, though I also know several other people who do not agree with me) consider his conclusion incorrect, especially with regards to Fortran, as explained here in detail: http://www.fortran90.org/src/faq.html#what-is-the-most-natural-way-to-handle-initial-and-final-points-of-intervals, so my conclusion is that Dijkstra is a famous person and an authority, but one should not just take what he (or any other authority!) wrote for granted, but rather critically think about it, and you might reach a different conclusion than Dijkstra did.

jeffhammond commented 7 years ago

@certik An often ignored or unappreciated feature of Fortran is that it does not prevent the programmer from using 0 as the base, or even -1000.

  real fortran(11)
  real c_and_cxx(0:10)
  real custom(-5:5)

If somebody wants Fortran to behave like C/C++ w.r.t. array indexing, they can certainly do it, it just takes a bit more work.

muellermichel commented 7 years ago

How about resource cleanup phases? To illustrate, some python like pseudo code:

a = get_ressource_a()
if not a:
   handle_errors()
   return
b = get_ressource_b(a)
if not b:
   handle_errors()
   goto cleanup_a
process(a,b)
cleanup_b:
   #do the cleanup
cleanup_a:
   #do the cleanup
szaghi commented 7 years ago

@jeffhammond Dear Jeff, thank you very much this insight, indeed a lot helpful.

still work on NWChem but have no intention to rewriting the code to use fewer GOTO statements :smile:

Wiseman! totally agree, if it works well there is no need to touch, my goto-question is only to driven new coders like me to a good practice, no need to modify old codes working well.

Almost nothing in programming languages is truly necessary. See BF for a set of necessary programming constructs. :smile:

Wonderful, I do not know BF, very educative!

By definition, I trust experts to use features of a programming language as much or as little as necessary. As I said before, prohibiting GOTO is inappropriate. Discouraging confusing control flow in general is a great recommendation. Using GOTO to jump in and out of 5000-line if-elif-else-endif construct - as NWChem used to do in src/tce/tce_energy.F - is evil, but GOTO is but a small part of that.

Sure, my intention is not to prohibit (anything) rather to give good advice. To be more clear: all this goto-mubles arise into me when into a thread asking for conciseness-clearness and easy to maintain/develop a very experienced coder suggested to exploit goto (further because it is supposed to be faster...the speed was not concerned!) and this for a new code developed by a newbie... I am aware that for old dialect it is almost mandatory, but with modern Fortran (sigh, 2003 has now 13 years... without considering 2008 that has not still full support), goto is anachronistic (at least). For example, I give to @rouson full trust, but he suggest me to use goto, I'll evaluate his words character by character because surely he can maintain/develop a code polluted by goto, but humans like me likely not and hopefully we write codes for also humans. In Italy we wise sentence sounding like see and learn what experts do, but do not what experts do.

A recommendation to keep GOTO within 50 lines of is great, so long as no one ever adds code in between the two
Another possibility is to document GOTO with a "see also" along the lines of what GCC does on e.g. https://gcc.gnu.org/onlinedocs/libgomp/GOMP_005fSTACKSIZE.html#GOMP_005fSTACKSIZE, which helps programmers find portable substitutes for non-portable constructs. Similarly, Fortran programmers who want to learn about GOTO should be pointed to switch-case.

Nice! I'll try to add these to this guideline.

There are 9100 instances in the current source tree :wink:

Sure, I was referring to only the first file I inspect, I think something file_parser. Even if found only 2 gotos I got lost soon :smile:

The following seems clear enough to me, but of course is not necessary. One could use e.g. switch-case instead...

For my simple mind, the presence of label-gotos make me really afraid... I am quite dyslexic and following a jump-everywhere flow is difficult, even I understand it, I am always afraid to have missed something.

Thank you very much, all are saved for a good guideline (I hope).

Cheers.

szaghi commented 7 years ago

@certik

Dear Ondrej thank you for comments.

However, for new Fortran programmers, I think it's good to be opinionated and encourage them in the right direction (in this case no goto statements), which works in the vast majority cases, and then a person can always decide to break the recommendation if he has good reasons. That's the spirit I've used behind http://www.fortran90.org/ --- I put there recommendations that seemed to me were good practice after talking to more experienced Fortran programmers, but it is meant in a sense "if you don't know what to do, then follow it", but if you know what you are doing and are aware of the recommendation, then you can of course break it.

I agree, this should be clear for all sort of guidelines. However, the story about goto is quite strange and unique. When I observed that goto has been declared obsolescent for good reasons the first replay I obtained (from experts!) was mind you goto is fastest... aside the clear false general significance (it could be faster in some circumstance, not always), when I replied asking for details and benchmarks the reasons quickly changed to it is necessary... when I prove that for the few examples they provided that other better approaches are viable the reasons switched to some not specific circumstances goto cannot be avoided, but my request for examples were not covered.

I really think that goto is overcome by other constructs, but I know my experience is very limited, thus I am trying to challenge my thesis with the help of your experience. I think that a sane approach is to doubt firstly about self-idea and my idea is that goto is the evil. My hope is that you prove me I am wrong, possibly with nice examples :smile:

Thank for your help.

Cheers.

P.S.

Dijkstra is a famous person and an authority, but one should not just take what he (or any other authority!) wrote for granted, but rather critically think about it, and you might reach a different conclusion than Dijkstra did.

Totally agree, I start to think to goto just from these point of view... some experts said it is so wonderful,,, let's see how really it is :smile:

szaghi commented 7 years ago

@jacobwilliams Thank you very much for recalling to me that reference!

szaghi commented 7 years ago

@muellermichel Dear Michael, I am not sure to follow you, can you elaborate a bit more?

Cheers.

EDIT: I have forgotten the most important word last night... not...

jacobwilliams commented 7 years ago

@szaghi You're welcome! I admit, I agree with Dijkstra on this. In my > 1 decade of programming Fortran professionally, I don't believe I've ever written a GOTO statement. Nowadays, now that we have block with exit I think there is even less reason to use it. Obviously, legacy code is a different story. As you know, I'm a big fan of modernizing legacy code, but I realize that is not always possible in some cases.

muellermichel commented 7 years ago

@szaghi: In systems programming you often deal with multiple resources that need to be cleaned up in reverse order of their initialisation.

I can think of at least one usecase in Fortran: Say you create your own virtual stack on the heap (i.e. a memory pool), and to avoid expensive allocation and freeing you simply postulate that the virtual dealloc needs to be in reverse order of the alloc, so it becomes a simple decrement of an integer(8). Can you follow so far? Or should I go into what makes fully dynamic allocate/deallocate expensive compared to such a restricted version?

So, assuming that we require an algorithm like that, i.e. doing certain operations in reverse order of others, how would you do it without goto? IMO there's only the following answers: try/finally (nesting grows with number of resources), a longer version of the code with branches and code duplication (nesting grows with number of resources) or subroutine calls (callstack grows with number of resources). I therefore postulate that the only pattern capable of handling this usecase without code complexity or even performance degradation growing linearly with number of resources, is the goto I've outlined above.

@jacobwilliams: Blindly applying the 'wisdom of elders' like Dijkstra is why Computer 'Science' is still not an actual science, as in there's way too much dogma. I challenge anyone to find a cleaner way of dealing with my counterexample (or that of the Linux kernel, see link below) to Dijkstra's postulate. And that's what it is, a postulate, not a proven law.

See also: http://koblents.com/Ches/Links/Month-Mar-2013/20-Using-Goto-in-Linux-Kernel-Code/

szaghi commented 7 years ago

@muellermichel Dear Michael, your test was, indeed, what I am searching for, thank you very much.

I am on the same line of @jacobwilliams , but we (I am sure for me and almost for Jacob) do not accpet blindly anything: Jacob has much more experience than me thus his understanding have more significance than mine, but I like to report a very similar experience; in my poor codes I never find the necessity to use goto and aside the poor quality of my codes, I am often very cumbersome and it is likely that I put myself into a not-necessary-corner-without-exit, thus corner cases are familiar for me :smile: Anyhow, your test seems to be one that I never faced :tada: :tada: :tada: , but as you guess I do not follow you completely :cry:

I had used quite a lot try/except construct in Python, but the description of your test is somehow obscure on to be solved with it. I followed you in the description of the virtual stack and the need for a virtual dealloac much more cheap than the real alloc/dealloc. Anyhow, we going to a dangerous corner scenario, because this seems to allude to also difference to static/dynamic memory... if want efficient alloc/dealloc with almost O(1) access the first thing coming into my poor mind is something like ash table rather a virtual stack, but I think my poor minded view of world. Indeed, I like your virtual stack.

So we have a virtual stack. I come until here... but the virtual dealloc by goto is somehow obscure for me (mainly because I do know how to use goto). Can ask you for a little bit help? It would be very helpful for me to have a goto example, thus I can try to see if I can avoid it or not. If I consider your python like example, I could think to something like

call get_ressource_a(a)
if (.not.a) then
   call handle_errors
   return
endif
call get_ressource_b(a, b)

check_b: block
  if (.not.b) then
     call handle_errors
     exit check_b
  endif
  call process(a,b)
  cleanup_b: block
     ! do the cleanup
  end block cleanup_b
end block check_b

cleanup_a: block
   ! do the cleanup
end block cleanup_a

I am not sure if I can nest blocks, but the cleanup_b block is not strictly necessary.

I am also not sure if my Fortran code follow your pseudo-python bias, but if you can give me a Fortran goto-based example I can try to see if there is an alternative non-goto-based cleanup.

Cheers.

muellermichel commented 7 years ago

Yes, your code is pretty much exactly what I mean. Notice how the depth of your branch now grows with the number of resources you need to handle? Depth zero handles one, depth one handles two, .. depth N handles N + 1. So, could you still write it in a readable and efficient manner for four, five or six resources? With goto no problem:

 a = get_ressource_a()
 if not a:
    handle_errors()
    return

 b = get_ressource_b(a)
 if not b:
    handle_errors()
    goto cleanup_a

 c = get_ressource_c(a,b)
 if not c:
    handle_errors()
    goto cleanup_b

 d = get_ressource_d(a,b,c)
 if not d:
    handle_errors()
    goto cleanup_c

 process(a,b,c,d)

 cleanup_d:
    #do the cleanup
 cleanup_c:
    #do the cleanup
 cleanup_b:
    #do the cleanup
 cleanup_a:
    #do the cleanup
szaghi commented 7 years ago

@muellermichel

Michel, you are absolutely right, but this second case I can think to something like (assuming I can do some minor changes):

 call get_ressource_a(a)
 if (.not.a) then
    call handle_errors()
    return
endif

call get_ressource_b(a, b)
if (.not.b) call handle_errors()
call get_ressource_c(a, b, c)
if (.not. c) call  handle_errors()
call get_ressource_d(a, b, c, d)
if (.not.d) call handle_errors()
if (a.and.b.and.c.and.d) call process(a,b,c,d)
if (d) call cleanup_d
if (c) call cleanup_c
if (b) call cleanup_b
if (a) call cleanup_a
contains
  subroutine cleanup_a...
  subroutine cleanup_b...
  subroutine cleanup_c...
  subroutine cleanup_d...

Again, I am not sure to have followed your flow, goto is obscure for me :smile:

Cheers.

EDIT: I again lost the not, my dyslexic-addiction is going worse... sorry!

muellermichel commented 7 years ago

No, it doesn't work anymore now. You still call get_ressource_c(a, b) even though b may be uninitialised. You'd have to inline branch in front of every line if you don't want nested branches - but that only makes it worse. I strongly advocate against inline branches - like in C with braces, a Fortran branch should always be followed by then ... end if [1]. That to me is more important than avoiding goto. I don't really get what's obscure about a forward-only goto with a well named label to be honest. It's like a subroutine call that automatically captures all the scope you currently have.

edit: I can go into details why I think [1] is important if that is desired, but it might be better done in a separate thread (if such does not already exist).

szaghi commented 7 years ago

Dear Michel,

sorry for my mistakes, but this proves that also simple gotos are misleading... Anyhow your python-pseudo-code is somehow vague, thus my answers could be only vague. I see now that probably your thinking to a, b, ...d as the actual resources, meaning that you have a = None, b = None..., d = None in your Python function, while I was thinking to them just as logical sentinels, thus in my over-under-looked their usage there were no problems to (say) get_resource_c(a, b, c) because inside it I'll check if a and b were true. If you prefer a more explicit checks the if inline is really more clear than goto because the execution is linear and not chaotic

call get_ressource_a(a)
if (.not.a) then
  call handle_errors()
  return
endif

call get_ressource_b(a, b)
if (.not.b) call handle_errors()
if (b) call get_ressource_c(a, b, c)
if (.not. c) call  handle_errors()
if (b.and.c) call get_ressource_d(a, b, c, d)
if (.not.d) call handle_errors()
if (a.and.b.and.c.and.d) call process(a,b,c,d)
if (d) call cleanup_d
if (c) call cleanup_c
if (b) call cleanup_b
if (a) call cleanup_a
contains
  subroutine cleanup_a...
  subroutine cleanup_b...
  subroutine cleanup_c...
  subroutine cleanup_d...

I don't really get what's obscure about a forward-only goto with a well named label to be honest. It's like a subroutine call that automatically captures all the scope you currently have.

Well, this could be really a personal taste (I suppose), but by definition goto jump everywhere. A well named label often becomes a cumbersome-flow and the jump-everywhere is a real matter. Even in your little pseudo-code is not immediately clear what we are doing, while an explicit if is more clear: if this condition is true do exactly this thing is different from jump elsewhere (even if well labeled, and in Fortran how I can have such good label?), that could be also before where we are now.

like in C with braces, a Fortran branch should always be followed by then ... end if [1]. That to me is more important than avoiding goto.

Please, absolutely yes! Feel free to open a new thread. I was very addicted to the if without then end if, I really like to learn why it should be avoided.

Cheers.

muellermichel commented 7 years ago

I was coming to this from a C programmer perspective and must admit I didn't look too much into goto/labels in Fortran. So far I never actually needed it, but I'm about to implement an automatic memory pool (using my source-to-source translator), so this kind of resource management was somewhere in the front of my mind. I agree that purely numeric labels are extra nasty - I have the suspicion that this failure of modernisation is another victim of Dijkstra's essay. Under those circumstances I'd probably employ the uppercase F90-style preprocessor and replace the numbers with labels in order to make the intent more clear.

Apart from that I have to say I don't really follow what you wrote. At the end we are always talking about jumps. Branches, loops, cycle- and exit statements as well as gotos are all jumps. They are all clearly defined through their syntax, none of them is per se more chaotic than the other. They can all be abused to produce spaghetti or any other kind of pasta code. Throughout my professional life as a software engineer I've learned that nested branches lead to some of the most unmaintainable codes. For that matter I'm basically employing the following overarching rules when writing purely imperative code:

Now, for 99.9% of imperative code, this is already enough. This leads to a code style that tries to be as flat as possible, so that reflecting about its operations becomes as easy as possible. When a state leads to a return/break/continue, I can basically delete that state from my mental stack and explore the next state. Does that make sense? If I have to go find the else-condition somewhere 100 lines further (and hope I hit the correct intendation level) and then jump back when analysing, it's way more difficult to review a code.

Now, going from this early-return preference to forward-only cleanup gotos, it's basically only taking this logic one step further. As long as I use a goto if and only if it is used in place of a return, I can separate the concerns:

1.) Is the cleanup logic correct (right order) and used in the correct way?

2.) investigate the rest of the code logic and treat gotos like a return in my mental model.

szaghi commented 7 years ago

Dear Michel,

thank you very much!

I was coming to this from a C programmer perspective and must admit I didn't look too much into goto/labels in Fortran. So far I never actually needed it, but I'm about to implement an automatic memory pool (using my source-to-source translator), so this kind of resource management was somewhere in the front of my mind. I agree that purely numeric labels are extra nasty - I have the suspicion that this failure of modernisation is another victim of Dijkstra's essay. Under those circumstances I'd probably employ the uppercase F90-style preprocessor and replace the numbers with labels in order to make the intent more clear.

Wonderful, your perspective is really relevant. I am a very poor Fortraner, with a very very poor understanding of C. I think I can learn a lot from you experience.

Apart from that I have to say I don't really follow what you wrote. At the end we are always talking about jumps. Branches, loops, cycle- and exit statements as well as gotos are all jumps. They are all clearly defined through their syntax, none of them is per se more chaotic than the other. They can all be abused to produce spaghetti or any other kind of pasta code.

I disagree. With goto I can jump everywhere, likely I can jump up then down, then up-up, in the middle and then down-down... with other constructs I have to be strictly structured. All constructs can be abused, in this field you can consider me a master... I am able to make complex and obscure even the most simple branch... my bad! However, the possibility that normal or good Fortraners can make a select case branch more clear than an equivalent goto is high. In other words, goto is more spaghetti-code-prone than others, thus if it does not really offer me more, why I should use it?

To me, 2 main reasons are generally used to promote goto

  1. goto is the fastest in the west;
  2. goto is necessary for some corner cases.

I think the first sentence is generally easy to be demystified, while the second is really more interesting (and hopefully true). You are in the right way to provide me a good example of such a corner case, but the previous pseudo-Python code is not enough to convince me.

Throughout my professional life as a software engineer I've learned that nested branches lead to some of the most unmaintainable codes.

I agree. When I studied @rouson book, I find that many of such cases, mainly due to my abuse of nested branches, can be avoided with a little of OOD (if I can diverge little from pure imperative).

For that matter I'm basically employing the following overarching rules when writing purely imperative code:

  • is the code 'hot'? If not, what's the right balance in splitting off subroutines vs. having a good overview over what's going on? E.g. if any kind of expensive resources are used in a process, I like to see any non-reusable init/cleanup code in one wrapper subroutine and split off the actual computations (process).

I agree, but I firstly adhere to the KISS principle, so the I like to wrap/split the init/cleanup code into procedure(s) contained into the processor unit, there should be not overhead in using call cleanup with respect ! start inline cleanup.... If I understand right this point (probably not), the scenario should be:

subroutine process()
! init expensive resources
call get_resouce(a)
if (a) then
allocate(..)
else
  return
endif
! do the processor's work
do i=1, N
  ! heavy work
  if (error) then
   call handle_errors()
    goto 10
  endif
enddo
! clean up
10 deallocate(...)
endsubroutine process

In this case, I see no particular problem to split the logic

subroutine process()
call initialize
if (is_initialized) 
  call work
  call clean
endif
contains
  subroutine initialize
  ...
  end subroutine initialize
  subroutine work
  ...
  end subroutine work
  subroutine clean
  ...
  end subroutine clean
endsubroutine process
  • Is there any code repeating itself? -> map to a loop.

I agree, and I add map to a loop calling procedures wrapping the logic :smile:

  • Are there branches based on the state of the inputs and do we omit the inner computations based on this state? -> Never nest those branches, instead return early.

I agree in general, but it is difficult to make it concrete. In general I address this with return (into wrapping procedure) or exit (into inline blocks), not with goto.

+Within the loops, do the same logic with branches, use cycle and exit instead of return (or use the branch continuation condition). Almost never have more than one level of branches within the loop (only when it serves code deduplication, i.e. DRY is more important than avoiding nesting).

I agree.

Now, for 99.9% of imperative code, this is already enough. This leads to a code style that tries to be as flat as possible, so that reflecting about its operations becomes as easy as possible. When a state leads to a return/break/continue, I can basically delete that state from my mental stack and explore the next state. Does that make sense?

Absolutely yes, I like it, I like the flat as possible logic, but to me goto makes this more difficult, whereas other constructs let poor-coders like me to obtain a more sane code.

If I have to go find the else-condition somewhere 100 lines further (and hope I hit the correct intendation level) and then jump back when analysing, it's way more difficult to review a code.

I agree, but you should not compare the branches in a so general meaning, you should design a concrete example where goto is employed and let me to try to find an alternative to goto. When we have the two examples side by side we can try to analyze pros and cons of both. In my limited experience I never coded 100-lengthy if-elseif branch, or simply branches longer as 100 SLOCs, but I doubt that in this case goto can make my code more clear... I got lost with just 2 gotos :smile:

Cheers.

muellermichel commented 7 years ago

How about this code as a basis for discussion? It's still theoretical code, but closer to how it would look like for my usecase.

#define CLEAN1 10
#define CLEAN21 20
#define CLEAN321 30
#define CLEAN4321 40

! is called thousands of times every timestep
! since we use CUDA Fortran for the kernels, allocating and freeing the temporary memory
! would impact the performance way too much (device pointers are expensive to allocate/deallocate)
! in case the pool is out of memory, we may want to handle this in the caller of this subroutine
! and allocate more memory to the pool there and move over everything up to the pool stack index. 
! therefore the pool needs to be cleaned up, no matter what happens.
subroutine apply_advection(result, input1, input2)
    real, intent(out), device :: result(nx,ny,nz)
    real, intent(in), device :: input1(nx,ny,nz), input2(nx,ny,nz)
    real, dimension(:), pointer, device :: temp1, temp2, temp3, temp4
    integer(4) :: size_of_temp_pointers

    size_of_temp_pointers = nx * ny * nz

    call assign_memory(temp1, size_of_temp_pointers)
    if (.not. associated(temp1)) then
        call handle_errors("out of memory for temp1")
        return
    end if

    call assign_memory(temp2, size_of_temp_pointers)
    if (.not. associated(temp2)) then
        call handle_errors("out of memory for temp2")
        goto CLEAN1
    end if

    !first kernel. outputs to temp1 and temp2
    call apply_advection_part1(temp1, temp2, input1, input2)

    call assign_memory(temp3, size_of_temp_pointers)
    if (.not. associated(temp3)) then
        call handle_errors("out of memory for temp3")
        goto CLEAN21
    end if

    call assign_memory(temp4, size_of_temp_pointers)
    if (.not. associated(temp4)) then
        call handle_errors("out of memory for temp4")
        goto CLEAN321
    end if

    !second kernel. outputs to temp3 and temp4
    call apply_advection_part2(temp3, temp4, temp1, temp2)

    !third kernel. outputs to result
    call apply_advection_part3(result, temp3, temp4)

    CLEAN4321 call unassign_memory(temp4)
    CLEAN321  call unassign_memory(temp3)
    CLEAN21   call unassign_memory(temp2)
    CLEAN1    call unassign_memory(temp1)
end subroutine
szaghi commented 7 years ago

@muellermichel Thanks! It looks interesting... let me some times to study it.

Cheers.

muellermichel commented 7 years ago

(please refresh the browser, just did a few fixes ;-) )

szaghi commented 7 years ago

@muellermichel

Dear Michel,

this is my first tentative :smile:

Preliminaries

subroutine apply_advection(input1, input2, result)
real, device, intent(in),           :: input1(:,:,:)
real, device, intent(in),           :: input2(:,:,:)
real, device, intent(out),          :: result(:,:,:)
real, device, dimension(:), pointer :: temp1
real, device, dimension(:), pointer :: temp2
real, device, dimension(:), pointer :: temp3
real, device, dimension(:), pointer :: temp4
integer                             :: size_of_temp_pointers

size_of_temp_pointers = size(input1, dim=1) * size(input1, dim=2) * size(input1, dim=3)

temp1 => null()
temp2 => null()
temp3 => null()
temp4 => null()

call assign_memory(temp1, size_of_temp_pointers)
if (.not.associated(temp1)) then
  call handle_errors("out of memory for temp1")
  ! nothing to clean
  return
endif

call assign_memory(temp2, size_of_temp_pointers)
if (.not.associated(temp2)) then
  call handle_errors("out of memory for temp2")
  call clean_memory
  return
endif

call apply_advection_part1(temp1, temp2, input1, input2)

call assign_memory(temp3, size_of_temp_pointers)
if (.not.associated(temp3)) then
  call handle_errors("out of memory for temp3")
  call clean_memory
  return
endif

call assign_memory(temp4, size_of_temp_pointers)
if (.not.associated(temp4)) then
  call handle_errors("out of memory for temp4")
  call clean_memory
  return
endif

call apply_advection_part2(temp3, temp4, temp1, temp2)

call apply_advection_part3(result, temp3, temp4)

call clean_memory
contains
  subroutine clean_memory()
    ! A safe cleaner, with sequential-structured bias, no need to reverse order.
    if (associated(temp1)) call unassign_memory(temp1)
    if (associated(temp2)) call unassign_memory(temp2)
    if (associated(temp3)) call unassign_memory(temp3)
    if (associated(temp4)) call unassign_memory(temp4)
  endsubroutine clean_memory
endsubroutine apply_advection 

Comments

Disclaimer-1 I have no experience with CUDA Fortran (just a small with pure CUDA), no idea of what device attribute implies (other than this variable is defined on the accelerated device (GPU));

Disclaimer-2 I know that in my version there are some more if evaluations (in some cases) that are not present into your jump-direct version, but I hope that we are now concerned about flow-clearness/conciseness/maintainability. On the contrary, if you are focusing on the speed of execution of the goto-based flow with respect my 4 if, well we should stop to think in abstract and go with a real benchmark: the premature-optimization is the way for all evil. In this latter case we should quantify (at least with a raw estimation) how big is the goto-speedup and compare this pros with the cost in terms of readability/maintainability (that is more difficult to quantify being for big part of personal perspective).

With the hope I have correctly replicated your bias, I think that my version is safer/more-clear:

What do you think?

muellermichel commented 7 years ago

Hey Stefano, thank you very much for your big effort! I'm learning a lot here I have to say. Here are my thoughts:

  1. You are right about the order actually being unnecessary. I was thinking about the general case where the temps have different memory sizes, but it doesn't actually matter - as long as you go in and out of scope at the same time, only the total length of the used memory actually matters. The cleanup could even be reduced to one integer decrement if one were to make the pool aware of what routine is currently using the pool (it could then keep track of the total memory size in a table). Thank you, this back and forth was actually really helpful for me!
  2. Since I can't really think of a usecase for reverse cleanup in Fortran anymore, I support the notion that goto in Fortran is unnecessary. It appears to me that these finely controlled cleanup phases are only necessary in systems programming (such as OS kernels) where you often have resources depend upon each other. (Say, a database that depends on an open file that depends on a network mount). I hope that people here agree with me that Fortran should never be used for such software, as there are much more suitable options around. (As a sidenote, I always advocate for a two-language solution: Fortran for kernels and 'hot code', Python for the user interface, I/O, etc.
  3. The main disadvantage we have in your code is the subroutine call to clean_memory. Since GPU kernels are somewhat expensive to run anyway, I don't think it matters, and if it does there is still the inliner (although I'm not a fan of having to rely on that thing, it's often quite a hassle to deal with). I don't think further performance considerations are necessary - the main thing would be to make sure there aren't any cudaMalloc / cudaFree, and that can easily be found using nvidia's profiler.
  4. First style question: Do the runtime bound checks still work (enabled with compiler flags) when using dynamic size specifications for the inputs/outputs? If not I'd actually advocate against it. Bound checks have been invaluable to me to find bugs early. Even though this doesn't work for CUDA Fortran, I still use it in the user code because it also emits to CPU-only code where checks are supported. However I guess that it should still work since Fortran keeps array sizes stored in tables, so in that case I'm all for it.
  5. Second style question: Why always one line per data object specification? IMO a list for identical specifications is more DRY and less boilerplate-y.
jacobwilliams commented 7 years ago

Note: for 3, you could eliminate the subroutine call by using a BLOCK construct. In short:

main: block
    !main code
    !...
    exit main ! if there are errors
    !...
end block main

!cleanup code here

Although I don't know if one is any more efficient than the other.

muellermichel commented 7 years ago

@jacobwilliams That should be more efficient, yes. As I wrote though, I don't think that it matters, it could for other cases however where you use this kind of temp initialization directly within kernels, say, one temp Z-column for every X-Y-datapoint, which is a very common way of programming atmospheric simulations on CPU for example - although in that case it would have to be measured whether a memory pool is actually faster than a simple Stack array, which I doubt.

In general I do like your style a bit more though, I find it more readable because now the temp inits and destroys are at the same outer intendation while the interesting code is intended within the block. Just one note though: I think the return is wrong, isn't it? The cleanup code shouldn't get omitted.

jacobwilliams commented 7 years ago

@muellermichel Yes, I corrected that as soon as I submitted it. Refresh!

szaghi commented 7 years ago

@muellermichel @jacobwilliams

thank you very much for your big effort! I'm learning a lot here I have to say.

On the contrary, thank you so much! I am here only to try to steal your experience :smile:

  1. You are right about the order actually being unnecessary. I was thinking about the general case where the temps have different memory sizes, but it doesn't actually matter - as long as you go in and out of scope at the same time, only the total length of the used memory actually matters. The cleanup could even be reduced to one integer decrement if one were to make the pool aware of what routine is currently using the pool (it could then keep track of the total memory size in a table). Thank you, this back and forth was actually really helpful for me!

Indeed, mine would not to be a general consideration, but only a fortunate case suitable for your corner-example.

  1. Since I can't really think of a usecase for reverse cleanup in Fortran anymore, I support the notion that goto in Fortran is unnecessary.

Arghhhh... I am still not completely convinced (of my own idea), I would not convince you about mine rather you prove me I was wrong :smile: For example, I think that Jacob could have really nice examples. I love his care on legacy-codes-modernization and I know he faced with some very difficult cases: @jacobwilliams is it possible that some of those challenging cases were related with goto?

  1. (continued) It appears to me that these finely controlled cleanup phases are only necessary in systems programming (such as OS kernels) where you often have resources depend upon each other. (Say, a database that depends on an open file that depends on a network mount). I hope that people here agree with me that Fortran should never be used for such software, as there are much more suitable options around. (As a sidenote, I always advocate for a two-language solution: Fortran for kernels and 'hot code', Python for the user interface, I/O, etc.

I agree in general, for each application there is more suitable language. However, I love challenges and if there are brave coders who try to develop OS in Fortran they have all my admiration :smile:

  1. The main disadvantage we have in your code is the subroutine call to clean_memory. Since GPU kernels are somewhat expensive to run anyway, I don't think it matters, and if it does there is still the inliner (although I'm not a fan of having to rely on that thing, it's often quite a hassle to deal with). I don't think further performance considerations are necessary - the main thing would be to make sure there aren't any cudaMalloc / cudaFree, and that can easily be found using nvidia's profiler.

Yes, when I write it I immediately realized also the Jacob's block-based solution (and others less elegant), but I think that as you noted, the optimizer likely inline it. I know that you, as great Software Engineer, are mainly focused on extract each FLOP from your car, but you should consider that others like me are more focused on the math/physics: in my codes I have to change rapidly-easily-clearly the mathematical-numerical models, thus I prefer even a not-so-fast solution until it offers more clearness.conciseness.maintainability. I am not a good programmer, but I hope to be a decent numerician.

  1. First style question: Do the runtime bound checks still work (enabled with compiler flags) when using dynamic size specifications for the inputs/outputs? If not I'd actually advocate against it. Bound checks have been invaluable to me to find bugs early. Even though this doesn't work for CUDA Fortran, I still use it in the user code because it also emits to CPU-only code where checks are supported. However I guess that it should still work since Fortran keeps array sizes stored in tables, so in that case I'm all for it.

Really a good question. In general, my current best practice is to use extensively assumed shape over explicit dimensions. I use to test my code with bullet-proof-paranoiac-debug-mode before the release mode, entailing also the bound checks. I do not know if the compilers I am using (Intel/GNU) do some tricks, but they seems to well capture my out-of-bounds mistakes. I think that others here can give you the right answer, mine can be discarded safely.

I was not aware they does not work with CUDA Fortran. OT: CUDA Fortran is the implementation of PGI?

  1. Second style question: Why always one line per data object specification? IMO a list for identical specifications is more DRY and less boilerplate-y.

As I said, my brain has some (big) limitations... this help me to have a more clear view of the world. Moreover, I (ab)use of docstring, I like to put even not useful comments and sometimes the comment exceed the 132th character thus I split into the next lines: having one line data help me in this. Finally, with vim I can do very very rapidly my cosmetic changes almost always, but having the one-line style helps to speedup this further.

Thank you very much, having the help of a Software Engineer is very very useful.

My best regards.

muellermichel commented 7 years ago

Arghhhh... I am still not completely convinced (of my own idea), I would not convince you about mine rather you prove me I was wrong 😄 For example, I think that Jacob could have really nice examples. I love his care on legacy-codes-modernization and I know he faced with some very difficult cases: @jacobwilliams is it possible that some of those challenging cases were related with goto?

Hehe that one got a chuckle out of me. Well, thinking about it again, there's two candidates that would fit the systems programming corner I was talking about: File I/O and network I/O.

Let's stay with File I/O since it's easier to understand and let's say you have File A and File B. For curious reasons the format of B's records are dependant on A records. Maybe A is a grid file while B is a data file. To save memory, you only read one record at a time from A and then read the corresponding one from B, then apply some number crunching and output to C, so you have three files open. This is a case where A almost always needs to be closed, while B could error out, and you don't want File descriptor mummies laying around if your program crashes. This is probably a reverse cleanup as described before. For two stages (A / B + C) I'd maybe go with blocks still, but for anything more IMO my proposed goto structure would be cleaner. Similarly I can imagine MPI codes to have some cases where this is relevant. But I admit it's very corner-y (especially since A's memory requirements should be very small, so it would be cleaner to read it completely in one step at first for almost all use cases). But that's the thing, if you do find a corner case that breaks a law, then the law is weakened.

rouson commented 7 years ago

On Oct 23, 2016, at 10:19 PM, Michel Müller notifications@github.com wrote:

No, it doesn't work anymore now. You still call get_ressource_c(a, b) even though b may be uninitialised. You'd have to an inline branch in front of every line if you don't want nested branches - but that only makes it worse. I strongly advocate against inline branches - like in C with braces, a Fortran branch should always be followed by then ... end if. That to me is more important than avoiding goto. I don't really get what's obscure about a forward-only goto with a well named label to be honest. It's like a subroutine call that automatically captures all the scope you currently have.

I have not followed this dialogue in detail and do not have a strong opinion about the use of GOTO (other than that I trust the conventional wisdom of the many capable computer scientists who have argued against it for nearly half a century — notwithstanding some well-considered arguments to the contrary). Suffice it to say that an argument for GOTO at this point would have to be pretty well fleshed-out to convince most people of its value. It certainly wouldn’t suffice to describe it as being like a subroutine in the sense of “captur[ing] all the scope you currently have.” At best, such a statement is only true if the subroutine is internal to the given scope and even then the subroutine only "captures all the scope you currently have" in the limited case in which the subroutine defines no entities that have the same name as entities in the outer scope. I include one example at the bottom of this reply. And that’s just the tip of the iceberg. I’m sure it would only take a moment to concoct numerous other ways in which a labeled GOTO differs markedly (and confusingly) with a subroutine.

Damian

$ cat sub.f90 program main implicit none logical :: i_am_true=.true. call foo contains subroutine foo logical :: i_am_true=.false. print *, "Or maybe not: ",i_am_true end subroutine end program $ gfortran sub.f90 $ ./a.out Or maybe not: F

LadaF commented 7 years ago

If you like old articles, you might also like: Frank Rubin - “GOT0 Considered Harmful” Considered Harmful http://web.archive.org/web/20090320002214/http://www.ecn.purdue.edu/ParaMount/papers/rubin87goto.pdf

On 23 October 2016 at 20:38, Stefano Zaghi notifications@github.com wrote:

@jacobwilliams https://github.com/jacobwilliams Thank you very much for recalling to me that reference!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Fortran-FOSS-Programmers/Best_Practices/issues/15#issuecomment-255609197, or mute the thread https://github.com/notifications/unsubscribe-auth/AAskEV0DaEXTEJ6K06uK5Mo2csvQ9RBGks5q27eggaJpZM4KdAA_ .

jeffhammond commented 7 years ago

At least in C, goto is used primarily for exception handling.

Wasn't a botched goto statement the cause of heartbleed or some other major recent exploit/bug?

No. See See http://www.dwheeler.com/essays/apple-goto-fail.html#goto for details.

LadaF commented 7 years ago

I didn't mean anything fancy just something like ... if (error_found) goto 999 ...

    ...
    return

  999 print error message and/or set flags

end subroutine

On 21 October 2016 at 11:39, Stefano Zaghi notifications@github.com wrote:

@LadaF https://github.com/LadaF

Dear Vladimir, thank you very much for your insight. Your point about exceptions (or more general concept like state machine algorithm) has been also highlighted in the google thread.

My understanding for this kind of application is very limited due to the very few cases where I faced with something similar to it.

I agree that in very corner cases the ability to jump elsewhere is very flexible, but the cons coming with goto are not balanced by the pros, i.e. the gain in flexibility. For me, similar state machine or exceptions handling can be achieved with other constructs, e.g. block+exit, or implemented with a more complex state pattern if the case is worth to adopt such a more complex and costly approach.

However, to improve my skills, can you give me some references where I can learn more about exceptions handling by means of goto? I you have not references, do no worry, I do not want you waste your time trying to rolling down an example for me.

Thank you again.

Cheers.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Fortran-FOSS-Programmers/Best_Practices/issues/15#issuecomment-255349870, or mute the thread https://github.com/notifications/unsubscribe-auth/AAskEaFM2ODeuHKloISVuYdPiySXkYWiks5q2JZrgaJpZM4KdAA_ .

LadaF commented 7 years ago

Incidentally, this question asked today contains a lot relevant for this topic http://softwareengineering.stackexchange.com/questions/334417/what-kind-of-bugs-do-goto-statements-lead-to-are-there-any-historically-signi

Vladimir

On 24 October 2016 at 10:28, Stefano Zaghi notifications@github.com wrote:

Dear Michel,

thank you very much!

I was coming to this from a C programmer perspective and must admit I didn't look too much into goto/labels in Fortran. So far I never actually needed it, but I'm about to implement an automatic memory pool (using my source-to-source translator), so this kind of resource management was somewhere in the front of my mind. I agree that purely numeric labels are extra nasty - I have the suspicion that this failure of modernisation is another victim of Dijkstra's essay. Under those circumstances I'd probably employ the uppercase F90-style preprocessor and replace the numbers with labels in order to make the intent more clear.

Wonderful, your perspective is really relevant. I am a very poor Fortraner, with a very very poor understanding of C. I think I can learn a lot from you experience.

Apart from that I have to say I don't really follow what you wrote. At the end we are always talking about jumps. Branches, loops, cycle- and exit statements as well as gotos are all jumps. They are all clearly defined through their syntax, none of them is per se more chaotic than the other. They can all be abused to produce spaghetti or any other kind of pasta code.

I disagree. With goto I can jump everywhere, likely I can jump up then down, then up-up, in the middle and then down-down... with other constructs I have to be strictly structured. All constructs can be abused, in this field you can consider me a master... I am able to make complex and obscure even the most simple branch... my bad! However, the possibility that normal or good Fortraners can make a select case branch more clear than an equivalent goto is high. In other words, goto is more spaghetti-code-prone than others, thus if it does not really offer me more, why I should use it?

To me, 2 main reasons are generally used to promote goto

  1. goto is the fastest in the west;
  2. goto is necessary for some corner cases.

I think the first sentence is generally easy to be demystified, while the second is really more interesting (and hopefully true). You are in the right way to provide me a good example of such a corner case, but the previous pseudo-Python code is not enough to convince me.

Throughout my professional life as a software engineer I've learned that nested branches lead to some of the most unmaintainable codes.

I agree. When I studied @rouson https://github.com/rouson book, I find that many of such cases, mainly due to my abuse of nested branches, can be avoided with a little of OOD (if I can diverge little from pure imperative).

For that matter I'm basically employing the following overarching rules when writing purely imperative code:

  • is the code 'hot'? If not, what's the right balance in splitting off subroutines vs. having a good overview over what's going on? E.g. if any kind of expensive resources are used in a process, I like to see any non-reusable init/cleanup code in one wrapper subroutine and split off the actual computations (process).

I agree, but I firstly adhere to the KISS principle, so the I like to wrap/split the init/cleanup code into procedure(s) contained into the processor unit, there should be not overhead in using call cleanup with respect ! start inline cleanup.... If I understand right this point (probably not), the scenario should be:

subroutine process()! init expensive resourcescall get_resouce(a)if (a) thenallocate(..)else returnendif! do the processor's workdo i=1, N ! heavy work if (error) then call handle_errors() goto 10 endifenddo! clean up10 deallocate(...)endsubroutine process

In this case, I see no particular problem to split the logic

subroutine process()call initializeif (is_initialized) call work call cleanendifcontains subroutine initialize ... end subroutine initialize subroutine work ... end subroutine work subroutine clean ... end subroutine cleanendsubroutine process

  • Is there any code repeating itself? -> map to a loop.

I agree, and I add map to a loop calling procedures wrapping the logic 😄

  • Are there branches based on the state of the inputs and do we omit the inner computations based on this state? -> Never nest those branches, instead return early.

I agree in general, but it is difficult to make it concrete. In general I address this with return (into wrapping procedure) or exit (into inline blocks), not with goto.

+Within the loops, do the same logic with branches, use cycle and exit instead of return (or use the branch continuation condition). Almost never have more than one level of branches within the loop (only when it serves code deduplication, i.e. DRY is more important than avoiding nesting).

I agree.

Now, for 99.9% of imperative code, this is already enough. This leads to a code style that tries to be as flat as possible, so that reflecting about its operations becomes as easy as possible. When a state leads to a return/break/continue, I can basically delete that state from my mental stack and explore the next state. Does that make sense?

Absolutely yes, I like it, I like the flat as possible logic, but to me goto makes this more difficult, whereas other constructs let poor-coders like me to obtain a more sane code.

If I have to go find the else-condition somewhere 100 lines further (and hope I hit the correct intendation level) and then jump back when analysing, it's way more difficult to review a code.

I agree, but you should not compare the branches in a so general meaning, you should design a concrete example where goto is employed and let me to try to find an alternative to goto. When we have the two examples side by side we can try to analyze pros and cons of both. In my limited experience I never coded 100-lengthy if-elseif branch, or simply branches longer as 100 SLOCs, but I doubt that in this case goto can make my code more clear... I got lost with just 2 gotos 😄

Cheers.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Fortran-FOSS-Programmers/Best_Practices/issues/15#issuecomment-255691681, or mute the thread https://github.com/notifications/unsubscribe-auth/AAskEZwSoBAVqp-yIkbqB-lctPPLF3ePks5q3HpMgaJpZM4KdAA_ .

LadaF commented 7 years ago

The goto was involved in the Heartbleed but it wasn't the cause.

Regarding the style, I am normally able to use return instead of goto, but they are in fact very similar. Dne 21. 10. 2016 17:27 napsal uživatel "Izaak Beekman" < notifications@github.com>:

At least in C, goto is used primarily for exception handling.

Wasn't a botched goto statement the cause of heartbleed or some other major recent exploit/bug? I only point this out to say that even the experts can mess it up. (However, IIRC the real issue was a missing semi-colon or something mundane like that... gotta love C...) Anyway, my point is that I agree that it is not best practice to use goto, but that there can be exceptions.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Fortran-FOSS-Programmers/Best_Practices/issues/15#issuecomment-255421348, or mute the thread https://github.com/notifications/unsubscribe-auth/AAskES-5GxpcIvIhx804AoP1eYogvwbkks5q2OgLgaJpZM4KdAA_ .

jeffhammond commented 7 years ago

@jeffhammond https://github.com/jeffhammond

It's not my parser - it was written approximately 20 years ago, long before I was programming.

Jeff, I know (some of my ex colleagues used it), my refer to your was just because I think you are currently maintaining/improving it, right?

I still work on NWChem but have no intention to rewriting the code to use fewer GOTO statements :smile:

GOTO should be tagged as a bad practice in pedagogical material, because the vast majority of novices will use it wrong. It should not be prohibited by a software style guide, because it is useful if not necessary in very specific cases, which will be written by experts.

I got it for our guide, as always I'll to create a wording that is a trade-of of our feelings.

However, I (currently) disagree: to me also experts should avoid it. Indeed, I am just searching for examples of special/corner cases where goto is necessary as you alluded to.

Almost nothing in programming languages is truly necessary. See BF for a set of necessary programming constructs. :smile:

By definition, I trust experts to use features of a programming language as much or as little as necessary. As I said before, prohibiting GOTO is inappropriate. Discouraging confusing control flow in general is a great recommendation. Using GOTO to jump in and out of 5000-line if-elif-else-endif construct - as NWChem used to do in src/tce/tce_energy.F - is evil, but GOTO is but a small part of that.

A recommendation to keep GOTO <n> within 50 lines of <n> is great, so long as no one ever adds code in between the two :smile:

Another possibility is to document GOTO with a "see also" along the lines of what GCC does on e.g. https://gcc.gnu.org/onlinedocs/libgomp/GOMP_005fSTACKSIZE.html#GOMP_005fSTACKSIZE, which helps programmers find portable substitutes for non-portable constructs. Similarly, Fortran programmers who want to learn about GOTO should be pointed to switch-case.

P.S. I given a flight view to NWChem input parser, I found only 2 goto, but they are enough to make me lost when tried to follow flow-pattern

There are 9100 instances in the current source tree :wink:

The following seems clear enough to me, but of course is not necessary. One could use e.g. switch-case instead.


!

! ----------

! Read input

! ----------

!

   10 if (.not. inp_read())

     1  call errquit('tce_input: failed reading input',0,

     2  RTDB_ERR)

      if (.not. inp_a(test))

     1  call errquit('tce_input: failed reading keyword',0,

     2  RTDB_ERR)

SNIP


!

!     END

!

      else if (inp_compare(.false.,test,'end')) then

        goto 20

      else

        call errquit('tce_input: unknown directive',0,INPUT_ERR)

      endif

      goto 10

!

! ------

! Return

! ------

!

   20 return

      end
muellermichel commented 7 years ago

I think the question for this thread should be how the guide around goto should be worded. There's a spectrum of possibilities, heres my take on that:

1.) Never use goto in Fortran programs, period.

2.) It is strongly recommended to avoid goto for maintainability reasons. Refer to (A) for typical cases where programmers have used goto and how it can be replaced without performance penalty. [1]

3.) Avoid goto except for very narrowly defined usecases (B). Refer to (A) for typical cases where programmers have used goto and how it can be replaced without performance penalty.

(A) tbd.

(B) tbd.

Personally I'm still undecided between position (2) and (3). To me an error handler as outlined by @LadaF wouldn't fall under (B) because it can easily be avoided using a Block, which to me is a cleaner construct. Now that being said, according to fortranwiki it is still not supported by PGI, which I have to rely on for GPU compatibility reasons. So there still seems to be a tradeoff between portability, code complexity and maintainability (goto when used properly can IMO lead to less code complexity, but is problematic for maintainability reasons, which includes novices seeing it and using it in the wrong way). Maybe (B) could spend a word on compiler support?

[1] footnote: I think it's important to get rid of the notion that avoiding goto is hurting performance per se. Therefore I advocate against using subroutine calls instead of gotos in the styleguide, instead it should be replaced with Blocks, Switches, Loops and Branches in the examples.

szaghi commented 7 years ago

Dear All,

thank you very much for the great teachings!

I added a first proposal skeleton in the very first post of this thread (if you like I can add it at the end, here). Just few sentences that I hope are a good compromise. Feel free to directly amend it. Later I'll add all the great examples you provided as well as the references.

There is a very nice and well known comic-draw about goto posted also in the stackexchange question linked by Vladimir, do you think it is too much out-of-scope for this guidelines?

Cheers.

jeffhammond commented 7 years ago

I very strongly agree with Knuth when he writes in Structured Programming with go to Statements (PDF):

The reason is that we've really been directing our attention to the wrong issue, to the objective question of go to elimination instead of the important subjective question of program structure. ... The real issue is structured programming... Only one thing is really clear: Structured programming is not the process of writing programs and then eliminating their go to statements.

There are very good reasons to use GOTO, although more so in C than in Fortran, but their goodness is revealed by aspiring to never use GOTO, and then adding it when proven by process of elimination to be necessary.

GOTO is never necessary. However, in some cases, it improves performance and readability, although the latter case is truly exceptional, particularly in Fortran. As for the performance benefits of GOTO, I must again quote Knuth (emphasis mine):

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

(From The Fallacy of Premature Optimization)

I would cite the Linux Kernel and MPICH as examples of where GOTO is appropriate, but these are both instances of low-level system software written in C that need to implement exception handling or mutex management on the critical path. System software has not been written in Fortran in decades, and Fortran lends itself to optimizations that C does not, so I cannot claim there is an obvious pattern that justifies the use of GOTO in Fortran.

muellermichel commented 7 years ago

@jeffhammond: This is IMO the best summation I've seen so far and I think it supports @szaghi's first draft quite well. Maybe your quotes could be included there in footnotes?

Anyways: This other quote

premature optimization is the root of all evil

is another one of those quotes that have IMO been taken out of context way too much, to the point where I feel queasy whenever I see it. Yes, it applies to the usage of goto, but I also see it used so much even when discussing the overall architecture of a solution. "No, we don't need to opt for this provenly faster method from start, because premature optimization is the root of all evil...". Nevermind that restructuring for said optimisation is going to cost a lot of time and money and you could as well make it fast from start without spending significantly more effort by simply including performance in the overall considerations on how to structure your software.

By that I mean, have a rudimentary understanding on how 'hot' each part of the code you write is, and write it accordingly from start. Do a very simple performance model first, if that helps keeping it on your mind. Otherwise it can go as far as "oh shoot, our CPython software doesn't run fast enough to meet requirements by a factor of 100, let's reimplement everything in C"......