Various improvements - Githubissues

icherkashin commented 8 years ago

The purpose of this commit is to remove the error message about empty project version string when calling cmake before compiling the code.

icherkashin commented 8 years ago

Please look at the individual commits to see what line each comment belongs to.

I wanted to offer these changes in separate pull requests, but looks like new commits are added to the existing pull request. Hopefully, you find these changes minor enough to be incorporated into one pull request.

I hope you find these changes useful! Let me know what you think of these.

TedStudley commented 8 years ago

Most changes look good. The OutputStructure has been a complete mess ever since I got it working, but it was always low-priority to come back to it. Nice to see it cleaned up a little bit with things like proper exceptions and using limits.

The few changes I'd like to make (other than the ones I've mentioned in the comments) would be to fix the added trailing whitespace on some of the lines, and ensure that the code style matches what we already have (and what's in the proposed astyle config). This should be a fairly easy fix.

TedStudley commented 8 years ago

I wrote a comment about this somewhere, but I'm not sure where it went. The weird ordering in the loop in main.cpp is there to force the program to output at t=0, before solving advection/diffusion but after the stokesSolve. I'd like to keep this functionality, but agree that the duplicated solve is ugly and redundant. Can you update the changes you've made there to ensure that we continue to output the t=0 timestep?

icherkashin commented 8 years ago

I wrote a comment about this somewhere, but I'm not sure where it went. The weird ordering in the loop in main.cpp is there to force the program to output at t=0, before solving advection/diffusion but after the stokesSolve. I'd like to keep this functionality, but agree that the duplicated solve is ugly and redundant. Can you update the changes you've made there to ensure that we continue to output the t=0 timestep?

Your comment is here. In fact, the file for timestep = 0 is output with this change. I wrote more about this in response to your comment there.

icherkashin commented 8 years ago

The few changes I'd like to make (other than the ones I've mentioned in the comments) would be to fix the added trailing whitespace on some of the lines, and ensure that the code style matches what we already have (and what's in the proposed astyle config). This should be a fairly easy fix.

Of course, although I don't think the astyle file pull request was merged yet.

TedStudley commented 8 years ago

The logic behind outputting timestep t=0 after solving Stokes but before advection/diffusion is that since Stokes isn't directly dependent upon dt, the solution at t=0 actually gives the pressure/velocity values for t=0. Advection/diffusion on the other hand are dependent upon dt, so outputting after applying advection/diffusion would result in having the temperature values for t=dt but the pressure/velocity for t=0. Having laid it out like this, it seems as though our output was also wrong for every t>0, and that we would be outputting the pressure/velocity for time t=t_i but the temperature for time t=t_i + dt... This wouldn't have been caught by any of our benchmarks, since they all either only check the pressure/velocity at t=0 or only check for temperature convergence. Maybe @egpuckett can take a look at this to see which way is correct.

For the trailing whitespace/code style, the astyle config hasn't been merged in yet, but it shouldn't be too difficult to ensure that all changes preserve the pre-existing style and don't add any trailing whitespace. If you need the config in order to do that, feel free to grab it from the pull request which adds it, although I'd rather not apply any style changes to code which hasn't already been touched in this pull request. I'd suggest enabling the option to display trailing whitespace in whatever editor you use, since it makes whitespace errors way easier to catch.

Also, just a note: if you end up with a bunch of intermediate commits which break the build system, feel free to squash them all together with rebase. You'll need to force push to your branch, but that's only an issue if other people also have that branch checked out. For a pull request, it's generally pretty safe to to.

icherkashin commented 8 years ago

The logic behind outputting timestep t=0 after solving Stokes but before advection/diffusion is that since Stokes isn't directly dependent upon dt, the solution at t=0 actually gives the pressure/velocity values for t=0. Advection/diffusion on the other hand are dependent upon dt, so outputting after applying advection/diffusion would result in having the temperature values for t=dt but the pressure/velocity for t=0. Having laid it out like this, it seems as though our output was also wrong for every t>0, and that we would be outputting the pressure/velocity for time t=t_i but the temperature for time t=t_i + dt... This wouldn't have been caught by any of our benchmarks, since they all either only check the pressure/velocity at t=0 or only check for temperature convergence. Maybe @egpuckett can take a look at this to see which way is correct.

Yes, perhaps you are right because I don't know how you designed your program initially. But that's why I suggest we restructure the main loop for solving Stokes for now because we're working on fixing the indexing issue for Stokes solver first. Although this is rather minor, there is an extra Stokes solver call with the current code, which slows down testing Stokes solver.

In fact, we don't even have a clear plan yet for how to perform advection. That's why you are right that Prof. Pucket should take a look at this and give us his recommendation. Furthermore, I think there is no understanding yet how to do advection in terms of how many Stokes solutions data are needed - one or two. Of course, one is preferable, but there is no understanding of the algorithm of how to accomplish that.

To summarize, from talking to Prof. Puckett, I understood that advection is still an open problem for us to solve, so there is no point to structure the main loop to work for advection. I hope Prof. Puckett reads this and gives us his opinion.

icherkashin commented 8 years ago

For the trailing whitespace/code style, the astyle config hasn't been merged in yet, but it shouldn't be too difficult to ensure that all changes preserve the pre-existing style and don't add any trailing whitespace. If you need the config in order to do that, feel free to grab it from the pull request which adds it, although I'd rather not apply any style changes to code which hasn't already been touched in this pull request. I'd suggest enabling the option to display trailing whitespace in whatever editor you use, since it makes whitespace errors way easier to catch.

Sounds good! Thank you for the advice with trailing spaces, I haven't been noticing them before!

Also, just a note: if you end up with a bunch of intermediate commits which break the build system, feel free to squash them all together with rebase. You'll need to force push to your branch, but that's only an issue if other people also have that branch checked out. For a pull request, it's generally pretty safe to to.

Thank you! That will be wise to do next time.

TedStudley commented 8 years ago

The primary purpose of the code is to be a mantle convection simulator (it's in the name: the mc in mc-mini stands for mantle convection). That means that advection/diffusion is a core part of the algorithm, and should stay that way. We currently have working (at least first, if not second-order) advection/diffusion, even though it may not have all of the properties that we would like it to. Those properties come as a result of the implementation of the advection/diffusion algorithm, not the main loop where that algorithm is called. If we decide to implement an advection/diffusion algorithm which requires additional Stokes solves, those solves should be performed from inside of the advection/diffusion algorithm, not in the main loop. The only way that the main loop is specifically structured to work for advection/diffusion is because it's even present at all. In fact, depending upon certain parameters, advection/diffusion may never even be performed.

It's fine to make changes which facilitate testing the Stokes solver, but those changes really shouldn't be merged into master if they significantly change pre-existing functionality. In fact, rather than making any changes in main() to facilitate testing, I would far rather Stokes Solver testing be implemented as a GTest unit test. This would (ideally) require a way of mocking parameters instead of reading a parameter file, but would end up being a much more useful solution than manual testing.

icherkashin commented 8 years ago

We currently have working (at least first, if not second-order) advection/diffusion, even though it may not have all of the properties that we would like it to.

I understand what you are saying, but I think (Prof. Puckett should confirm my words, however) we are going to completely discard the existing advection code. So, basically, so far this code is only solving Stokes equations (even that requires work on swapping indices, which requires deeper change than I originally thought).

The only way that the main loop is specifically structured to work for advection/diffusion is because it's even present at all.

Since I presented the execution logic I think is (or should be) behind the main loop, I think Prof. Puckett should tell us his opinion on this issue.

It's fine to make changes which facilitate testing the Stokes solver, but those changes really shouldn't be merged into master if they significantly change pre-existing functionality.

I don't think anything here significantly changes per-existing functionality since the existing advection implementation is not considered working. On the other hand, I understand your point that there is something working, that better stay in place (i.e. do no harm until new solution is ready). This is wise.

On the other hand, the existing logic of the main loop is incomprehensible, at least in my understanding, so keeping it doesn't tell anything about what is or should be taking place in the code. However, because you designed this initially, you may see this logic better, of course.

It'd be really good to hear Prof. Puckett's opinion on this. In any case, the structure of the main loop is not that important at this point, so I we don't need to merge it into the main branch yet, except the changes concerning removing setNbThreads(16) and throwing an exception instead of exiting.

Would that be a good compromise for now? I'd still be important to hear from Prof. Puckett because this is a very useful discussion about the main loop.

TedStudley commented 8 years ago

I'm afraid I don't understand how the current advection implementation would be considered non-working... If temperature is advected in the process of calling solveAdvectionDiffusion, then advection is working by definition. Unless I'm mistaken, we should have working first and second-order advection methods in upwindMethod and frommMethod, respectively. What we don't have is a working overshoot-free second-order advection method. Since the code was designed to allow multiple advection and diffusion routines to be implemented and the implementation for a run to be chosen through parameters, I would consider this more as an incomplete additional feature for advection.

Maybe I can explain a little more of the logic behind the original implementation of the contents of main:

problem.initializeProblem();

problem.updateForcingTerms();
problem.solveStokes();

problem.recalculateTimestep();

First we initialize the problem and compute some of the initial values. At this point, we should have the state of the domain (temperature, pressure, velocity, forcing, viscosity, etc...) at time t=t_0. We update the forcing terms and solve the Stokes equation here since they are dependent solely upon the values which are already present in the domain, and not upon t or d_t. If we consider the ideal gas pressure equation P=nRT/V (not immediately applicable here for obvious reasons, but a good example nonetheless). Because of conservation (very important here!), the only non-constant value on the right-hand-side of the equation is T. Because of this, we are able to calculate the pressure at a specific point in time solely using the value of the temperature field at that time.

This is doubly necessary here because the value of d_t is computed based on the maximum velocity across the domain (in an advection-dominated problem) or the cfl/diffusivity (in a diffusion-dominated problem). In every case, we need to compute these values before we can even consider running advection or diffusion.

do {    
  output.writeHDF5File (problem.getTimestepNumber());
  cout << "<Timestep: " << problem.getTimestepNumber() << "; t=" << problem.getTime() << ">" << endl << endl;

This is likely the portion which is causing confusion. At this point in the loop, we output the values from the beginning of the current timestep (or the end of the previous timestep). I think this is a hold-over from when I was breaking before the loop instead of after (I found behavior that would require it in commit 692d0230), and it would have been possible to completely skip the entire output routine.

  problem.updateForcingTerms();
  problem.solveStokes();
  problem.recalculateTimestep();

This is the main issue with what I was doing before: we have to update the domain twice in the first timestep to output properly. Additionally, we don't appear to output the result of the last timestep inside of the loop

  problem.solveAdvectionDiffusion();
} while (problem.advanceTimestep());
output.writeHDF5File (problem.getTimestepNumber());
cerr << "Timestep: " << problem.getTimestepNumber() << "; t = " << problem.getTime() << endl;

output.writeHDF5File();

Last thing is to output the last timestep and then create the XDMF file which provides metadata for all of the HDF5 files.

It should be possible to rewrite the loop to keep the pre-existing functionality but without solving stokes/updating forcing/recalculating timestep twice in the first step.I have to go, but let me know if there are any concerns that this hasn't cleared up.

egpuckett commented 8 years ago

Hi Everyone, I'm recovering from a (minor) back surgery I had today. I hope to be able to look at this and throw my two cents in late tomorrow afternoon. Cheers!

egpuckett commented 8 years ago

The logic behind outputting timestep t=0 after solving Stokes but before advection/diffusion is that since > Stokes isn't directly dependent upon dt, the solution at t=0 actually gives the pressure/velocity values ? > for t=0. Advection/diffusion on the other hand are dependent upon dt, so outputting after applying advection/diffusion would result in having the temperature values for t=dt but the pressure/velocity for t=0. Having laid it out like this, it seems as though our output was also wrong for every t>0, and that we would be outputting the pressure/velocity for time t=t_i but the temperature for time t=t_i + dt... This wouldn't have been caught by any of our benchmarks, since they all either only check the pressure/velocity at t=0 or only check for temperature convergence. Maybe @egpuckett can take a look at this to see which way is correct.

The logic Ted has outlined here is indeed not consistent. The solution should be output after one Stokes solve and one advection step tⁿ - -> tⁿ⁺¹.

However, I do think it make sense to output the initial conditions BEFORE the first Stokes solve. This is something that can help one ensure that the problem has been set up correctly.

Does this make sense? - G

egpuckett commented 8 years ago

I'm not certain what the discussion concerning the advection method or advection followed by diffusion method is about.

BTW I found this discussion #19 to be very confusing to follow, since it included issues related to output, threads, white space, etc. I would have been able to follow the discussion concerning the logic in main.cpp if it was a separate discussion entirely. Isn't there a way to do this.

One other comment (no pun intended!) I think there should be copious comments in main.cpp; literally explaining which part of the algorithm each step in main.cpp is performing. This can serve as the first overview for a new user. Perhaps outputting the comments in Doxygen. I know we haven't finished with the monotone advection / diffusion yet, but that fact can be put in the comments also. In fact, Eclipse has a // TODO feature that places all of the // TODO's in a separate 'Tasks" window, which I find handy.

TedStudley commented 8 years ago

Hey Gerry, hope you're feeling better!

As it stands, the logic that we currently have (in pseudocode) is the following:

initialize the problem
solve stokes and recalculate forcing
recalculate timestep
do {
    output the current state
    solve stokes and recalculate forcing
    recalculate timestep
    calculate advection/diffusion
    update the current time
} while we haven't met the termination criteria
output the current state

This has two major issues: first, as @icherkashin has pointed out, we're solving Stokes twice on the first timestep, which really isn't necessary. Secondly, for every time t_n after the first timestep n=0, we end up outputting the temperature for time t=t_n but the pressure/velocity for time t=t_{n-1}. I'd like to propose that we update this main loop to work as follows instead:

initialize the problem
do {
    solve stokes and recalculate forcing
    recalculate timestep
    output the current state
    calculate advection/diffusion
    update the current time
} while we haven't met the termination criteria
solve stokes and recalculate forcing
output the current state

This would remove the duplicate Stokes solve from the first timestep, and maintain the behavior that we output the values for time t=0, as well as ensuring that the temperature/pressure/velocity data being output is all for the same time. If you want to implement this, we can get two birds with one stone and get this pull request finally merged in.

TedStudley commented 8 years ago

@egpuckett There's no way to create multiple threads for a single pull request, but that's usually not a problem since each pull request is generally supposed to be address one issue (or a set of similar issues). I think the difficulty with reading this thread steps from this pull request handling a number of different issues all at once, while we really had enough changes to create a couple of different pull requests.

Another added benefit there is that changes which there are some contention on don't hold back changes which can be easily merged in. The Travis CI fix has been working and ready to merge in for days, but it's being held back so that we can get the rest of the changes in the pull request worked out.

As for commenting, I've gotten Doxygen mostly working, and we should fairly easily be able to render comments from the source code into formatted documentation. If I remember correctly, there's also a feature in Doxygen to list all of the TODO comments in the code, as well.

icherkashin commented 8 years ago

@egpuckett There's no way to create multiple threads for a single pull request, but that's usually not a problem since each pull request is generally supposed to be address one issue (or a set of similar issues). I think the difficulty with reading this thread steps from this pull request handling a number of different issues all at once, while we really had enough changes to create a couple of different pull requests.

Alternatively, you can make line comments, although that may become hard to read as well after several comments.

icherkashin commented 8 years ago

This would remove the duplicate Stokes solve from the first timestep, and maintain the behavior that we output the values for time t=0, as well as ensuring that the temperature/pressure/velocity data being output is all for the same time. If you want to implement this, we can get two birds with one stone and get this pull request finally merged in.

I implemented your idea in the latest commit. I think this is sufficient for now when it comes to the main loop structure.

One other comment (no pun intended!) I think there should be copious comments in main.cpp; literally explaining which part of the algorithm each step in main.cpp is performing. This can serve as the first overview for a new user. Perhaps outputting the comments in Doxygen. I know we haven't finished with the monotone advection / diffusion yet, but that fact can be put in the comments also. In fact, Eclipse has a // TODO feature that places all of the // TODO's in a separate 'Tasks" window, which I find handy.

I added some comments in the latest commit. Feel free to change them if you find that they don't reflect something accurately.

icherkashin commented 8 years ago

Another added benefit there is that changes which there are some contention on don't hold back changes which can be easily merged in. The Travis CI fix has been working and ready to merge in for days, but it's being held back so that we can get the rest of the changes in the pull request worked out.

That's a good point. I had organizational problems with these changes in general, but now I know what to avoid next time. I truly appreciate your comments and suggestions!

TedStudley commented 8 years ago

Much nicer. We really don't need to comment every line in main (#include directives should already be pretty self-explanatory from the included file name), and we need to document why the loop is structured the way it is, but I can fix that when I finally get around to documenting the whole code in depth (See issue #11. I should get to it soon, I swear...).

In future, let's try to keep one concern (or type of concern) per pull request. Pull requests are cheap, and it should help us to keep things moving and also prevent the comment chains from being overly confusing.

icherkashin commented 8 years ago

We really don't need to comment every line in main (#include directives should already be pretty self-explanatory from the included file name),

That's true. On the other hand, if Prof. Puckett meant that he may be working with new students that are unfamiliar with C++, that may be helpful. There is a trade-off to be made here, of course.

TedStudley / mc-mini

Various improvements #19