anderkve / FYS3150

https://anderkve.github.io/FYS3150

Getting around "RuntimeWarning: divide by zero encountered in log10" #7

Closed cecilmkl closed 2 years ago

cecilmkl commented 3 years ago

We're struggling a bit with plotting errors as our approximation gets too close to the exact values. Any tips on how to solve this?

evenmn commented 3 years ago

Hi,

the errors are expected to be small (at least for sufficiently large N), so it is natural that the exact and approximate solutions are hard to tell apart. If you plot the exact solution as a dashed line, it might be possible to see both lines, but if not I would not worry too much about it :) This is also why you are asked to plot the logarithm of the absolute and relative error in Problem 8.

Hope this helped, Even

anderkve commented 3 years ago

Hi all,

@evenmn, I think @cecilmkl is plotting the logarithm, given that the error message refers to log10 :)

In general, if you encounter a situation like log10(b-a) (with b and a being arrays) where some elements of b and a are so close that b-a is zero (so log10(b-a) would be -infinity for those elements), a simple workaround is to set such results to a lowest allowed value. E.g. you could do something like this

y_data = np.maximum(-100, np.log10(b-a))

Then the elements that would previously result in log10(b-a) = -infinity would now be set to -100. Of course, it is then important that you don't go on interpreting/using the number -100 for further computations, since it is just an arbitrary cutoff you chose to avoid a plotting issue.
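A slightly fuller sketch of the same workaround is given below. The arrays `a` and `b` and the name `floor` are made up for illustration, and taking the absolute value of the difference is an assumption on my part (without it, a negative difference would give nan instead of -inf). `np.errstate` is used only to silence the warning for this one expression.

```python
import numpy as np

# Hypothetical example arrays: b and a agree exactly in the first and last element.
a = np.array([0.0, 0.5, 0.75, 1.0])
b = np.array([0.0, 0.5001, 0.7502, 1.0])

floor = -100.0  # arbitrary cutoff, only meant to avoid the plotting issue

# Silence the "divide by zero" RuntimeWarning for this one expression only.
with np.errstate(divide="ignore"):
    log_diff = np.log10(np.abs(b - a))  # -inf where b and a agree exactly

# Replace the -inf entries by the cutoff value.
y_data = np.maximum(floor, log_diff)
```

Since the cutoff is far below any real value of the logarithm, the clamped points are easy to recognise later, e.g. with `y_data == floor`.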

Alternatively, I suspect that if you don't do anything and just leave the -inf values in y_data, matplotlib will handle this OK. These points will simply be outside the plotted y-axis range, regardless of what y-axis range you choose.

Even if you decide to replace the -inf values with some lowest allowed value, like -100, you probably don't want to include that in your plot since your y-axis presumably should have a shorter range to properly display the values for other points. So your -100 point would just be outside the plotted range.

Regardless of which approach you take, you can always just add a comment in the figure caption explaining why some points fall outside the plotted y range.
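If you would rather not rely on how the plotting library treats -inf, one option is to drop the non-finite points explicitly before plotting. This is only a sketch with made-up data; the names `x`, `abs_err`, `x_plot`, `y_plot` and `n_dropped` are hypothetical.

```python
import numpy as np

# Hypothetical grid and absolute errors; the error is exactly zero at both endpoints.
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
abs_err = np.array([0.0, 1e-4, 2e-4, 1e-4, 0.0])

with np.errstate(divide="ignore"):
    y_data = np.log10(abs_err)  # -inf at the two endpoints

# Keep only the points where the logarithm is finite.
finite = np.isfinite(y_data)
x_plot = x[finite]
y_plot = y_data[finite]

# Number of points left out, useful for the figure caption.
n_dropped = int(np.count_nonzero(~finite))
```

The arrays `x_plot` and `y_plot` can then be passed to the plotting call as usual, and `n_dropped` tells you how many points to mention in the caption.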

Note: Make sure that the reason you get b-a so close to zero is not simply that you have written the results to file with too few digits. At the endpoints we do expect exact agreement between the exact solution and our approximation, independent of the choice of N (why?), so at least for those points you would expect to encounter this runtime warning. But if you see it for other points, I would check that you have saved your results with a sufficient number of digits.
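To illustrate the point about digits, here is a small sketch with made-up numbers. Two values that differ by 1e-10 are converted to text and back, the way they would pass through a results file, once with 6 decimals and once with 15 decimals in scientific notation. The values and variable names are hypothetical.

```python
exact = 0.123456789012345
approx = exact + 1e-10  # hypothetical approximation with a tiny error

# Round-trip through text with few digits, as a results file would store them.
exact_few = float(f"{exact:.6e}")
approx_few = float(f"{approx:.6e}")

# Same round-trip with many digits.
exact_many = float(f"{exact:.15e}")
approx_many = float(f"{approx:.15e}")

diff_few = approx_few - exact_few     # the difference is lost: exactly zero
diff_many = approx_many - exact_many  # the difference survives: about 1e-10

print(diff_few, diff_many)
```

With few digits the two numbers become identical in the file, so log10 of their difference triggers the warning even though the approximation is not exact.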

mikkelme commented 3 years ago

Hi, I just wanted to add a comment regarding the statement: "Make sure that the reason you get b-a so close to zero is not simply that you have written the results to file with too few digits." I saw this case during the group sessions, and I think a great solution is to calculate the relative error in C++ and write that to file instead of calculating it later in Python. This reduces the problem of round-off error, as far as I can see.

Mikkel

anderkve commented 3 years ago

Good point, @mikkelme -- that is a good argument for computing the errors directly in the C++ code.

In one of the lectures I ran a demonstration where I only saved the "raw data" from the C++ program, and then computed all errors at "data analysis time", i.e. when I was making the plots in Python. This is typically what one would do in cases with very large datasets, when file sizes become a challenge. But for this project, it certainly would make very good sense to output the errors directly from the C++ code.

dondondooooon commented 3 years ago

I think I encountered this problem, or a similar one, as well and struggled with it for a while. My workaround was two things:

  1. Increased the precision in the C++ code for the data written to the output files (this sadly increases the run time). When going to higher values of N, around N=10^6 or 10^7, higher precision seemed to reduce the number of divide by zero errors I was getting in the Python script (I originally got a lot).

  2. Removed the calculations for the start and end values in the Python script. When I computed the error over the entire array, I was making the computer evaluate log10(b-a) at the boundary points, where both the approximate and the exact value are 0, which naturally gives a divide by zero error. I asked about this in one of the group sessions, and I think it's fine to do this because the boundary values are fixed by the boundary conditions anyway.
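The second workaround could look roughly like the sketch below, which slices off the first and last element before taking the logarithm. The data are made up (the function `x*(1-x)` and the added error of 1e-5 are just placeholders), and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical data on a grid where both solutions are exactly 0 at the boundaries.
x = np.linspace(0.0, 1.0, 6)
u_exact = x * (1.0 - x)
v_approx = u_exact.copy()
v_approx[1:-1] += 1e-5  # pretend the approximation is slightly off in the interior

# Skip the two boundary points, where the error is zero by construction.
interior = slice(1, -1)
abs_err = np.abs(u_exact[interior] - v_approx[interior])
rel_err = abs_err / np.abs(u_exact[interior])

log_abs_err = np.log10(abs_err)
log_rel_err = np.log10(rel_err)
x_interior = x[interior]
```

Leaving out the boundary points also avoids the 0/0 that the relative error would otherwise give there.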