bgrimstad / splinter

Library for multivariate function approximation with splines (B-spline, P-spline, and more) with interfaces to C++, C, Python and MATLAB
Mozilla Public License 2.0
419 stars 115 forks source link

Very slow build of the bspline object #133

Open hlucian opened 1 year ago

hlucian commented 1 year ago

Hi everyone

I integrated the splinter library into an inhouse cfd software for testing with lookup-table methods. As I have tables for various thermophysical quantities (e.g. rho = f(p,h), ....) I create an BSpline object for each of these tables.

splinterPtr_ = make_unique<SPLINTER::BSpline>(SPLINTER::BSpline::Builder(samples).degree(order_).build());

Originally, I was using an internal implementation for table interpolation using tables with a resolution if 1000x1000, but I had to realize this seams to be too fine for BSpline. However, even when using a relatively coarse table of 200x200, the command shown above takes several minutes to complete. Is this a normal behavior? Any options to accelerate this process?

Kindly Lucian

bgrimstad commented 1 year ago

Hi Lucian,

Glad that you may have a use case for SPLINTER.

I did a quick test and on my computer a cubic (degree 3) B-spline interpolates a 200x200 table in approximately 10 seconds. I ran this single-threaded on an Intel Core i7-8700K CPU. Note that I used the Python interface and the compiled source in develop-branch. We haven't released this version, but if I remember correctly it should have equal performance to the latest release.

Regarding interpolation of larger tables:

Please keep us posted!

Bjarne

EDIT: The above example used a version of SPLINTER compiled to debug mode. In release mode, building the B-spline takes less than 0.5 seconds.

hlucian commented 1 year ago

Hi Bjarne

Thank you for the quick feedback. For me, the 200x200 tables take about 4 minutes. When using a resolution of 1000x1000, i kept it running over the night, but it did not finish. The degree order is set to 2. This is true for all thermophysical tables.

The full call to the splinter library looks as follows:

    SPLINTER::DataTable samples;
    SPLINTER::DenseVector x(2);
    double y;

    for(int i = 0; i < nX_; i++)
    {
        for(int j = 0; j < nY_; j++)
        {
            // Sample function at x
            x(0) = X_coordinate_[i];
            x(1) = Y_coordinate_[j];
            y = Value_[i][j];

            // Store sample
            samples.addSample(x,y);
        }
    }
    auto start = chrono::steady_clock::now();
    if(RANK==0) cout << "Building splinter interpolation object" << endl;
    splinterPtr_ = make_unique<SPLINTER::BSpline>(SPLINTER::BSpline::Builder(samples).degree(order_).build());
    auto end = chrono::steady_clock::now();
    auto diff = chrono::duration_cast<chrono::seconds>(end - start).count();
    if(RANK==0) cout << "Finished building splinter interpolation object in " << diff << "seconds" << endl;

The output looks as follows for the table h = f(p,T):

===============================
 Creating ThermoTabulated Object
           pTh
===============================
=======================
reading tabular data...
Table: pTh
=======================

Entered Table
-------------------------------------
Table type = pTh
xMin_      = 1.00000000000000e+04
xMax_      = 7.00000000000000e+07
yMin_      = 1.40000000000000e+01
yMax_      = 1.00000000000000e+02
valueMin_    = -5.35947000000000e+04
valueMax_    = 1.51751000000000e+06
-------------------------------------

Building splinter interpolation object
Finished building splinter interpolation object in 231seconds

The material I am looking at is liquid hydrogen in the vicinity of the two phase region. Hence, there are some considerable jumps in the properties when passing from the liquid to the gaseous region. In case, I can also provide you the table including a small Matlab routine to visualize the data in a plot. The output looks as follows for e.g. the density = f(p,h)

visitedRegion

You mentioned to use a different solver from the Eigen-library. How can I achieve this? Is there a function to select a different solver? Which one would you suggest?

Also the tables usually never change. So yes, it would absolutely make sense to store the bspline. Also here, how can I save and load them from within my c++ code? Although, as said above 1000x1000 is currently not working for me.

Thank you so much Kindly Lucian

hlucian commented 1 year ago

Hi Bjarne

Sorry for the new message, but I did some further testing and it does not look like it is related to my tables. In the folder that I cloned from your git repository, there is a folder called test. I compiled the sources as shown below using two different levels of optimization.

g++ *.cpp ../src/*.cpp general/*.cpp -I../thirdparty/Catch -I. -I../include -I../thirdparty/Eigen/ -O3 -o outputO3
g++ *.cpp ../src/*.cpp general/*.cpp -I../thirdparty/Catch -I. -I../include -I../thirdparty/Eigen/ -O0 -o outputO0

The executable is then started using:

./outputO3 [bspline][general][knotinsertion] |tee 2>&1 logO3
./outputO0 [bspline][general][knotinsertion] |tee 2>&1 logO0

I modified bsplinetestingutilities.cpp so that it creates a table of 200x200 instead 20x20 and added some runtime statements as shown below:

DataTable sampleTestFunction()
{
    DataTable samples;

    // Sample function
    auto x0_vec = linspace(0,2,200);
    auto x1_vec = linspace(0,2,200);
    std::cout << x1_vec.size() << std::endl;
    DenseVector x(2);
    double y;

    for (auto x0 : x0_vec)
    {
        for (auto x1 : x1_vec)
        {
            // Sample function at x
            x(0) = x0;
            x(1) = x1;
            y = sixHumpCamelBack(x);

            // Store sample
            samples.addSample(x,y);
        }
    }

    return samples;
}

bool testKnotInsertion()
{
    DataTable samples = sampleTestFunction();

    // Build B-splines that interpolate the samples
    std::cout << __LINE__ << std::endl;

    auto start = std::chrono::steady_clock::now();
    BSpline bspline1 = BSpline::Builder(samples).degree(1).build();
    auto end   = std::chrono::steady_clock::now();
    std::cout << "Runtime 200x200 first order: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " milliseconds" << std::endl;
    start = std::chrono::steady_clock::now();
    std::cout << __LINE__ << std::endl;
    BSpline bspline2 = BSpline::Builder(samples).degree(2).build();
    end   = std::chrono::steady_clock::now();
    std::cout << "Runtime 200x200 second order: " << std::chrono::duration_cast<std::chrono::seconds>(end - start).count() << " seconds" << std::endl;
    std::cout << __LINE__ << std::endl;
    start = std::chrono::steady_clock::now();
    BSpline bspline3 = BSpline::Builder(samples).degree(3).build();
    end   = std::chrono::steady_clock::now();
    std::cout << "Runtime 200x200 third order: " << std::chrono::duration_cast<std::chrono::seconds>(end - start).count() << " seconds" << std::endl;
    std::cout << __LINE__ << std::endl;

The output is the following and shows that even when using your testing function, the build commands takes around 3min.

./outputO0 [bspline][general][knotinsertion] |tee 2>&1 logO0
200
51
BSpline::Builder::computeBSplineCoefficients: Computing B-spline control points using sparse solver.
Runtime 200x200 first order: 927 milliseconds
58
BSpline::Builder::computeBSplineCoefficients: Computing B-spline control points using sparse solver.
Runtime 200x200 second order: 183 seconds
62
BSpline::Builder::computeBSplineCoefficients: Computing B-spline control points using sparse solver.
Runtime 200x200 third order: 186 seconds
67

Can you tell me whether you encounter the same behavior on your side? My CPU is a Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz

bgrimstad commented 1 year ago

Thank you for posting your test results.

When inspecting your output, I noticed that SPLINTER prints the following line "BSpline::Builder::computeBSplineCoefficients: Computing B-spline control points using sparse solver." I think this means that SPLINTER has been compiled to debug mode. Perhaps you can try to compile to release mode by adding the flag -DNDEBUG.

bgrimstad commented 1 year ago

Another follow-up. I went back to check if had tested SPLINTER in debug mode myself, and indeed I had. I made an edit to my previous comment to mention this. When the sources are compiled to release mode, the 200x200 grid interpolation takes less than 0.5 seconds and the 1000x1000 grid interpolation takes around 100 seconds.

hlucian commented 1 year ago

Hi Bjarne

Thank you for your inputs, I recompiled with the -DNDEBUG flag, however, the change is not as you mention in my case. The results you get, I can only achieve if I set the degree to 1. As soon as I switch to 2 or 3, it takes about 100 seconds already on the 200x200 table. Did you achieve your results on the sixHumbCamelBack function? Otherwise, can you provide me your test setup and how you compile the source?

I tried the following:

  1. Without DNDEBUG, no optimization (-O0) g++ .cpp ../src/.cpp general/*.cpp -I../thirdparty/Catch -I. -I../include -I../thirdparty/Eigen/ -O0 -o output 1.1 degree = 1, 200x200 -> 0.56 seconds 1.2 degree = 2, 200x200 -> 180 seconds
  2. Without DNDEBUG, optimization (-O3) g++ .cpp ../src/.cpp general/*.cpp -I../thirdparty/Catch -I. -I../include -I../thirdparty/Eigen/ -O3 -o output 2.1 degree = 1, 200x200 -> 0.1 seconds 2.2 degree = 2, 200x200 -> 100 seconds
  3. With DNDEBUG, optimization (-O3) g++ .cpp ../src/.cpp general/*.cpp -I../thirdparty/Catch -I. -I../include -I../thirdparty/Eigen/ -DNDEBUG -O3 -o output 3.1 degree = 1, 200x200 -> 0.09 seconds 3.2 degree = 2, 200x200 -> 100 seconds
  4. With DNDEBUG, optimization (-O3), vectorization (-msse2) g++ .cpp ../src/.cpp general/*.cpp -I../thirdparty/Catch -I. -I../include -I../thirdparty/Eigen/ -DNDEBUG -O3 -o output 4.1 degree = 1, 200x200 -> 0.09 seconds 4.2 degree = 2, 200x200 -> 100 seconds

Again, thank you a lot for your time and assistance Kindly Lucian

bgrimstad commented 1 year ago

I am building with CMake using the configuration in CMakeLists.txt. My quick test was a modified version of the Python-example in https://github.com/bgrimstad/splinter/blob/develop/python/examples/bspline_multivariate.py. This test function is simpler than the six-hump camel back function, but I do not think that explains the high runtime of your test.

I will try to get some assistance from @gablank, who made SPLINTERS compiler setup.

hlucian commented 1 year ago

That would be great. We believe that having smooth interpolation of our table data could help to stabilize our cavitation predictions. However, as it is currently the case, it takes about an hour to read all the data, and I am not even sure if the 200x200 resolution is not too coarse anyway.

hlucian commented 1 year ago

Another follow up from my side. I switched to the python example you mentioned and increased the resolution to 200x200. On my side it takes 36 seconds, did I do something wrong when compiling splinter with CMake? Here is what I changed in bspline_multivariate.py

x1 = np.arange(-5, 5, 0.05)
x2 = np.arange(-5, 5, 0.05)

And for the monitoring of time

start_time = time.time()

bspline = splinter.BSplineBuilder(x, y, degree=[1, 3], smoothing=splinter.BSplineBuilder.Smoothing.NONE).build()
print("--- %s seconds ---" % (time.time() - start_time))

Gives me the following output

python3.6 bspline_multivariate.py
Loaded SPLINTER from /usr/local/lib/libsplinter-3-0.so!
200
--- 36.179590702056885 seconds ---
bgrimstad commented 1 year ago

At the top of the script you'll see a line that looks like this: splinter.load("/home/username/SPLINTER/build/debug/libsplinter-3-0.so")

Have you made sure to compile to release mode? With my build setup, I have to change the path to: splinter.load("/home/username/SPLINTER/build/release/libsplinter-3-0.so")

Last thing I can think of is to also test building from the develop-branch, like I did. But as I said earlier, I do not believe that to be the issue.

hlucian commented 1 year ago

Yes, I changed this line since i installed in splinter.load("/usr/local/lib/libsplinter-3-0.so") as you can also see from the output above. cmake .. inside the build folder returns the following output:

-- Setting compiler options ' -m64 -std=c++11 -Werror=return-type'
Configuring SPLINTER version 3-0 in RELEASE mode for x86-64 (64 bit)
Compiler flags:  -m64 -std=c++11 -Werror=return-type -O3 -DNDEBUG

This should be correct, if I am not wrong?

bgrimstad commented 1 year ago

It looks correct to me... Do you mind testing the develop-branch? I see that the example has been modified since the version in the master branch (the degree of the spline is a bit different).

I have mentioned your issue to @gablank. He is quite busy at the moment and it may take a few days before he can look at it.

hlucian commented 1 year ago

Yes, i checked out the develop branch but when running the same python command i get the following error:

Exception: Missing version file in SPLINTER directory! Please notify the developers of SPLINTER of this error.
bgrimstad commented 1 year ago

You can try to manually add the file python/splinterpy/version with the content "4-0".

hlucian commented 1 year ago

Ok, just copied the version-file from splinter/version to splinter/python/splinterpy I rerun the 200x200 table from bspline_multivariate.py modified as shown above and now the output is:

It is possible that SPLINTER was not compiled for the operating system and architecture you are using. If that is the case, you can compile SPLINTER yourself and load it using 'splinterpy.load("/path/to/SPLINTER.so")'
200
--- 1.1126136779785156 seconds --

So, it looks as if this changes quite a bit

gablank commented 1 year ago

Hello,

Cool to hear you have an interesting use for our library!

Could you run file /usr/local/lib/libsplinter-3-0.so and paste the output here?

bgrimstad commented 1 year ago

That was quite the difference. Happy you finally got the numbers down.

I will have to take a look at the changes we have made in the develop branch to pin down the culprit. I cannot remember introducing such significant improvements to the interpolation performance.

Hopefully, @gablank and I will get around to preparing release 4.0 within next week. Please let us know if you wish to help in any way or if you have any suggestions for final improvements.

hlucian commented 1 year ago

I tried to recompile my c++ sources with the development library. However, although following the docs/cpp_interface.mp I get the error

error: ‘SPLINTER::DenseVector’ {aka ‘class Eigen::Matrix<double, -1, 1>’} has no member named ‘at’; did you mean ‘data’?
  524 |             returnVal = splinter().eval(x).at(0)

What would be the correct syntax here?

hlucian commented 1 year ago

Ok, i believe the correct syntax should be without the at as given below

returnVal = splinter().eval(x)(0)

For my CFD solver, the time to build the interpolator has decreased for 300x300 tables from 1150seconds to 5seconds, a factor of 230. Thank you again for all the assistance.

hlucian commented 1 year ago

Hi Bjarne

The library seams to work. However, I encounter problems when switching to degree's higher than 1. Can you tell me, that the algorithm does on the boarders of the table? Does it extrapolate or just uses the boarder value?

Kindly Lucian