GeoStat-Framework / GSTools

GSTools - A geostatistical toolbox: random fields, variogram estimation, covariance models, kriging and much more
https://geostat-framework.org
GNU Lesser General Public License v3.0
544 stars 71 forks source link

Drop deprecated conditional compilation #354

Closed LSchueler closed 1 month ago

LSchueler commented 2 months ago

As mentioned in #343, our very clean and elegant implementation of the conditional import of OpenMP is deprecated and this is my attend to find a new solution.

LSchueler commented 2 months ago

@MuellerSeb What is this?! Have you ever encountered such an error? - The problems seems to be the bint type, not the const keyword, when passing bools to Cython. I haven't found anything about this error online.

ValueError: Buffer dtype mismatch, expected 'const bool' but got 'bool'

LSchueler commented 2 months ago

This is ready for review. The problem with the deprecated Cython functionality is solved. Furthermore, I have added a second global var. to config.py to be able to differentiate between "GSTools-Core available" and "GSTools-Core selected". With that, and by selecting the corresponding import, the change between the Cython and the Rust implementations now work as expected. I have also improved the readme regarding parallelization and installation a bit.

MuellerSeb commented 2 months ago

Cool that this is working now.

What I don't like is, that we now have a lot of logic and pylint stuff everywhere we call a compiled algorithm..

What do you think about putting all cython code in a lib folder and provide a convenience layer there to provide the right implementation based on USE_RUST? We could have some lazy loading logic there as well.

Here is an idea:

import importlib.util

# Step 1: Check if the Rust backend is available (this could be in gstools.config)
RUST_BACKEND_AVAILABLE = importlib.util.find_spec("rust_backend") is not None

# Step 2: Global variable for switching implementations
USE_RUST = RUST_BACKEND_AVAILABLE

# Step 3: Variables to store the implementations
implementations = {
    "rust": {},
    "cython": {}
}

def load_algorithm(func_name):
    if USE_RUST:
        if RUST_BACKEND_AVAILABLE:
            if func_name not in implementations["rust"]:
                import rust_backend
                implementations["rust"][func_name] = getattr(rust_backend, func_name)
        else:
            raise ImportError("Rust backend is not available.")
    else:
        if func_name not in implementations["cython"]:
            import cython_backend
            implementations["cython"][func_name] = getattr(cython_backend, func_name)

def get_algorithm_function(func_name):
    load_algorithm(func_name)
    if USE_RUST and RUST_BACKEND_AVAILABLE and func_name in implementations["rust"]:
        return implementations["rust"][func_name]
    else:
        return implementations["cython"][func_name]

# Step 4: Factory function to create wrapper functions
def create_wrapper_function(func_name):
    def wrapper(*args, **kwargs):
        func = get_algorithm_function(func_name)
        return func(*args, **kwargs)
    return wrapper

# Step 5: Define the list of function names and create wrappers dynamically
# this could also be explicitly done
function_names = ["algorithm", "other_function"]
for name in function_names:
    globals()[name] = create_wrapper_function(name)

# Usage example
USE_RUST = True  # Use Rust implementation if available
result = algorithm(some_arg)
other_result = other_function(another_arg)

USE_RUST = False  # Switch to Cython implementation
result = algorithm(some_arg)
other_result = other_function(another_arg)

USE_RUST = True  # Switch back to Rust implementation
result = algorithm(some_arg)
other_result = other_function(another_arg)

Then we can use the algorithms by simply calling them and the backend is selected automatically. Also the algorithm is only loaded if called and switching between rust and cython is easy. We could then also have algorithms that are not implemented in the rust backend and raise an error only for this routine.

Only downside is maybe that the error for the missing rust backend is only thrown with the first function call after USE_RUST was set to true.

LSchueler commented 2 months ago

I completely agree that the code looks rather messy right now. But if we really go down that road, I see one problem and one thing to discuss.

Originally, we decided to put the Cython files into the folders of their respective submodules because logically, they belong there. With your approach, we would have the 10 or so "algorithm imports" globally in GSTools. That's not a very clean solution, albeit, some of the if's and pylint things we have in this PR right now would vanish.

The other thing to discuss is that if we actually put all the Cython files into a separate folder, why not create a separate Python package with only the Cython stuff in it, exactly like the GSTools-Core package with the Rust bindings? That would actually make a lot of things much easier for the GSTools package, as the Cython code only changes very seldomly and we could skip all the compilation steps and the rather complicated setup.

LSchueler commented 2 months ago

What do you think of this, maybe temporary, compromise: At the beginning of each Python file which depends on Cython or Rust files, we could define not so abstract functions, which bundle the messy stuff, like all the if's and linter instructions. Here is an example for variogram.py:

def directional(
    field,
    bin_edges,
    pos,
    direction,
    angles_tol=np.pi/8.0,
    bandwidth=-1.0,
    separate_dirs=False,
    estimator_type="m",
    num_threads=None,
):
    if (
        config.USE_GSTOOLS_CORE
        and config._GSTOOLS_CORE_AVAIL  # pylint: disable=W0212
    ):  # pylint: disable=W0212
        directional = directional_gsc  # pylint: disable=E0606
    else:
        directional = directional_c
    return directional(
        field,
        bin_edges,
        pos,
        direction,
        angles_tol,
        bandwidth,
        separate_dirs,
        estimator_type,
        num_threads,
    )

Then, further down in the code, you would only call this function: directional(...).

That way, we still have the code structured and encapsulated according to GSTool's submodules and the interesting parts of the code are not cluttered with distracting stuff.

MuellerSeb commented 2 months ago

The other thing to discuss is that if we actually put all the Cython files into a separate folder, why not create a separate Python package with only the Cython stuff in it, exactly like the GSTools-Core package with the Rust bindings? That would actually make a lot of things much easier for the GSTools package, as the Cython code only changes very seldomly and we could skip all the compilation steps and the rather complicated setup.

I love the idea of a GSTools_cython package. Will try this by forking GSTools and removing everything but the cython code to keep a history.

Your other proposal seems reasonable for the time being.

LSchueler commented 2 months ago

awesome!

I just moved all the if-logic into separate functions on the sub-module level. A few things to discuss:

MuellerSeb commented 2 months ago
* How detailed should the docstrings of the wrappers be?

Wouldn't add doc strings in the wrapper

* Should the functions start with an underscore? - And then remove the notes from the docstrings.

Yes, keep it private (and no need for doc strings)

* What should the default value of the optional argument `num_threads` be? In Cython and Rust, it's `None`. I proposed `num_threads=config.NUM_THREADS` in the commit.

Would use None in the wrapper like in the cython/rust implementations. Later num_threads=config.NUM_THREADS is passed either way.

For new cython backend see here: https://github.com/GeoStat-Framework/GSTools-Cython/pull/1 :tada:

LSchueler commented 2 months ago

Okay, I updated the wrappers. But just for code documentation, I included super-short docstrings. I could imagine that those functions would be rather confusing, if they have no description at all.

If you are fine with that decision, LGTM?

LSchueler commented 1 month ago

I think this is finally ready?!