Closed LSchueler closed 1 month ago
@MuellerSeb What is this?! Have you ever encountered such an error? - The problems seems to be the bint
type, not the const
keyword, when passing bools to Cython. I haven't found anything about this error online.
ValueError: Buffer dtype mismatch, expected 'const bool' but got 'bool'
This is ready for review. The problem with the deprecated Cython functionality is solved. Furthermore, I have added a second global var. to config.py
to be able to differentiate between "GSTools-Core available" and "GSTools-Core selected". With that, and by selecting the corresponding import, the change between the Cython and the Rust implementations now work as expected. I have also improved the readme regarding parallelization and installation a bit.
Cool that this is working now.
What I don't like is, that we now have a lot of logic and pylint stuff everywhere we call a compiled algorithm..
What do you think about putting all cython code in a lib
folder and provide a convenience layer there to provide the right implementation based on USE_RUST
? We could have some lazy loading logic there as well.
Here is an idea:
import importlib.util
# Step 1: Check if the Rust backend is available (this could be in gstools.config)
RUST_BACKEND_AVAILABLE = importlib.util.find_spec("rust_backend") is not None
# Step 2: Global variable for switching implementations
USE_RUST = RUST_BACKEND_AVAILABLE
# Step 3: Variables to store the implementations
implementations = {
"rust": {},
"cython": {}
}
def load_algorithm(func_name):
if USE_RUST:
if RUST_BACKEND_AVAILABLE:
if func_name not in implementations["rust"]:
import rust_backend
implementations["rust"][func_name] = getattr(rust_backend, func_name)
else:
raise ImportError("Rust backend is not available.")
else:
if func_name not in implementations["cython"]:
import cython_backend
implementations["cython"][func_name] = getattr(cython_backend, func_name)
def get_algorithm_function(func_name):
load_algorithm(func_name)
if USE_RUST and RUST_BACKEND_AVAILABLE and func_name in implementations["rust"]:
return implementations["rust"][func_name]
else:
return implementations["cython"][func_name]
# Step 4: Factory function to create wrapper functions
def create_wrapper_function(func_name):
def wrapper(*args, **kwargs):
func = get_algorithm_function(func_name)
return func(*args, **kwargs)
return wrapper
# Step 5: Define the list of function names and create wrappers dynamically
# this could also be explicitly done
function_names = ["algorithm", "other_function"]
for name in function_names:
globals()[name] = create_wrapper_function(name)
# Usage example
USE_RUST = True # Use Rust implementation if available
result = algorithm(some_arg)
other_result = other_function(another_arg)
USE_RUST = False # Switch to Cython implementation
result = algorithm(some_arg)
other_result = other_function(another_arg)
USE_RUST = True # Switch back to Rust implementation
result = algorithm(some_arg)
other_result = other_function(another_arg)
Then we can use the algorithms by simply calling them and the backend is selected automatically. Also the algorithm is only loaded if called and switching between rust and cython is easy. We could then also have algorithms that are not implemented in the rust backend and raise an error only for this routine.
Only downside is maybe that the error for the missing rust backend is only thrown with the first function call after USE_RUST was set to true.
I completely agree that the code looks rather messy right now. But if we really go down that road, I see one problem and one thing to discuss.
Originally, we decided to put the Cython files into the folders of their respective submodules because logically, they belong there. With your approach, we would have the 10 or so "algorithm imports" globally in GSTools. That's not a very clean solution, albeit, some of the if's and pylint things we have in this PR right now would vanish.
The other thing to discuss is that if we actually put all the Cython files into a separate folder, why not create a separate Python package with only the Cython stuff in it, exactly like the GSTools-Core package with the Rust bindings? That would actually make a lot of things much easier for the GSTools package, as the Cython code only changes very seldomly and we could skip all the compilation steps and the rather complicated setup.
What do you think of this, maybe temporary, compromise: At the beginning of each Python file which depends on Cython or Rust files, we could define not so abstract functions, which bundle the messy stuff, like all the if's and linter instructions.
Here is an example for variogram.py
:
def directional(
field,
bin_edges,
pos,
direction,
angles_tol=np.pi/8.0,
bandwidth=-1.0,
separate_dirs=False,
estimator_type="m",
num_threads=None,
):
if (
config.USE_GSTOOLS_CORE
and config._GSTOOLS_CORE_AVAIL # pylint: disable=W0212
): # pylint: disable=W0212
directional = directional_gsc # pylint: disable=E0606
else:
directional = directional_c
return directional(
field,
bin_edges,
pos,
direction,
angles_tol,
bandwidth,
separate_dirs,
estimator_type,
num_threads,
)
Then, further down in the code, you would only call this function: directional(...)
.
That way, we still have the code structured and encapsulated according to GSTool's submodules and the interesting parts of the code are not cluttered with distracting stuff.
The other thing to discuss is that if we actually put all the Cython files into a separate folder, why not create a separate Python package with only the Cython stuff in it, exactly like the GSTools-Core package with the Rust bindings? That would actually make a lot of things much easier for the GSTools package, as the Cython code only changes very seldomly and we could skip all the compilation steps and the rather complicated setup.
I love the idea of a GSTools_cython
package. Will try this by forking GSTools and removing everything but the cython code to keep a history.
Your other proposal seems reasonable for the time being.
awesome!
I just moved all the if-logic into separate functions on the sub-module level. A few things to discuss:
num_threads
be? In Cython and Rust, it's None
. I proposed num_threads=config.NUM_THREADS
in the commit.* How detailed should the docstrings of the wrappers be?
Wouldn't add doc strings in the wrapper
* Should the functions start with an underscore? - And then remove the notes from the docstrings.
Yes, keep it private (and no need for doc strings)
* What should the default value of the optional argument `num_threads` be? In Cython and Rust, it's `None`. I proposed `num_threads=config.NUM_THREADS` in the commit.
Would use None in the wrapper like in the cython/rust implementations. Later num_threads=config.NUM_THREADS
is passed either way.
For new cython backend see here: https://github.com/GeoStat-Framework/GSTools-Cython/pull/1 :tada:
Okay, I updated the wrappers. But just for code documentation, I included super-short docstrings. I could imagine that those functions would be rather confusing, if they have no description at all.
If you are fine with that decision, LGTM?
I think this is finally ready?!
As mentioned in #343, our very clean and elegant implementation of the conditional import of OpenMP is deprecated and this is my attend to find a new solution.