docs for simple examples, and necessity of unboxing

ahbarnett commented 2 years ago

Dear Clem, Thanks for this package. I am very interested in calling Julia from C (or C++ if it must be). Forgive the newbie questions.

1) docs. I struggled to find a simple "hello world" example - the "showcase" in the readme.md is too complicated for me, advertising various advanced features that I cannot understand. I only use a bit of C++ and certainly not fancy features of C++20. As a scientific programmer I, and many others, would benefit most from a series of examples of calling simple julia code (first a command, then function, then module...) from C++ (or even from C if possible). So my request is: could you kindly simplify the first examples? (in the readme, and in the manual).

2) unboxing. In the manual you describe that unboxing is essential. (maybe I misunderstood?). However, one needs to be able to simply pass pointers to existing arrays without forcing a copy each time, for obvious reasons (avoiding slow-down for relatively cheap functions performed on large arrays, and also for saving RAM in large problems). A year ago I set up some demos of C calling Julia, using pointers:

https://github.com/ahbarnett/multilingual-julia/tree/main/ccallj

These examples start simple (calling a julia base function, then a julia function, then a julia module). Eg see https://github.com/ahbarnett/multilingual-julia/blob/main/ccallj/cfuncmodarr.c which wraps multi-threaded functions in the simple julia module https://github.com/ahbarnett/multilingual-julia/blob/main/ccallj/ArrMod.jl

They are incomplete, just a couple of days work, are not as elegant as I'd hoped (but SGJohnson helped), and nothing like the scale of your big project. (I also had trouble compiling/linking, as you will see from comments.) However, they do show what we consider essential in scientific programming --- eg, passing large arrays by pointers, accessing multithreaded high-performance julia code from a low-level language --- so I would be curious if/how your project can showcase some similar very simple array-passing and multithreaded examples? Maybe my simple examples could influence some of the documented examples you start with? (ties back to part 1 above). [Sadly I have not had time to use my own examples in projects yet, but plan to.]

Thanks and best wishes, Alex

Clemapfel commented 2 years ago

Hi! Thank you for your kind words!

I struggled to find a simple "hello world" example

A hello world is available in docs/installation.md here.

I would recommend reproducing the examples as they appear in the manual. It may be overwhelming because the document is so long, but I tried to structure it so that, if someone goes chapter by chapter, the more complex topics build upon easier-to-understand ones. However:

I, and many others, would benefit most from a series of examples of calling simple julia code (first a command, then function, then module...) from C++ (or even from C if possible). So my request is: could you kindly simplify the first examples? (in the readme, and in the manual).

I agree with this. My manual kind of assumes users have full knowledge of both julia and C++, which, especially for the latter, is a bit much to ask. I think I will create a few examples of basic things like calling julia code and moving basic objects between states, as you suggested, so that new users can get something working more quickly. I have created a new branch to implement these over the coming week. I will keep this issue open until that is finished and will update you here. Thank you for your suggestion.

I would be curious if/how your project can showcase some (...) multithreaded examples

jluna currently can't guarantee thread-safety. This will be added in version 0.9 (we're currently at 0.8), which I expect within 2 - 3 weeks. Until then, you should be able to do anything below in a multithreaded environment, just be careful to keep any data separated and to make sure any access is atomic.

unboxing. In the manual you describe that unboxing is essential. (maybe I misunderstood?). However, one needs to be able to simply pass pointers to existing arrays without forcing a copy each time, for obvious reasons (avoiding slow-down for relatively cheap functions performed on large arrays, and also for saving RAM in large problems)

This will be a somewhat long-winded explanation, but I think I can clarify something very fundamental that I may not have explained well in the manual: you do not need to unbox an array to access it, and you do not need to box an array to write to it. The following hopefully shows why:

Let's say we have the following large array allocated julia-side:

// allocate a c-array with 100k values julia-side
auto julia_side_array_proxy = State::safe_eval(R"(
    julia_side_array = [Float32(i) for i in 1:100000]
    return julia_side_array
)");

Note how, in the array comprehension, we explicitly convert each element to Float32. This is important because we can now be sure that, internally, the memory of julia_side_array will be a C-array of floats (aka. Float32).

I will now describe ways to access this data without unboxing, without any reallocation, and with no overhead compared to the C-API.

(Note that any code mentioned henceforth is available in this commented, compilable gist)

In the previous statement, we have created a proxy that points to the julia-side julia_side_array. Before explaining what a proxy is, we should use a more specialized one by converting the above variable into a jluna::Array:

Array<Float32, 1> as_array = julia_side_array_proxy;

Here we specified its value type as Float32 and its dimension (rank) as 1. This statement does not cause reallocation because Array<T, N> is a proxy, and a proxy only handles pointers to julia-side memory, not the memory itself. The only data it holds is exactly one 64-bit pointer. The actual array stays julia-side, and any modification made through the proxy is applied to the julia-side value it points to, "by proxy", hence the name.
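To make the "one pointer, no copy" idea concrete, here is a minimal sketch in plain C++. It does not use jluna: VectorProxy and make_storage are hypothetical names, and a std::vector merely stands in for memory that julia would own. It only illustrates the general pattern of a handle that refers to data it does not hold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the julia-side array: Float32(i) for i in 1:n.
// In real code this memory would be owned by julia, not by a std::vector.
std::vector<float> make_storage(std::size_t n)
{
    std::vector<float> storage(n);
    for (std::size_t i = 0; i < n; ++i)
        storage[i] = static_cast<float>(i + 1);
    return storage;
}

// Hypothetical proxy-like type (not jluna's Array): its only data member
// is one pointer to memory that lives elsewhere.
struct VectorProxy
{
    std::vector<float>* target;

    // 0-based access; returns a reference into the original memory,
    // so both reads and writes happen "by proxy" with no copy
    float& at(std::size_t i)
    {
        assert(i < target->size());
        return (*target)[i];
    }
};

// The proxy is exactly as large as the single pointer it stores
static_assert(sizeof(VectorProxy) == sizeof(void*), "proxy should hold one pointer");
```

Creating a VectorProxy for a 100k-element vector copies 8 bytes on a 64-bit machine, not the 400 kB of element data, and a write through at(10) is visible in the original storage.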

To access any value, we can use Array::at:

std::cout << (Float32) as_array.at(10) << std::endl;

This has no overhead compared to the C-API, because we are dealing with an array of floats, while accessing an element as a float. Therefore, no conversion is necessary.

To modify a value, we simply assign to the iterator that at returns:

Array<Float32, 1> as_array = julia_side_array_proxy;

// overwrite the element at 0-based index 10
as_array.at(10) = 1234;
State::safe_eval("println(julia_side_array[11])"); // julia indices are 1-based
// restore the original value
as_array.at(10) = 11;

This is, again, 0-overhead. That is only true for jluna::Array, though: jluna::Proxy (the less specialized proxy) has about a 5 - 10% overhead compared to pure C.

If you want to handle raw C-pointers, jluna also allows for that. You can get a void* to the data any array is pointing to using Array::data:

Float32* as_c_array = (Float32*) as_array.data(); // data() returns void*, so C++ needs an explicit cast
std::cout << as_c_array[10] << std::endl;

And you can get a raw C-pointer to any given element by casting the iterator Array::at returns, first to Any*, then to any type you want:

auto* iterator_ptr = (Any*) as_array.at(10);

// cast Any* to Float32*, then dereference it to get the float
Float32 value = *((Float32*) iterator_ptr);
std::cout << value << std::endl;

Here I used the C casting syntax (Float32*) value, even though in C++ we should be using reinterpret_cast<Float32*>(value), for style reasons only. I'm sticking to C syntax for this post as you seem to be more familiar with it.
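For readers who have not seen both spellings side by side, the following plain C++ sketch shows that the two casts perform the same conversion. It does not use jluna: Opaque is a hypothetical stand-in for an opaque handle type such as Any, and the function names are made up for illustration.

```cpp
#include <cassert>

// Hypothetical opaque handle type, standing in for something like Any.
// It is only ever used through pointers, so it needs no definition.
struct Opaque;

// C-style cast, as used in this post
float read_with_c_cast(Opaque* handle)
{
    return *((float*) handle);
}

// Named C++ cast: the same conversion, just easier to spot in a code base
float read_with_cpp_cast(Opaque* handle)
{
    return *reinterpret_cast<float*>(handle);
}

// Helper for the demonstration: hide a float's address behind the opaque type
Opaque* as_opaque(float* element)
{
    assert(element != nullptr);
    return reinterpret_cast<Opaque*>(element);
}
```

Both functions read the same float from the same address. The only difference is that the named cast states its intent explicitly, which is why it is the preferred style in C++.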

While doing all of these modifications on the raw C data, as_array needs to stay in scope, otherwise the garbage collector might free the data while we are still handling it. as_array is a proxy and a proxy keeps the value it points to safe from the GC, as long as the proxy is in scope.
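The "safe as long as it is in scope" behaviour follows the usual C++ RAII pattern. The sketch below is a toy model only: ToyRegistry and ScopedGuard are hypothetical names, and this is not how julia's garbage collector or jluna's proxies are implemented. It just shows how an object's lifetime can control whether a value counts as protected.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Toy bookkeeping standing in for a garbage collector: maps a value's
// address to the number of guards currently protecting it.
class ToyRegistry
{
public:
    void protect(const void* value) { ++counts_[value]; }

    void release(const void* value)
    {
        auto it = counts_.find(value);
        if (it != counts_.end() && --(it->second) == 0)
            counts_.erase(it);
    }

    // True if a collection pass would be allowed to free the value
    bool may_free(const void* value) const { return counts_.count(value) == 0; }

private:
    std::map<const void*, std::size_t> counts_;
};

// RAII guard: protects a value on construction and releases it
// automatically when the guard goes out of scope
class ScopedGuard
{
public:
    ScopedGuard(ToyRegistry& registry, const void* value)
        : registry_(registry), value_(value)
    {
        registry_.protect(value_);
    }

    ~ScopedGuard() { registry_.release(value_); }

    // Copying is disabled so that each guard releases exactly once
    ScopedGuard(const ScopedGuard&) = delete;
    ScopedGuard& operator=(const ScopedGuard&) = delete;

private:
    ToyRegistry& registry_;
    const void* value_;
};
```

In this model, a raw pointer obtained while a guard exists stays valid only until the closing brace of the guard's scope, which is the same rule that applies to pointers obtained from as_array.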

All these pointers can be handled exactly like they would be in C, you can freely malloc, swap data, etc., though if you want to swap the pointer a proxy holds, you will need to create a new proxy. This does not cause significant allocation because, again, all a proxy holds is a single pointer.

This all works fine, but only if the value type of the array is what I call C-compliant. A list of C-compliant types can be found in the performance section of the manual, but I will reprint the entire list here, for convenience:

// cpp-side name         // julia-side name
bool                     Bool
char                     Char
int8_t                   Int8
int16_t                  Int16
int32_t                  Int32
int64_t                  Int64
uint8_t                  UInt8
uint16_t                 UInt16
uint32_t                 UInt32
uint64_t                 UInt64
float                    Float32
double                   Float64
const char*              CString [1]
T*                       Ptr{T}  [2]

[1] where const char* is a null-terminated char[] (C-string)
[2] where T is a C-compliant type

For all of these types, and only these types, it's no problem to just access and modify the underlying C-data, because julia and C++ / C share the same memory layout.
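The layout assumptions behind a few rows of that list can be checked at compile time. The sketch below is plain C++ with no jluna or julia calls. The static_asserts encode what mainstream platforms provide, and bits_of is a hypothetical helper that shows the IEEE 754 bit pattern that float shares with Float32.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstring>
#include <limits>

// Compile-time checks; if one fails, raw access as described above
// would not be safe on that platform
static_assert(CHAR_BIT == 8, "bytes are expected to have 8 bits");
static_assert(sizeof(bool) == 1, "bool is expected to match a 1-byte Bool");
static_assert(sizeof(float) == 4 && std::numeric_limits<float>::is_iec559,
              "float is expected to be IEEE 754 binary32, like Float32");
static_assert(sizeof(double) == 8 && std::numeric_limits<double>::is_iec559,
              "double is expected to be IEEE 754 binary64, like Float64");

// Returns the raw bit pattern of a float, copied byte by byte
std::uint32_t bits_of(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}
```

Because both sides agree on these bit patterns, a buffer of floats can be read in place without any conversion step.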

For any other type, though, what was shown above is possible but not recommended. A julia-side dict, for example, has a completely different memory layout than a C++ map. Because of this, the only way to safely access the entire julia-side dict is to unbox it, which converts all its memory and does cause reallocation. The same is true for boxing, and this reallocation is often undesirable.

If that is not an option (and going through jluna::Array is not an option either), you will have to either use the C-library (not recommended) or use non-jluna julia functions (recommended). For example, you can call getindex on a vector like so:

auto getindex = Base["getindex"];
auto setindex = Base["setindex!"];

// accessing values
std::cout << (Float32) getindex(julia_side_array_proxy, 10 + 1) << std::endl;

// changing values
setindex(julia_side_array_proxy, 1234, 10 + 1);
State::safe_eval("println(julia_side_array[11])");
setindex(julia_side_array_proxy, 11, 10 + 1);

Where we now use 1-based indices because we are using julia functions directly.
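Since off-by-one mistakes are easy to make when mixing the two conventions, a pair of small helpers can make the conversion explicit. This is a plain C++ sketch with hypothetical names, and getindex_one_based only imitates a 1-based lookup on a std::vector. It does not call julia.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert a 0-based C++ index to the 1-based index julia functions expect
constexpr std::size_t to_julia_index(std::size_t cpp_index)
{
    return cpp_index + 1;
}

// Inverse conversion; the caller must pass an index >= 1
constexpr std::size_t to_cpp_index(std::size_t julia_index)
{
    return julia_index - 1;
}

// Imitates a 1-based lookup (like julia's getindex) on a C++ vector
float getindex_one_based(const std::vector<float>& values, std::size_t julia_index)
{
    assert(julia_index >= 1 && julia_index <= values.size());
    return values[to_cpp_index(julia_index)];
}
```

With these, the 10 + 1 in the snippet above would read to_julia_index(10), which documents why the offset is there.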

This also causes no reallocation; all the memory stays julia-side at all times. The julia-side functions are applied to julia-side values. The same is true for setindex!, iterate, etc. This is the best way to modify julia-side non-C-compliant data, imo.

Note that for the above, there is about a 10% overhead compared to the C-API. There are faster, 0-overhead versions of julia-side function proxies available; the syntax is just a lot less nice. I'll add them to the gist with all the code. Otherwise, you can find tips on how to minimize overhead in the performance section of the manual.

Hopefully this helped, feel free to pose any follow-up questions. The code for this (which can also serve as an example until I finish them) is available in this gist. I also added the no-overhead and C-API-only ways in there, though I think they would be unnecessarily confusing for this post for now.

C.

Clemapfel commented 2 years ago

Solved with release 0.9.0. Please consider giving the manual another try; it was rewritten from scratch.

Clemapfel / jluna

docs for simple examples, and necessity of unboxing #12