Freedom-of-Form-Foundation / anatomy3d

A CAD tool for humanoid anatomy alterations. See the anatomy3d-blender repository for more recent work.
https://freedomofform.org/1856/3d-anatomy-project-scope-phase-1-focus-on-a-limb-joint/
GNU General Public License v2.0

Feature-free implementation of vectors and vector spaces #48

Closed - AdamNorberg closed this 2 years ago

AdamNorberg commented 3 years ago

Beginning of implementation for issue #45. Introduces the IVectorSpace concept, a few sample vector spaces, and the barest beginnings of the Vector<IVectorSpace> type itself - just constructors, accessors, and equality comparison. Actual vector math will be in later pull requests.
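For orientation, the shape being described might look roughly like the sketch below. This is a hedged illustration, not the actual code in this PR: the member names (`Dimension`, the indexer, etc.) are guesses made for the example.

```csharp
using System;

// Hypothetical sketch of the IVectorSpace concept and the bare-bones Vector<TSpace> type
// described above (constructors, accessors, equality only; no arithmetic yet).
public interface IVectorSpace
{
    // The dimension has to be exposed as a runtime member; C# cannot bake it into the type
    // the way a C++ template parameter could.
    int Dimension { get; }
}

public readonly struct Vector<TSpace> : IEquatable<Vector<TSpace>>
    where TSpace : IVectorSpace, new()
{
    private readonly double[] _components;

    public Vector(params double[] components)
    {
        // Dimension is only knowable by asking an instance of the space at runtime.
        if (components.Length != new TSpace().Dimension)
            throw new ArgumentException("Component count must match the vector space dimension.");
        _components = components;
    }

    public double this[int index] => _components[index];

    public bool Equals(Vector<TSpace> other)
    {
        for (int i = 0; i < _components.Length; i++)
            if (_components[i] != other._components[i]) return false;
        return true;
    }

    public override bool Equals(object obj) => obj is Vector<TSpace> v && Equals(v);
    public override int GetHashCode() => HashCode.Combine(_components.Length);
}
```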

AdamNorberg commented 3 years ago

TL;DR: unlike C++ templates, C# generics do not perform per-specialization code generation, so concrete fields have to hang around at runtime just to hold information that a template would have baked in at compile time.
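To make that concrete (hypothetical example, not code from the PR): a C++ `template<int N> struct Vec` is stamped out separately for each N, so the dimension is a compile-time constant, whereas a C# generic compiles to a single body shared by every type argument, so the same fact has to live in an ordinary runtime member.

```csharp
// Hypothetical example, using the IVectorSpace shape sketched earlier. C# compiles one shared
// body for Vector<TSpace> regardless of TSpace, so per-space facts such as the dimension must
// be stored as runtime state rather than baked into the type.
public sealed class Euclidean3D : IVectorSpace
{
    public int Dimension => 3;   // read at runtime every time it is needed
}
```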

AdamNorberg commented 3 years ago

Also, all this is stuff I wanted to do, but can't do in pure C#. Using C++/CLI (a.k.a. writing C++ for .NET) barely helps: we could use templates for all of this, but we'd have to handwrite a class (or at least write a macro to write a class...) for every specialization that needs to be visible from C#. C# generics cannot pass parameterized type information down to C++ templates at compile time, so no template specialization can be made for a parameterized type.

That might still be the least bad option, for what that's worth. We could still write an optimized C++ class designed for stack allocation, compose it into .NET-ready wrapper types (seen by C# as structs), and get the behavior we predict is likely to be highest-performance while minimizing code repetition. We'd have to write a bunch of wrappers, but that's far less work than rewriting all the bodies too - enough less work that we can realistically write context-specific wrappers.

Lathreas commented 3 years ago

Hmmm, yeah, thanks for the in-depth response! I definitely agree; to be fair, the limits of C#'s generics are something I have been worried about for quite a while now. I recognize the struggles you've described, and although this might very well be the highest-performance implementation we can get in C#, I do fear it could have long-term consequences for the performance of the entire application if we hold on to it for too long. The resource requirements are only expected to increase, and rewriting this Vector type using C++/CLI is not really an attractive option either, due to the repetitive glue code and the inevitable marshaling overhead that we will get.

> Also, all this is stuff I wanted to do, but can't do in pure C#.

Oh, yeah, I feel your pain :P (see: the horrible ContinuousMap<TIn, TOut>, which I had hoped to make fully generic but instead had to implement specific versions of, which kind of defeats the purpose of the generics :P).
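For concreteness, the limitation being lamented here is presumably the classic one: C# (prior to the generic math interfaces added much later, in .NET 7) gives you no way to do arithmetic on an unconstrained type parameter, so a fully generic numeric map will not compile. A minimal sketch, using a hypothetical stand-in rather than the real ContinuousMap code:

```csharp
// Hypothetical stand-in for ContinuousMap<TIn, TOut>; not the actual class from this repository.
public abstract class ContinuousMapSketch<TIn, TOut>
{
    public abstract TOut Evaluate(TIn x);

    // The line below does not compile (error CS0019): C# cannot assume TOut supports '+',
    // which is why specific, non-generic versions end up being written instead.
    // public TOut Sum(TIn a, TIn b) => Evaluate(a) + Evaluate(b);
}
```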

I'm fine using a less-optimized Vector struct implementation for the sake of debugging and development, but I do believe that for any production work we need to squeeze out all the performance we can get, especially at the lower level. Although performance is very important, I'm particularly afraid of memory overhead, which will become severely limiting if we want to fully represent anatomy at the sub-millimeter level. I have already made a few Blender surfaces using careful sculpting and memory optimization techniques at millimeter-level precision, and despite using no-overhead buffers and 32-bit precision, the arrays themselves easily surpass 6 GB of memory (for merely the face and hair follicles), even though they are topologically optimized (i.e. using the 'decimate' modifier to remove unnecessary detail).

Due to the nature of the program, we will need to use topologically unoptimized surfaces, since the program cannot possibly predict the level of detail a particular spot can have. Quadtrees come to mind for optimizing that structure, of course, but that will still be nowhere near as memory efficient as a decimate modifier. These memory costs should be taken into account, so for any large array storage we should be careful not to store unnecessary overhead. For small stuff, such as representing the broad shapes and curves, the memory overhead will probably not matter much.
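As a back-of-envelope illustration of the "no unnecessary overhead" point (assuming a 64-bit .NET runtime; the numbers are rough):

```csharp
// Rough illustration, assuming a 64-bit runtime. A plain struct of three floats takes
// 12 bytes per element and is stored inline in the array; if each vertex were a reference
// type instead, every element would also pay an object header plus an 8-byte reference,
// roughly tripling the memory for large vertex arrays.
public readonly struct Vertex32
{
    public readonly float X, Y, Z;              // 3 * 4 bytes = 12 bytes, no per-element overhead
    public Vertex32(float x, float y, float z) { X = x; Y = y; Z = z; }
}

// e.g. 100 million vertices as a Vertex32[]: about 1.2 GB of raw component data.
```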

For performance, I haven't done any 'real' testing in the sense of using proper measurement tools, but I did do a few simple timing checks. It appears a lot of the delay is in the calculation of the intersection points, although I haven't yet determined whether this is specifically due to the arithmetic or due to, e.g., method call overhead (I expect the former, though). We should certainly move to better performance measurements than that, but the order of magnitude should remain the same either way.
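For reference, the kind of "simple timing check" described above might look like the sketch below; `ComputeIntersections` is a hypothetical placeholder for the code being measured, and a proper harness such as BenchmarkDotNet would additionally control for JIT warm-up and GC noise.

```csharp
using System;
using System.Diagnostics;

static class IntersectionTiming
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000; i++)
        {
            ComputeIntersections();   // hypothetical workload standing in for the real intersection code
        }
        sw.Stop();
        Console.WriteLine($"avg per call: {sw.Elapsed.TotalMilliseconds / 1000.0:F3} ms");
    }

    static void ComputeIntersections() { /* placeholder */ }
}
```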

I'll definitely do a few general arithmetic optimization passes in the near future, and those would be generally applicable regardless of the language. I think there are huge performance gains to be had from simply skipping code that isn't necessary, or from simplifying some of the mathematics. Parallelization will require us to rethink how the code is executed: using iterators makes the process inherently serial, whereas the code can easily be parallelized over a preallocated array.
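A small sketch of the iterator-versus-preallocated-array point (`Transform` is a hypothetical placeholder for the per-element arithmetic):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ParallelSketch
{
    // Iterator/LINQ style: lazily yields one element at a time, inherently serial.
    static IEnumerable<double> Serial(IEnumerable<double> input) =>
        input.Select(Transform);

    // Preallocated array: independent indices, trivially split across cores.
    static double[] Parallelized(double[] input)
    {
        var output = new double[input.Length];
        Parallel.For(0, input.Length, i => output[i] = Transform(input[i]));
        return output;
    }

    static double Transform(double x) => Math.Sqrt(x) * x;   // placeholder arithmetic
}
```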

That said, unless it's a very obvious and easy change (such as changing a type or adding a compiler hint), I'm not too comfortable doing a lot of language-dependent optimization work while we aren't sure we will stick with the language at hand, at the risk of doing too much unnecessary work (remember that we currently only spend a few hours each week on this, and development time is quite valuable).

So, all in all, if C# is indeed too limiting for the work we intend to do, I think we should plan a time to move the engine code to a lower-level language if we are to make the library production-ready, either now or after a first draft is ready. This could have multiple benefits; it would also make it more of a general-use library beyond just C#, since in my experience C++ is easier to integrate into many languages. Furthermore, from my experience with .NET and massively parallel processing, I believe I can get more GPU performance out of carefully written libraries if we use C++/OpenCL or CUDA for the important bits instead of C#'s methods, since I'll have more control over the CPU-GPU communication overhead. So far, I've found hand-written communication code to outperform auto-generated GPU parallelization, because the compiler cannot predict whether arrays will be used on the GPU or on the CPU, so it copies the final result back into host memory each time it is accessed (either in advance or just-in-time, so it's not all wasted, but between class calls, for example, it will often copy the result back into memory).

AdamNorberg commented 2 years ago

Not practical. Using GLM#.