Matrix update fast matrix | EmuSIMD Emulation

FastMatrix reimplemented using the EmuSIMD library, and now fully generic
- Some issues remain that can be addressed when needed, such as not yet taking advantage of representing a 4x4 matrix of 32-bit elements as a single 512-bit register (16 x 32 bits = 512 bits); that is left for a later point.
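To illustrate the 512-bit idea: a 4x4 matrix of 32-bit floats holds 16 x 32 = 512 bits, so it fits one 512-bit register exactly. The sketch below is hypothetical (the type and function names are not from EmuSIMD or FastMatrix). It uses a plain 64-byte-aligned array as a stand-in for a real register such as `__m512`, and it assumes column-major storage.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a 512-bit register: 16 lanes of 32-bit float.
struct alignas(64) f32x16_stand_in {
    std::array<float, 16> lanes{};
};

static_assert(sizeof(float) * 16 * 8 == 512, "16 x 32-bit floats fill exactly 512 bits");
static_assert(sizeof(f32x16_stand_in) == 64, "one stand-in register is 64 bytes");

// Assumed column-major packing of a 4x4 matrix: lane index = column * 4 + row.
constexpr std::size_t lane_index(std::size_t column, std::size_t row) {
    return column * 4 + row;
}

// Element-wise matrix addition becomes one 16-lane operation
// instead of four separate 4-lane (128-bit) operations.
f32x16_stand_in add(const f32x16_stand_in& a, const f32x16_stand_in& b) {
    f32x16_stand_in out;
    for (std::size_t i = 0; i < 16; ++i) {
        out.lanes[i] = a.lanes[i] + b.lanes[i];
    }
    return out;
}
```

With real AVX-512 hardware the loop would correspond to a single vector add; operations that mix rows and columns (such as matrix multiplication) would additionally need lane shuffles.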
Huge overhaul to EmuSIMD
- Lots of functions are now forward-declared
- If a register/instruction width is not available on the target hardware, EmuSIMD now emulates it using 2 registers of the next-smaller width
  - For example, our test hardware supports everything except AVX-512, so when using f32x16 (for example) we instead use two f32x8 registers and call their f32x8 operations twice. From a programmer's perspective this is no different to using real AVX-512 registers (apart from performance, of course*)
    - *This can actually help slightly with cache locality and data dependency issues, and so improve performance; see the comparison tests of 128-bit, 256-bit, and "512-bit" registers in the test harness, where "512-bit" ran twice as fast as 256-bit on the same set of data
  - Some areas need more work, such as adding emulation branches for all register widths below 512 bits, and emulated conversions, which are currently extremely inefficient
- Changes some (although not all) functions to use concepts, so that different argument types can be safely passed to template SIMD functions as long as they ultimately refer to the same register type (e.g. combining some_register and const some_register& as arguments to add(x, y))
  - This also better supports passing a specific reference qualification as an argument, since under some compilation conditions we want to pass registers by value, whereas under others we may want to pass them by reference. Arguments keep the qualification they were passed with, and nothing is done with them until they reach the function determined to handle them, where they are passed as the argument type that register requires
- These changes were tested with EmuMath's FastNoiseTable processing functors, so those have also been modified. As a result, 256-bit support is now available; 512-bit support has been postponed because generation is significantly slower (likely due in part to suboptimal conversion emulations)
  - The 512-bit inefficiencies may also come from not all template functions implementing the new concepts for input arguments, which means we may be making many unwanted copies where we should be passing references (emulated registers use pass-by-reference semantics, but many templates still pass by value).
Leaves some notes on what remains to be implemented for Matrix
Adds a basic argument parser to EmuCore, which gathers command-line arguments into an easily accessed interface
- Includes plenty of helper features, such as default arguments and parsing the strings into various types (using both built-in conversions and custom functions)
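As an illustration of the features listed above (defaults, built-in conversions, and custom conversion functions), here is a minimal sketch. It is hypothetical: `ArgParser`, `get` and `get_with` are illustrative names rather than EmuCore's actual interface, and it assumes a simple `--name value` argument format.

```cpp
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical minimal parser collecting "--name value" pairs; not EmuCore's actual interface.
class ArgParser {
public:
    ArgParser(int argc, const char* const* argv) {
        for (int i = 1; i < argc; ++i) {
            std::string token = argv[i];
            // Only tokens starting with "--" that are followed by another token become entries.
            if (token.rfind("--", 0) == 0 && i + 1 < argc) {
                values_[token.substr(2)] = argv[++i];
            }
        }
    }

    // Built-in conversion via operator>>; returns default_value if the name is
    // missing or if extraction into T fails.
    template <class T>
    T get(const std::string& name, T default_value) const {
        auto it = values_.find(name);
        if (it == values_.end()) return default_value;
        std::istringstream stream(it->second);
        T parsed{};
        if (!(stream >> parsed)) return default_value;
        return parsed;
    }

    // Custom conversion: parse must return std::optional<T>; an empty result
    // (or a missing name) yields default_value.
    template <class T, class Parser>
    T get_with(const std::string& name, T default_value, Parser parse) const {
        auto it = values_.find(name);
        if (it == values_.end()) return default_value;
        std::optional<T> parsed = parse(it->second);
        return parsed.value_or(default_value);
    }

private:
    std::map<std::string, std::string> values_;
};
```

For example, `ArgParser args(argc, argv); int width = args.get("width", 640);` reads `--width 1024` when present and falls back to 640 otherwise.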
Removes EmuThreads, as it was very much unwanted in its current state; an update for it will be delivered at a later date

BigUglySpider / EmuLibs

Matrix update fast matrix | EmuSIMD Emulation #73