novacrazy opened this issue 4 years ago
After more research, it seems horizontal ops are a bit weird in general, and even Rust's `simd_reduce_add_unordered`, which compiles to `@llvm.experimental.vector.reduce.v2.fadd.f32.v4f32`, results in a simple add/shuffle/extract algorithm. I had been expecting `hadd` instructions or something.
`hadd` is slow on real hardware; it's only useful as a size optimization.
LLVM has the `@llvm.experimental.vector.reduce` intrinsics that you've identified, which are part of an effort to improve support for horizontal/reduce operations on vectors. Taking a quick look at the LLVM 10 source, I don't believe any optimization pass outputs those intrinsics yet, but you can use them yourself. LLVM also exposes native instructions through target-specific intrinsics like `@llvm.x86.sse3.hadd.ps`, if you need them.
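Since Inkwell doesn't wrap these intrinsics directly, one way to use them is to declare the intrinsic by name and call it like any other function. A minimal sketch, assuming a 2020-era inkwell API (newer versions return `Result`s from builder calls); the function names and module layout here are illustrative, not from this thread:

```rust
use inkwell::context::Context;

fn main() {
    let context = Context::create();
    let module = context.create_module("reduce");
    let builder = context.create_builder();

    let f32_ty = context.f32_type();
    let v4f32_ty = f32_ty.vec_type(4);

    // Declare: float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float, <4 x float>)
    let reduce_ty = f32_ty.fn_type(&[f32_ty.into(), v4f32_ty.into()], false);
    let reduce = module.add_function(
        "llvm.experimental.vector.reduce.v2.fadd.f32.v4f32",
        reduce_ty,
        None,
    );

    // Define: float @sum(<4 x float> %v) that just calls the intrinsic.
    let sum_ty = f32_ty.fn_type(&[v4f32_ty.into()], false);
    let sum = module.add_function("sum", sum_ty, None);
    builder.position_at_end(context.append_basic_block(sum, "entry"));

    let v = sum.get_nth_param(0).unwrap();
    let acc = f32_ty.const_float(0.0); // start value for the reduction
    // For an unordered/fast reduction you'd also set fast-math flags on the call.
    let r = builder
        .build_call(reduce, &[acc.into(), v.into()], "r")
        .try_as_basic_value()
        .left()
        .unwrap();
    builder.build_return(Some(&r));

    module.print_to_stderr();
}
```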
I suggest writing the natural code with target-neutral LLVM IR, including the experimental vector reduce intrinsics, and once you get the resulting assembly, see if you can beat it. Previously I'd have recommended the Intel Architecture Code Analyzer for analyzing assembly performance, but their webpage now redirects to llvm-mca: https://llvm.org/docs/CommandGuide/llvm-mca.html .
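For reference, llvm-mca consumes assembly text directly, so a file like the `out.asm` written later in this thread can be fed straight to it. A hypothetical invocation (both flags are standard llvm-mca options):

```
llvm-mca -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake out.asm
```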
Well, the original point of the issue still stands. How would I go about viewing the generated machine code from Inkwell?
In fact, I'm also not sure how to inject arbitrary LLVM IR other than by creating an entirely new module from it.
I'm not certain you can do the former at the moment. For the latter, maybe ~~`Module::parse_bitcode_from_buffer`~~ `Context::create_module_from_ir`? I don't think you can inject IR into an existing module other than by creating a new module from scratch.
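For what it's worth, a minimal sketch of that route, assuming inkwell's `MemoryBuffer` API (the IR string here is just an illustration):

```rust
use inkwell::context::Context;
use inkwell::memory_buffer::MemoryBuffer;

fn main() {
    let context = Context::create();

    // Hand-written IR to turn into a module.
    let ir = r#"
        define float @add(float %a, float %b) {
          %r = fadd float %a, %b
          ret float %r
        }
    "#;

    // Wrap the text in a MemoryBuffer and parse it into a fresh module.
    let buffer = MemoryBuffer::create_from_memory_range_copy(ir.as_bytes(), "handwritten");
    let module = context.create_module_from_ir(buffer).expect("invalid IR");

    module.print_to_stderr();
}
```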
I'll have to experiment with that. Perhaps handwrite a few modules for common ops and rely on `link_in_module` to combine them with generated code.
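A hedged sketch of that combination, assuming both modules come from the same `Context` (note that `link_in_module` consumes the donor module):

```rust
use inkwell::module::Module;

// Merge a hand-written helper module into the generated one.
// Both modules must have been created from the same Context.
fn merge<'ctx>(generated: &Module<'ctx>, handwritten: Module<'ctx>) {
    // link_in_module consumes `handwritten` and reports errors as an LLVMString.
    generated
        .link_in_module(handwritten)
        .expect("failed to link modules");
}
```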
~~As for viewing the assembly, perhaps reinterpreting the raw function pointer in `JitFunction` as a slice of something and searching for a `ret` instruction could work to get a range, depending on what the underlying real code provided by LLVM is. I mean, it's still just raw bytes at that point, but it's a start.~~ Nevermind, dumb idea.
Oh. Of course, this is already available.
```rust
use inkwell::targets::{
    CodeModel, FileType, InitializationConfig, RelocMode, Target, TargetMachine,
};
use inkwell::OptimizationLevel;

// Initialize the native target and build a TargetMachine matching the host.
Target::initialize_native(&InitializationConfig::default())
    .expect("Failed to initialize native target");

let triple = TargetMachine::get_default_triple();
let cpu = TargetMachine::get_host_cpu_name().to_string();
let features = TargetMachine::get_host_cpu_features().to_string();

let target = Target::from_triple(&triple).unwrap();
let machine = target
    .create_target_machine(
        &triple,
        &cpu,
        &features,
        OptimizationLevel::Aggressive,
        RelocMode::Default,
        CodeModel::Default,
    )
    .unwrap();

// create a module and do JIT stuff

// Emit the module's final assembly, with labels and attributes, to a file.
machine
    .write_to_file(&module, FileType::Assembly, "out.asm".as_ref())
    .unwrap();
```
So yeah, it took me a while to figure out how, but it does indeed save the whole assembly with labels, attributes, and so forth.
It also confirms that it's producing highly-optimized machine code, just like I hoped.
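As an aside, if you'd rather not touch the filesystem, inkwell's `TargetMachine` also exposes `write_to_memory_buffer`. A minimal sketch, assuming the same `machine` and `module` as above:

```rust
use inkwell::targets::FileType;

// Same TargetMachine and Module as above, but keep the assembly in memory.
let buffer = machine
    .write_to_memory_buffer(&module, FileType::Assembly)
    .unwrap();
let asm = String::from_utf8_lossy(buffer.as_slice());
println!("{}", asm);
```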
However, some better documentation around target machines would be very helpful. Is it stateful? Does it actually affect codegen? Other than exporting that module, it doesn't touch the JIT code, so its effect is unknown.
You're welcome to close this if this solution is acceptable, though my questions still stand.
I'm about to start using Inkwell for a highly-optimized JIT system, and it would be great if there were a way to view the resulting compiled code, or even just to get a pointer and length for where the code is so I can read it directly.
I'm aware of the `print_to_string`/`print_to_stderr` methods on `FunctionValue`, but those only seem to print the raw LLVM IR. Without access to horizontal vector ops, I'm hoping LLVM will be able to autovectorize vector sums and products well enough, but without a way to see the resulting instructions I can't know.
Please let me know if I'm missing something obvious! Also, if you have any ideas for autovectorization or horizontal vector ops, I'd love to hear them.
Here is the kind of thing I plan to do: