You can export the type trees directly onto the arguments as attributes, so there is no additional data coming in through the API call that isn't otherwise in the module.
Agree, that's
I was wondering about adding a flag that dumps all the TT for the outermost function into the IR (like we already do for the mem* functions).
My question was: are there other reasons why the ABI could differ from the opt pass?
I mean, I wouldn't even bother with a flag: just always pass it as a fn attribute, don't bother with the API version, and then you can be sure that's not it.
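For concreteness, here is a minimal sketch of what "pass it as a fn attribute" could look like on the producer side, written against the plain LLVM-C API through the llvm-sys crate. The real rust-enzyme code goes through rustc's own LLVM bindings, and the function and variable names here are made up for illustration:

```rust
use std::ffi::CString;
use std::os::raw::c_uint;
use llvm_sys::core::{LLVMAddAttributeAtIndex, LLVMCreateStringAttribute};
use llvm_sys::prelude::{LLVMContextRef, LLVMValueRef};

/// Attach one "enzyme_type" string attribute per parameter of `fn_val`.
/// `tree_strings[i]` holds the serialized type tree for parameter i,
/// e.g. "{[-1]:Pointer, [-1,-1]:Float@double}".
unsafe fn attach_enzyme_type_attrs(
    ctx: LLVMContextRef,
    fn_val: LLVMValueRef,
    tree_strings: &[String],
) {
    let key = CString::new("enzyme_type").unwrap();
    for (i, tree) in tree_strings.iter().enumerate() {
        let val = CString::new(tree.as_str()).unwrap();
        let attr = LLVMCreateStringAttribute(
            ctx,
            key.as_ptr(),
            "enzyme_type".len() as c_uint,
            val.as_ptr(),
            tree.len() as c_uint,
        );
        // LLVM attribute indices: 0 is the return value, parameters start at 1.
        LLVMAddAttributeAtIndex(fn_val, (i + 1) as c_uint, attr);
    }
}
```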
I think the other big question is ensuring that the same optimization and analysis state is available in both.
LLVM registers some analyses as available, so that could be one cause if opt doesn't register the same ones before running passes.
Didn't you say that there was a huge compile-time penalty for this (IIRC on the Julia side), or was that another case? Yes, if compile times are the same I'd change it unconditionally.
There was a huge compile time benefit
Rip. Yes then I'll try that and come back if we still get different results. thx.
Specifically, it, alongside the no-TA-recur flag, enables a mostly lossless version of the TA info, except that now, instead of doing a very expensive interprocedural fixed point across all values in all functions, it only has to reach a fixed point within a single function (for the most part).
It's not the be-all-end-all for compile times, but it means big programs should no longer have catastrophic TA runtime purely by virtue of being big. We still need to port it into C++ though.
Naive question, but if passing TT in the IR is so much faster, why isn't Enzyme just internally adding all the TT it receives through API calls as fn attributes in the LLVM IR?
The perf speedup isn't that attributes are faster than the API (it's actually slightly slower right now, since we parse the string into an int). The benefit of attributing everything ahead of time is:
1) You only need to recurse through language types once (parsing a Julia type into an Enzyme TT was expensive, so that cost is now saved across all versions).
2) Custom API callbacks during type analysis are much more expensive than the analysis just running without a plugin.
3) This is the most critical one: by putting the full info at each fn, you get almost all the info you need without having to run interprocedurally. Imagine each function-level TA takes time n without considering sub-calls. If getting new info means a sub-call would have more info, it would propagate that to its sub-calls, and so on; worst case you could get a ping-pong between some very nested calls and data structures, multiplying that n by a stupid asymptotic factor (n! maybe, I'd need to think).
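As a toy illustration of point 3 (my own example, not taken from Enzyme):

```rust
// Without per-function type info, type analysis has to propagate "p points to
// doubles" from `top` down through `mid` into `leaf`, and any new fact discovered
// inside a callee can force its callers (and their other callees) to be revisited,
// which is the interprocedural ping-pong described above. With a full "enzyme_type"
// tree attached to every function, each of the three can be analyzed once, in isolation.
fn leaf(p: &mut f64) {
    *p *= 2.0;
}
fn mid(p: &mut f64) {
    leaf(p);
}
fn top(p: &mut f64) {
    mid(p);
}
```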
To be clear, there are still possible bad-perf failure modes here, but now they're limited to at most the size of a function instead of the entire program (hence the claim that large applications no longer suffer as a result).
Also, I will note that effective use of this requires the language to provide full type trees for all args and returns of all functions (useful even with this off, but with interprocedural on it would spend time possibly deriving this even if not specified).
We do this for Julia; I just haven't gotten to doing it for C++ yet.
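As a hedged illustration of what "full type trees for all args and returns" means in practice, using the attribute syntax that shows up in the IR dump further down (the concrete strings here are illustrative, not something generated by rustc):

```rust
// Hypothetical trees for a function like `fn f(x: &mut f64) -> f64`.
// The argument tree matches the form visible on the second parameter of @ffff below.
const X_TREE: &str = "{[-1]:Pointer, [-1,-1]:Float@double}"; // pointer arg, all pointees are doubles
const RET_TREE: &str = "{[-1]:Float@double}";                // scalar double return
```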
@wsmoses Thanks for the extensive list. I should now have the TT only in the IR (which also makes it much easier to test).
```llvm
; Function Attrs: noinline nonlazybind sanitize_hwaddress uwtable
define dso_local "enzyme_type"="{}" void @ffff(ptr align 8 "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer, [-1,8]:Pointer, [-1,8,-1]:Float@double, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer}" %0, ptr align 8 "enzyme_type"="{[-1]:Pointer, [-1,-1]:Float@double}" %1) unnamed_addr #15 personality ptr @rust_eh_personality !dbg !1312 {
```
The only issue is that now Enzyme seems to throw an exception:
fatal runtime error: Rust cannot catch foreign exceptions
warning: `ad` (bin "ad") generated 1 warning
error: could not compile `ad` (bin "ad"); 1 warning emitted
Presumably because I pass the wrong things to the API. I tried passing both null pointers here, as well as empty TTs:
```rust
let mut args_tree = vec![TypeTree::new().inner; input_tts.len()];
let ret_tt = TypeTree::new().inner;
let dummy_type = CFnTypeInfo {
    Arguments: args_tree.as_mut_ptr(),
    Return: ret_tt.clone(),
    KnownValues: known_values.as_mut_ptr(),
};
```
What is the correct value here now, @wsmoses?
We have an issue where we aren't able to reproduce, through opt, a bug that shows up in rustc+Enzyme. I have a flag ENZYME_PRINT_MOD_BEFORE which prints the whole module LLVM IR (after fat LTO etc.) just before calling Enzyme, so we were expecting that writing a dummy function that calls __enzyme_autodiff and running the opt plugin on that same LLVM IR would give the same error. For which reasons could this fail?

I was wondering about adding a flag that dumps all the TT for the outermost function into the IR (like we already do for the mem* functions); those are otherwise passed directly to Enzyme and thus not part of the IR. Could there be another reason why we fail to reproduce bugs?
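For context, a minimal sketch of the kind of dummy driver meant above; in the actual repro the equivalent __enzyme_autodiff call is added directly to the dumped LLVM IR and run through the opt plugin, and the variadic signature shown here is only illustrative:

```rust
// Hedged sketch of a dummy __enzyme_autodiff driver as one would write it on the
// Rust side. Enzyme's pass replaces the call with the generated derivative.
extern "C" {
    fn __enzyme_autodiff(f: usize, ...) -> f64;
}

fn square(x: f64) -> f64 {
    x * x
}

fn main() {
    // Differentiate `square` at x = 3.0.
    let dx = unsafe { __enzyme_autodiff(square as usize, 3.0_f64) };
    println!("d/dx x^2 at 3.0 = {dx}");
}
```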