elixir-nx / xla

Pre-compiled XLA extension
Apache License 2.0

XLA Slow operation alarms: "The operation took ..." #76

Closed Szetty closed 1 month ago

Szetty commented 4 months ago

Hello, we are seeing logs like this in our Elixir project:

The operation took 2m32.409760523s

********************************
[Compiling module _Function_4.57852891_4_in_Axon.Loop.build_batch_fn_2_.32279] Very slow compile? If you want to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
********************************

Because they are logged at error level, they show up in our production logging system. I tracked them down and found that they come from XLA.

Is there a way to disable them?

josevalim commented 4 months ago

I don't believe there is a way to turn it off upstream. You would have to ask in the official XLA repository; this repository only builds the precompiled extension.

seanmor5 commented 4 months ago

This happens with very expensive compilations. What specifically are you doing? Maybe we can find a way to simplify the executable you're trying to compile.

seanmor5 commented 4 months ago

For example, if you're training an RNN and unrolling the entire sequence, compilation will be slow enough to trigger this message.

Szetty commented 4 months ago

We rebuild some models periodically. We expect this to be somewhat slow because it shares resources with many other operations, but nothing here is real-time, so the speed is good enough for us. The only problem is that these messages spam our error logs.

jonatanklosko commented 4 months ago

Looking at the source, setting TF_CPP_MIN_LOG_LEVEL=3 might work, though it's a long shot. If it does, note that it suppresses every level below fatal (that is, info, warning, and error).
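A minimal sketch of trying that suggestion, assuming the variable has to be present in the OS environment before the BEAM loads the XLA native library (setting it from inside the running app may be too late). Whether XLA actually honors it for this particular message is untested here. For a release, `rel/env.sh.eex` (created by `mix release.init`) is one place to put the export.

```shell
# Try to silence XLA/TSL C++ logging below FATAL.
# Levels: 0 = all, 1 = hide INFO, 2 = also hide WARNING, 3 = also hide ERROR.
# Assumption: this must be exported before the Elixir app (and the XLA
# extension it loads) starts, e.g. in the shell or in rel/env.sh.eex.
export TF_CPP_MIN_LOG_LEVEL=3
```

Keep in mind this hides real XLA errors as well, so it trades noisy logs for less visibility.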

jonatanklosko commented 1 month ago

I don't think there's anything actionable on our side, so I'm going to close this.