elixir-nx / nx

Multi-dimensional arrays (tensors) and numerical definitions for Elixir

XLA_FLAGS for dumping not returning HLO #1500

Closed christianjgreen closed 2 months ago

christianjgreen commented 5 months ago

I am trying to analyze the optimized HLO output for a few Nx functions by setting this flag in a Livebook: system_env: %{"XLA_FLAGS" => "--xla_dump_to=/tmp/hlo"}. However, no dumps are generated.

If there's another preferred way to get the optimized dumps for a function, I'd love to know how the core devs originally checked the output graphs.

Thanks for any help/tips!

josevalim commented 5 months ago

system_env in deps is only used during compilation. You want to set this flag when you call mix, or in your terminal directly. :)
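
A minimal sketch of the per-command form, assuming a POSIX shell. Here `sh -c` stands in for the real command (for example `mix run` with your own script), and the dump path is the one from the original post:

```shell
# Prefixing a command with VAR=value sets the variable only for that command.
# `sh -c` is a stand-in for the real command you would launch.
seen=$(XLA_FLAGS="--xla_dump_to=/tmp/hlo" sh -c 'printf %s "$XLA_FLAGS"')

# Show what the child process received in its environment.
echo "child saw: $seen"
```

The variable is already in the OS environment when the child process starts, so nothing inside the VM has to set it.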

christianjgreen commented 5 months ago

This is the method I'm currently using. Does it need to be set beforehand?

Mix.install(
  [
    {:exla, path: "~/projects/nx/exla"},
    {:nx, path: "~/projects/nx/nx"}
  ],
  # system_env: %{"XLA_TARGET" => "cuda12"}, <-- this works
  system_env: %{"XLA_FLAGS" => "--xla_dump_to=/tmp/hlo"} # <- this does not work
)

josevalim commented 5 months ago

Oh, on Mix.install, that should set the env var correctly. If it doesn't work, then maybe the flag itself is no longer relevant? Or maybe it also needs --xla_dump_hlo_as_dot? Are you sure the EXLA compiler is used? Are you using EXLA.jit to run the code?

christianjgreen commented 5 months ago

Positive the compiler is being used! Let me give you the latest snippet instead of a bad copy paste :p

Mix.install(
  [
    {:exla, path: "~/projects/nx/exla"},
    {:nx, path: "~/projects/nx/nx"}
  ],
  # system_env: %{"XLA_TARGET" => "cuda12"}, <-- this works
  system_env: %{"XLA_FLAGS" => "--xla_dump_to=/tmp/hlo --xla_dump_hlo_as_dot"}, # <- this does not work
  config: [nx: [default_backend: EXLA.Backend]]
)

And these are the three different calls I've tried to get HLO from:

{matrix, _} = Nx.Random.uniform(Nx.Random.key(42223), shape: {20, 20}, type: :f32)
matrix = Nx.add(matrix, Nx.transpose(matrix)) |> Nx.divide(2)

Nx.LinAlg.eigh(matrix)

Nx.Shared.optional(
  :j_eigh4,
  [matrix],
  {Nx.take_diagonal(matrix), matrix},
  &Nx.LinAlg.eigh/1
)

EXLA.jit(&Nx.LinAlg.JacobiEigh.eigh/1)

josevalim commented 5 months ago

Just to make sure, are you calling the function returned by EXLA.jit(&Nx.LinAlg.JacobiEigh.eigh/1)?

christianjgreen commented 5 months ago

> Just to make sure, are you calling the function returned by EXLA.jit(&Nx.LinAlg.JacobiEigh.eigh/1)?

Yes sir!

Here is the call and the returned tuple:

e = EXLA.jit(&Nx.LinAlg.JacobiEigh.eigh/1)
e.(matrix)

{#Nx.Tensor<
   f32[20]
   EXLA.Backend<host:0, 0.3304141925.3251240981.37154>
   [-1.6381525993347168, -1.3552285432815552, -1.1910206079483032, -1.0310735702514648, -0.912943959236145, -0.8215287923812866, -0.6370212435722351, -0.3248468041419983, -0.23171192407608032, -0.08701343089342117, 0.19126664102077484, 0.3433498442173004, 0.3924603760242462, 0.5175570249557495, 0.7870075106620789, 0.8510072827339172, 1.1144767999649048, 1.2940683364868164, 1.6437021493911743, 9.338674545288086]
 >,
 #Nx.Tensor<
   f32[20][20]
   EXLA.Backend<host:0, 0.3304141925.3251240981.37155>
   [
     [-0.07324251532554626, -0.3707551062107086, -0.3129488527774811, -0.04760716110467911, -0.14367316663265228, -0.05768333375453949, 0.26982104778289795, -0.08689552545547485, 0.01865920051932335, 0.05672043189406395, -0.11009827256202698, 0.23080967366695404, 0.010919198393821716, 0.00763005530461669, -0.04259205609560013, 0.6290202736854553, 0.2388024479150772, 0.19880637526512146, -0.1793692409992218, 0.23938940465450287],
     [0.10503554344177246, 0.48980197310447693, -0.07325314730405807, 0.36990198493003845, -0.10179188847541809, 0.11219511926174164, -0.13094516098499298, 0.05496946722269058, 0.4019777774810791, 0.1967068314552307, 0.15009185671806335, -0.014761583879590034, -0.0678885355591774, -0.22851940989494324, 0.007818772457540035, 0.012614989653229713, 0.4396161437034607, 0.16732947528362274, -0.11722132563591003, 0.22061890363693237],
     [0.3245624005794525, -0.16596823930740356, -0.37555015087127686, 0.10839072614908218, 0.05614854767918587, -0.16937614977359772, 0.1441095620393753, 0.1379358321428299, ...],
     ...
   ]
 >}

josevalim commented 5 months ago

So I have no other ideas, sorry :)

christianjgreen commented 5 months ago

No problem, and thanks again for all the help! I'm going to keep working on this and https://github.com/elixir-nx/nx/issues/1027#issuecomment-2143049605

Have a great day!

💚 💙 💜 💛 ❤️

jonatanklosko commented 5 months ago

@christianjgreen I think you still want to set the env var beforehand: export it in the terminal session where you start iex/livebook (or in ~/.livebookdesktop.sh, in the case of Livebook Desktop).
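
As a rough sketch of that suggestion, assuming a POSIX shell and the flags already tried in this thread (the launch commands are left as comments, since which one applies depends on your setup):

```shell
# Export the flags in the shell session that will start the runtime.
export XLA_FLAGS="--xla_dump_to=/tmp/hlo --xla_dump_hlo_as_dot"

# Any process started from this shell inherits the variable.
child_view=$(sh -c 'printf %s "$XLA_FLAGS"')
echo "inherited by children: $child_view"

# Then start the runtime from this same session, for example:
#   iex
#   livebook server
```

For Livebook Desktop, the same `export` line would go in ~/.livebookdesktop.sh instead.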

josevalim commented 2 months ago

Closing this. Let us know if there is still something we can do to improve it!