aramallo closed this issue 6 months ago.
The number EXLA.Client.get_supported_platforms() gives for :host is the number of CPU cores, but in practice there's only one device. You can force more devices with XLA_FLAGS=--xla_force_host_platform_device_count=10, but that's meant for testing, because XLA should already use multiple cores whenever appropriate. Multiple devices are relevant when you literally have multiple GPU devices attached to the machine :)
Thanks @jonatanklosko, that's very clear.
My naive idea was to run N servings of a SentenceTransformer, where N was the number of cores. Do you mind if I ask here what would be the best way to do that? Say I want to compute embeddings for a list of inputs using those cores.
Thanks!
The general answer from XLA is that it's best not to do that, and instead to pass a batch of sentences, which will be processed in parallel using all cores by default.
That's the theory, but XLA on CPU is not as fast as it could be, nor does it use all cores all the time (most of the optimization work targets the GPU). You could set XLA_FLAGS=--xla_force_host_platform_device_count=10, and then you would have multiple devices, each of which could run its own serving, but in my experience that doesn't make a difference.
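A minimal sketch of the "one serving, one batch" approach described above, for computing sentence embeddings on the CPU. It assumes Bumblebee and EXLA as dependencies and uses the sentence-transformers/all-MiniLM-L6-v2 checkpoint; the version pins, model name, pooling options, batch size and sequence length are illustrative assumptions, not something stated in this thread.

```elixir
# Sketch only: dependency versions and the model name are assumptions.
Mix.install([
  {:bumblebee, "~> 0.5"},
  {:exla, "~> 0.7"}
])

# Run tensor operations on EXLA instead of Nx's pure-Elixir default backend.
Nx.global_default_backend(EXLA.Backend)

repo = {:hf, "sentence-transformers/all-MiniLM-L6-v2"}
{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

# A single serving compiled for batches of up to 16 sentences. Rather than
# starting one serving per core, XLA is left to parallelise the work inside
# each batch. Mean pooling + L2 norm is assumed here to mirror how
# sentence-transformers models produce embeddings.
serving =
  Bumblebee.Text.text_embedding(model_info, tokenizer,
    output_attribute: :hidden_state,
    output_pool: :mean_pooling,
    embedding_processor: :l2_norm,
    compile: [batch_size: 16, sequence_length: 128],
    defn_options: [compiler: EXLA]
  )

sentences = [
  "Nx is a multi-dimensional tensor library for Elixir.",
  "EXLA compiles numerical definitions with Google's XLA.",
  "Bumblebee provides pre-trained models on top of Axon."
]

# Passing a list sends all sentences through the model as one batch and
# returns one %{embedding: tensor} map per sentence, in input order.
results = Nx.Serving.run(serving, sentences)

for %{embedding: embedding} <- results do
  IO.inspect(Nx.shape(embedding))
end
```

Here the list is kept smaller than the compiled batch size. For many concurrent callers, one option is to start the serving under a supervision tree, e.g. {Nx.Serving, serving: serving, name: MyApp.Embeddings, batch_size: 16, batch_timeout: 100} (MyApp.Embeddings is a hypothetical name), and call Nx.Serving.batched_run(MyApp.Embeddings, sentences), which groups incoming requests into batches for you.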
Btw, thank you for all your work on Partisan and that collection of libraries. ❤️ If there is anything we can help with, don't hesitate to ask (we are also on the #machine-learning channel of the Erlang Ecosystem Foundation Slack).
@josevalim Thank you so much for the answer and the kind words!! I will move future discussions to the EEF Slack. I recently started working with Elixir and these libraries and I am completely blown away, so thanks to you and the team for the extraordinary work.
Hi,
I am starting to learn Nx and friends. I was searching for a function to determine how many client devices I have available and found EXLA.Client.get_supported_platforms(), which returns a map of clients and (supposedly) their number of devices.

However, if I try to create an Nx.Serving with device_id > 0, I get an error saying there is no such device. Then, looking at the implementation, I found a call to EXLA.NIF.get_device_count/1, which returns 1.

Is the difference between these two results correct? Am I misunderstanding the return value of EXLA.Client.get_supported_platforms()?