Open jczaja opened 1 year ago
I am so sorry I missed this!
To make this into something useful, set the env var GORDON_NM to something big before running. The default is 500, so you'll want something a lot bigger. Once you do that, run the test suite and it should take a while.
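A minimal sketch of that suggestion, in case it helps: it copies the current environment, overrides GORDON_NM, and times a command run under it. The helper names (`build_env`, `run_timed`), the value 5000, and the pytest invocation in the trailing comment are assumptions for illustration only; nothing here is taken from the repository beyond the GORDON_NM variable and the test_gordon.py file name.

```python
import os
import subprocess
import sys
import time


def build_env(nm, base_env=None):
    """Return a copy of the environment with GORDON_NM set to `nm`."""
    env = dict(os.environ if base_env is None else base_env)
    env["GORDON_NM"] = str(nm)  # environment values must be strings
    return env


def run_timed(cmd, nm):
    """Run `cmd` with GORDON_NM=nm; return (exit code, wall-clock seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, env=build_env(nm))
    return result.returncode, time.perf_counter() - start


# Hypothetical usage (the value and the pytest command are assumptions):
# code, seconds = run_timed([sys.executable, "-m", "pytest", "test_gordon.py"], nm=5000)
# print(f"exit code {code}, took {seconds:.1f} s")
```

Increasing the value step by step and watching the wall-clock time is one way to find a size that runs long enough to profile.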
@beckermr Hi, thanks for the helpful hint. I'm just resuming this task. I was told that the functionality covered by those unit tests is going to be part of a bigger workload. Do you know which of those three elements (in test_gordon.py) takes the most execution time in that bigger workload? That information would help me prioritize optimization efforts.
Anyway, as soon as we have something improved, we will keep you posted. Thanks.
I do not have an estimate for this.
I have started profiling those unit tests, but the majority of the time is spent in single-threaded CPU code rather than on the XPU (Ponte Vecchio). So I'm looking into whether the Python code could easily be run on the XPU (via JAX) rather than with regular NumPy (CPU). PR #4 will shorten the test initialization time so that I can more easily see the performance of other areas of the code. Please review.
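For reference, one common way to see where host-side (CPU) time goes is Python's built-in cProfile. The sketch below is generic: `profile_call` is a hypothetical helper, not the profiling setup actually used in this thread. It only measures Python-level time on the host, so device-side time would need a separate tool.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=10, **kwargs):
    """Run fn(*args, **kwargs) under cProfile.

    Returns (result, report), where report is a text table of the `top`
    entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = fn(*args, **kwargs)
    finally:
        profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


# Hypothetical usage with a stand-in workload:
# result, report = profile_call(sorted, list(range(100000, 0, -1)))
# print(report)
```

Sorting by cumulative time tends to surface the top-level call that dominates, which is usually the first thing to check before moving code from NumPy to JAX.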
Done and thank you!
@beckermr What is a typical GORDON_NM value used in the target workload? I'm asking because I should be optimizing performance on something as close to your target model as possible.
As big as we can without overflowing the device memory.
Hi,
My name is Jacek Czaja and I'm one of the Intel engineers helping to get JAX projects running efficiently on Intel devices. I was given this repo link (among others) so that its content can be enabled on JAX with Intel GPUs and optimized for performance. I was able to run the unit tests on an Intel GPU, but I'm struggling to benchmark the gordon functionality.
What do I need? I would like to benchmark the functionality of this repository on Intel HW. To do so, I need a representative example of gordon usage (something close to a real use case that should be optimized) that runs for at least a minute, so I can look at the bottlenecks and try to optimize them. Please point me to such an example.