dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Split up test_client.py #6043

Open fjetter opened 2 years ago

fjetter commented 2 years ago

distributed/tests/test_client.py is currently by far our largest test file, with close to 5.5k lines of code and 463 tests. We often need only a subset of these tests, but running the entire file is very slow and selecting a subset is typically difficult (pytest -k helps, but only up to a point).

I would like for us to split this file up to reduce iteration times.
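One way to draft such a split is to bucket test names by keyword and see what is left over. The sketch below is only an illustration: the keyword-to-file mapping and the target file names are hypothetical, not a proposal from this issue, and the test names in the usage note are examples. Tests whose names match no keyword stay in the default file, which is also roughly the set that name-based selection such as pytest -k cannot isolate cleanly.

```python
from collections import defaultdict

# Hypothetical keyword -> target file mapping, for illustration only.
PROPOSED_FILES = {
    "scatter": "test_client_scatter.py",
    "replicate": "test_client_scatter.py",
    "upload": "test_client_upload.py",
    "rebalance": "test_client_rebalance.py",
}


def propose_split(test_names, mapping=PROPOSED_FILES, default="test_client.py"):
    """Group test names by the first keyword found in each name.

    Names that match no keyword are assigned to ``default``.
    """
    groups = defaultdict(list)
    for name in test_names:
        target = next(
            (filename for keyword, filename in mapping.items() if keyword in name),
            default,
        )
        groups[target].append(name)
    return dict(groups)
```

For example, `propose_split(["test_scatter", "test_upload_file", "test_submit"])` puts the first two names into their own files and leaves `test_submit` in `test_client.py`.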

Top 5 test files by lines of code

cloc distributed/tests --by-file
      63 text files.
      63 unique files.
      10 files ignored.

github.com/AlDanial/cloc v 1.90  T=0.13 s (434.2 files/s, 241569.7 lines/s)
--------------------------------------------------------------------------------------------------
File                                                           blank        comment           code
--------------------------------------------------------------------------------------------------
distributed/tests/test_client.py                                1947            154           5486
distributed/tests/test_scheduler.py                              792            217           2531
distributed/tests/test_worker.py                                 749            316           2243
distributed/tests/test_steal.py                                  268             62            911
distributed/tests/test_active_memory_manager.py                  175            149            748

In terms of content, the file tests a wide range of functionality, e.g.

  • Core client functionality like submit/map/persist/sync/gather
  • Client utility functions like call_stack, fire_and_forget, get_versions, as_completed
  • Multi client / default client functionality
  • Some scheduler decision making around balanced task distribution, host/resource restrictions, occupancies
  • Futures (part of client module)
  • A lot of scattering and replication
  • File uploads
  • Some annotation stuff
  • Cluster dump
  • Code instrumentation for Computation class

This list is not exhaustive, and we may decide not to split along these lines. It is only meant to show that this file covers many different aspects of our code.

List of tests in test_client.py

```
<Module distributed/tests/test_client.py>
Test of
Test Client.rebalance(). These are just to test the Client wrapper around Scheduler.rebalance(); for more thorough tests on the latter see test_scheduler.py.
Test Client.rebalance(). These are just to test the Client wrapper around Scheduler.rebalance(); for more thorough tests on the latter see test_scheduler.py.
Client.rebalance() internally waits for unfinished futures
rebalance() raises KeyError if explicitly listed futures disappear
Create irreplaceable data on one machine, cause a dependent computation to occur on another and complete Kill the machine with the irreplaceable data. What happens to the complete result? How about after it GCs and tries to come back?
Create irreplaceable data on one machine, cause a dependent computation to occur on another and complete Kill the machine with the irreplaceable data. What happens to the complete result? How about after it GCs and tries to come back?
Ensure that tasks scheduled from a seceded thread can be scheduled elsewhere
Regression test of
Ensure that logs from all server types (scheduler, worker, nanny) and the clients themselves arrive
Test that if a security loader is configured, but it returns `None`, then the default security configuration is used
```

mrocklin commented 2 years ago

FWIW I'm fine splitting it up