jdebacker closed this pull request 5 months ago.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 75.35%. Comparing base (300912a) to head (d967a83).
@jdebacker FYI, I pushed one small commit to the end of CHANGELOG.md on your branch. I've gotten all the GitHub Actions CI tests to pass except Mac Python 3.9, where I keep getting a Dask error (one of the workers quits) on the second-to-last test, test_get_micro_data.py::test_get_data[Reform]. All of the tests in test_get_micro_data.py pass on my local machine, including the tests marked local. But my local machine is a Mac, and the ogusa-dev conda environment has Python 3.11, which is not tested in our CI.
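For anyone reproducing this, the locally marked tests can be selected with the same marker that the CI run deselects; a sketch, assuming the local marker is registered in pytest.ini as the CI command below suggests:

```
# Run only the tests marked "local"; CI deselects these with -m "not local"
python -m pytest -m local tests/test_get_micro_data.py
```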
@jdebacker The traceback for this error in test_get_micro_data.py::test_get_data[Reform] is the following:
Run python -m pytest -m "not local" --cov=./ --cov-report=xml
============================= test session starts ==============================
platform darwin -- Python 3.9.19, pytest-8.2.0, pluggy-1.5.0
rootdir: /Users/runner/work/OG-USA/OG-USA
configfile: pytest.ini
testpaths: ./tests
plugins: cov-5.0.0, xdist-3.5.0
collected 37 items / 7 deselected / 30 selected
tests/test_calibrate.py .... [ 13%]
2024-05-08 16:45:06,271 - distributed.worker.state_machine - WARNING - Async instruction for <Task cancelled name="execute('taxcalc_advance-7d35ab75-dabd-4ece-957d-0957879a2ccc')" coro=<Worker.execute() done, defined at /Users/runner/miniconda3/envs/ogusa-dev/lib/python3.9/site-packages/distributed/worker_state_machine.py:3615>> ended with CancelledError
tests/test_get_micro_data.py .........F. [ 50%]
tests/test_income.py ........... [ 86%]
tests/test_psid_data_setup.py . [ 90%]
tests/test_utils.py . [ 93%]
tests/test_wealth.py .. [100%]
=================================== FAILURES ===================================
____________________________ test_get_data[Reform] _____________________________
baseline = False
dask_client = <Client: 'tcp://127.0.0.1:49223' processes=2 threads=4, memory=4.67 GiB>

    @pytest.mark.parametrize("baseline", [True, False], ids=["Baseline", "Reform"])
    def test_get_data(baseline, dask_client):
        """
        Test of get_micro_data.get_data() function
        Note that this test may fail if the Tax-Calculator is not v 3.2.2
        """
        expected_data = utils.safe_read_pickle(
            os.path.join(CUR_PATH, "test_io_data", "micro_data_dict_for_tests.pkl")
        )
>       test_data, _ = get_micro_data.get_data(
            baseline=baseline,
            start_year=2031,
            iit_reform={},
            data="cps",
            client=dask_client,
            num_workers=NUM_WORKERS,
        )
tests/test_get_micro_data.py:208:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ogusa/get_micro_data.py:148: in get_data
results = client.gather(futures)
../../../miniconda3/envs/ogusa-dev/lib/python3.9/site-packages/distributed/client.py:2372: in gather
return self.sync(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Client: 'tcp://127.0.0.1:49223' processes=2 threads=4, memory=4.67 GiB>
futures = [<Future: lost, type: dict, key: taxcalc_advance-7d35ab75-dabd-4ece-957d-0957879a2ccc>, <Future: error, key: taxcalc_a...ce-7b151cc7-c9ac-42c3-a774-cbb782c9e6f8>, <Future: pending, key: taxcalc_advance-2f6ef7e5-b75c-4440-b031-feec5de5dfcf>]
errors = 'raise', direct = False, local_worker = None
    async def _gather(self, futures, errors="raise", direct=None, local_worker=None):
        unpacked, future_set = unpack_remotedata(futures, byte_keys=True)
        mismatched_futures = [f for f in future_set if f.client is not self]
        if mismatched_futures:
            raise ValueError(
                "Cannot gather Futures created by another client. "
                f"These are the {len(mismatched_futures)} (out of {len(futures)}) "
                f"mismatched Futures and their client IDs (this client is {self.id}): "
                f"{ {f: f.client.id for f in mismatched_futures} }" # noqa: E201, E202
            )
        keys = [future.key for future in future_set]
        bad_data = dict()
        data = {}
        if direct is None:
            direct = self.direct_to_workers
        if direct is None:
            try:
                w = get_worker()
            except Exception:
                direct = False
            else:
                if w.scheduler.address == self.scheduler.address:
                    direct = True
        async def wait(k):
            """Want to stop the All(...) early if we find an error"""
            try:
                st = self.futures[k]
            except KeyError:
                raise AllExit()
            else:
                await st.wait()
            if st.status != "finished" and errors == "raise":
                raise AllExit()
        while True:
            logger.debug("Waiting on futures to clear before gather")
            with suppress(AllExit):
                await distributed.utils.All(
                    [wait(key) for key in keys if key in self.futures],
                    quiet_exceptions=AllExit,
                )
            failed = ("error", "cancelled")
            exceptions = set()
            bad_keys = set()
            for key in keys:
                if key not in self.futures or self.futures[key].status in failed:
                    exceptions.add(key)
                    if errors == "raise":
                        try:
                            st = self.futures[key]
                            exception = st.exception
                            traceback = st.traceback
                        except (KeyError, AttributeError):
                            exc = CancelledError(key)
                        else:
>                           raise exception.with_traceback(traceback)
E distributed.scheduler.KilledWorker: Attempted to run task 'taxcalc_advance-f47d516e-0d21-4d41-be0e-7d93d8be05fc' on 4 different workers, but all those workers died while running it. The last worker that attempt to run the task was tcp://127.0.0.1:49266. Inspecting worker logs is often a good next step to diagnose what went wrong. For more information see https://distributed.dask.org/en/stable/killed.html.
../../../miniconda3/envs/ogusa-dev/lib/python3.9/site-packages/distributed/client.py:2232: KilledWorker
----------------------------- Captured stderr call -----------------------------
2024-05-08 16:42:21,876 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.76 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:42:27,405 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.68 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:42:35,874 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.65 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:42:45,901 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.79 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:42:52,863 - distributed.worker.memory - WARNING - Worker is at 80% memory usage. Pausing worker. Process memory: 1.88 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:42:53,033 - distributed.worker.memory - WARNING - Worker is at 69% memory usage. Resuming worker. Process memory: 1.62 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:42:55,968 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.67 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:42:56,666 - distributed.worker.memory - WARNING - Worker is at 80% memory usage. Pausing worker. Process memory: 1.87 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:42:56,967 - distributed.worker.memory - WARNING - Worker is at 76% memory usage. Resuming worker. Process memory: 1.79 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:03,868 - distributed.worker.memory - WARNING - Worker is at 80% memory usage. Pausing worker. Process memory: 1.87 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:09,847 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.67 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:11,903 - distributed.worker.memory - WARNING - Worker is at 80% memory usage. Pausing worker. Process memory: 1.87 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:11,949 - distributed.worker.memory - WARNING - Worker is at 79% memory usage. Resuming worker. Process memory: 1.84 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:12,246 - distributed.worker.memory - WARNING - Worker is at 80% memory usage. Pausing worker. Process memory: 1.87 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:12,553 - distributed.worker.memory - WARNING - Worker is at 77% memory usage. Resuming worker. Process memory: 1.80 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:13,309 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.65 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:19,948 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.65 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:23,458 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.64 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:26,320 - distributed.worker.memory - WARNING - Worker is at 81% memory usage. Pausing worker. Process memory: 1.89 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:33,866 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.49 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:36,648 - distributed.worker.memory - WARNING - Worker is at 89% memory usage. Pausing worker. Process memory: 2.09 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:51,880 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.72 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:43:52,675 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.74 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:44:01,925 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.74 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:44:03,763 - distributed.worker.memory - WARNING - Worker is at 81% memory usage. Pausing worker. Process memory: 1.91 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:44:13,740 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.74 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:44:14,064 - distributed.worker.memory - WARNING - Worker is at 82% memory usage. Pausing worker. Process memory: 1.93 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:44:14,543 - distributed.worker.memory - WARNING - Worker is at 78% memory usage. Resuming worker. Process memory: 1.84 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:44:14,640 - distributed.worker.memory - WARNING - Worker is at 80% memory usage. Pausing worker. Process memory: 1.89 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:44:14,899 - distributed.worker.memory - WARNING - Worker is at 71% memory usage. Resuming worker. Process memory: 1.67 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:44:14,977 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 1.74 GiB -- Worker memory limit: 2.33 GiB
2024-05-08 16:44:15,287 - distributed.worker.memory - WARNING - Worker is at 80% memory usage. Pausing worker. Process memory: 1.87 GiB -- Worker memory limit: 2.33 GiB
------------------------------ Captured log call -------------------------------
WARNING distributed.nanny.memory:worker_memory.py:425 Worker tcp://127.0.0.1:49233 (pid=11391) exceeded 95% memory budget. Restarting...
INFO distributed.nanny:nanny.py:804 Worker process 11391 was killed by signal 15
INFO distributed.core:core.py:1029 Connection to tcp://127.0.0.1:49237 has been closed.
INFO distributed.scheduler:scheduler.py:5204 Remove worker <WorkerState 'tcp://127.0.0.1:49233', name: 1, status: paused, memory: 0, processing: 2> (stimulus_id='handle-worker-cleanup-1715186585.977865')
WARNING distributed.nanny:nanny.py:566 Restarting worker
INFO distributed.scheduler:scheduler.py:4424 Register worker <WorkerState 'tcp://127.0.0.1:49260', name: 1, status: init, memory: 0, processing: 0>
INFO distributed.scheduler:scheduler.py:5929 Starting worker compute stream, tcp://127.0.0.1:49260
INFO distributed.core:core.py:1019 Starting established connection to tcp://127.0.0.1:49262
WARNING distributed.nanny.memory:worker_memory.py:425 Worker tcp://127.0.0.1:49234 (pid=11392) exceeded 95% memory budget. Restarting...
INFO distributed.nanny:nanny.py:804 Worker process 11392 was killed by signal 15
INFO distributed.core:core.py:1029 Connection to tcp://127.0.0.1:49238 has been closed.
INFO distributed.scheduler:scheduler.py:5204 Remove worker <WorkerState 'tcp://127.0.0.1:49234', name: 2, status: paused, memory: 1, processing: 2> (stimulus_id='handle-worker-cleanup-1715186607.671854')
WARNING distributed.scheduler:scheduler.py:5280 Removing worker 'tcp://127.0.0.1:49234' caused the cluster to lose already computed task(s), which will be recomputed elsewhere: {'taxcalc_advance-7d35ab75-dabd-4ece-957d-0957879a2ccc'} (stimulus_id='handle-worker-cleanup-1715186607.671854')
WARNING distributed.nanny:nanny.py:566 Restarting worker
INFO distributed.scheduler:scheduler.py:4424 Register worker <WorkerState 'tcp://127.0.0.1:49266', name: 2, status: init, memory: 0, processing: 0>
INFO distributed.scheduler:scheduler.py:5929 Starting worker compute stream, tcp://127.0.0.1:49266
INFO distributed.core:core.py:1019 Starting established connection to tcp://127.0.0.1:49268
WARNING distributed.nanny.memory:worker_memory.py:425 Worker tcp://127.0.0.1:49251 (pid=13099) exceeded 95% memory budget. Restarting...
INFO distributed.nanny:nanny.py:804 Worker process 13099 was killed by signal 15
INFO distributed.core:core.py:1029 Connection to tcp://127.0.0.1:49253 has been closed.
INFO distributed.scheduler:scheduler.py:5204 Remove worker <WorkerState 'tcp://127.0.0.1:49251', name: 0, status: paused, memory: 1, processing: 2> (stimulus_id='handle-worker-cleanup-1715186622.4045131')
WARNING distributed.scheduler:scheduler.py:5280 Removing worker 'tcp://127.0.0.1:49251' caused the cluster to lose already computed task(s), which will be recomputed elsewhere: {'taxcalc_advance-7b151cc7-c9ac-42c3-a774-cbb782c9e6f8'} (stimulus_id='handle-worker-cleanup-1715186622.4045131')
WARNING distributed.nanny:nanny.py:566 Restarting worker
INFO distributed.scheduler:scheduler.py:4424 Register worker <WorkerState 'tcp://127.0.0.1:49272', name: 0, status: init, memory: 0, processing: 0>
INFO distributed.scheduler:scheduler.py:5929 Starting worker compute stream, tcp://127.0.0.1:49272
INFO distributed.core:core.py:1019 Starting established connection to tcp://127.0.0.1:49274
WARNING distributed.nanny.memory:worker_memory.py:425 Worker tcp://127.0.0.1:49260 (pid=14391) exceeded 95% memory budget. Restarting...
INFO distributed.core:core.py:1029 Connection to tcp://127.0.0.1:49262 has been closed.
INFO distributed.scheduler:scheduler.py:5204 Remove worker <WorkerState 'tcp://127.0.0.1:49260', name: 1, status: paused, memory: 0, processing: 2> (stimulus_id='handle-worker-cleanup-1715186646.0573041')
INFO distributed.nanny:nanny.py:804 Worker process 14391 was killed by signal 15
WARNING distributed.nanny:nanny.py:566 Restarting worker
INFO distributed.scheduler:scheduler.py:4424 Register worker <WorkerState 'tcp://127.0.0.1:49277', name: 1, status: init, memory: 0, processing: 0>
INFO distributed.scheduler:scheduler.py:5929 Starting worker compute stream, tcp://127.0.0.1:49277
INFO distributed.core:core.py:1019 Starting established connection to tcp://127.0.0.1:49279
WARNING distributed.nanny.memory:worker_memory.py:425 Worker tcp://127.0.0.1:49266 (pid=14739) exceeded 95% memory budget. Restarting...
INFO distributed.core:core.py:1029 Connection to tcp://127.0.0.1:49268 has been closed.
INFO distributed.scheduler:scheduler.py:5204 Remove worker <WorkerState 'tcp://127.0.0.1:49266', name: 2, status: paused, memory: 0, processing: 2> (stimulus_id='handle-worker-cleanup-1715186656.506249')
ERROR distributed.scheduler:scheduler.py:5259 Task taxcalc_advance-f47d516e-0d21-4d41-be0e-7d93d8be05fc marked as failed because 4 workers died while trying to run it
INFO distributed.nanny:nanny.py:804 Worker process 14739 was killed by signal 15
WARNING distributed.nanny:nanny.py:566 Restarting worker
=========================== short test summary info ============================
FAILED tests/test_get_micro_data.py::test_get_data[Reform] - distributed.scheduler.KilledWorker: Attempted to run task 'taxcalc_advance-f47d516e-0d21-4d41-be0e-7d93d8be05fc' on 4 different workers, but all those workers died while running it. The last worker that attempt to run the task was tcp://127.0.0.1:49266. Inspecting worker logs is often a good next step to diagnose what went wrong. For more information see https://distributed.dask.org/en/stable/killed.html.
===== 1 failed, 29 passed, 7 deselected, 66 warnings in 536.37s (0:08:56) ======
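The captured logs point to per-worker memory pressure rather than a bug in the test itself: each worker has a 2.33 GiB limit, is repeatedly paused near 80% usage, and is finally killed at the 95% budget. One possible mitigation is to run fewer, larger workers. Below is a minimal sketch, assuming the dask_client fixture lives in tests/conftest.py (the fixture name and NUM_WORKERS appear in the traceback above); the specific worker counts and memory values are illustrative, not the repo's actual settings:

```python
# tests/conftest.py -- illustrative sketch, not the repo's actual fixture
import multiprocessing

import pytest
from dask.distributed import Client, LocalCluster

# Hypothetical cap for illustration; the real NUM_WORKERS may differ.
NUM_WORKERS = min(multiprocessing.cpu_count(), 2)


@pytest.fixture(scope="module")
def dask_client():
    # Fewer worker processes per runner leaves each worker more headroom
    # than the 2.33 GiB limit seen in the failing CI run.
    cluster = LocalCluster(
        n_workers=NUM_WORKERS,
        threads_per_worker=2,
        memory_limit="3GiB",  # illustrative; size to the runner's RAM
    )
    client = Client(cluster)
    yield client
    client.close()
    cluster.close()
```

An alternative, also consistent with these logs, is simply to drop the failing Python 3.9 Mac job, as proposed below.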
@jdebacker I recommend that we stop testing Python 3.9 in build_and_test.yml and only test Python 3.10 and 3.11, as we do in OG-Core (see line 29 of OG-Core's build_and_test.yml). I submitted a PR to your branch that makes these changes.
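A sketch of the corresponding matrix change; the surrounding job layout is assumed to mirror OG-Core's build_and_test.yml, not copied from either repo:

```yaml
# .github/workflows/build_and_test.yml -- illustrative excerpt
jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
        python-version: ["3.10", "3.11"]  # 3.9 removed from the matrix
```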
@jdebacker Looks great. Thanks for this. Merging now.
This PR prepares the release of OG-USA version 0.1.6.