When I tried to run calculate_feature_matrix in chunks, I kept hitting a ValueError, usually followed by a fatal Python error. Note that the error only occurred after several chunks had already been calculated, and no error showed up when I restarted the Python script and resumed from where it had failed. A simplified sketch of my call is below, followed by the full trace.
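For reference, this is roughly the call pattern (simplified; `features`, `es`, and `cutoffs` stand in for my real feature list, EntitySet, and cutoff-time frame, and the chunk_size / n_jobs / dask_kwargs values are just examples, not my exact settings):

```python
import featuretools as ft

# Simplified sketch of the failing call (placeholder names, not my exact script):
# `features` is the list of feature definitions, `es` is the EntitySet,
# and `cutoffs` is the cutoff-time DataFrame.
feature_matrix = ft.calculate_feature_matrix(
    features,
    entityset=es,
    cutoff_time=cutoffs,
    chunk_size=10_000,                    # calculate the matrix chunk by chunk
    n_jobs=4,                             # run chunks on a local Dask cluster
    dask_kwargs={"memory_limit": "4GB"},  # cluster options (example values)
)
```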
2022-11-10 15:31:53,351 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 3.04 GiB -- Worker memory limit: 3.79 GiB
Traceback (most recent call last):
File "/home/zzz/python/test.py", line 306, in ft_test
feature_matrix_ = ft.calculate_feature_matrix(
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/featuretools/computational_backends/calculate_feature_matrix.py", line 316, in calculate_feature_matrix
feature_matrix = parallel_calculate_chunks(
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/featuretools/computational_backends/calculate_feature_matrix.py", line 792, in parallel_calculate_chunks
client.replicate([_es, _saved_features])
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/client.py", line 3481, in replicate
return self.sync(
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/utils.py", line 338, in sync
return sync(
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/utils.py", line 405, in sync
raise exc.with_traceback(tb)
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/utils.py", line 378, in f
result = yield future
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/client.py", line 3439, in _replicate
await self.scheduler.replicate(
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/core.py", line 1153, in send_recv_from_rpc
return await send_recv(comm=comm, op=key, **kwargs)
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/core.py", line 943, in send_recv
raise exc.with_traceback(tb)
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/core.py", line 769, in _handle_comm
result = await result
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/scheduler.py", line 5781, in replicate
for ws in random.sample(tuple(workers - ts.who_has), count):
File "/home/zzz/.conda/envs/test/lib/python3.9/random.py", line 449, in sample
raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
2022-11-10 15:31:53,461 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 2.99 GiB -- Worker memory limit: 3.79 GiB
Exception in thread AsyncProcess Dask Worker process (from Nanny) watch process join:
Traceback (most recent call last):
File "/home/zzz/.conda/envs/test/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/zzz/.conda/envs/test/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/zzz/.conda/envs/test/lib/python3.9/site-packages/distributed/process.py", line 236, in _watch_process
assert exitcode is not None
AssertionError
Exception in thread AsyncProcess Dask Worker process (from Nanny) watch process join:
Traceback (most recent call last):
File "/home/zzz/.conda/envs/test/lib/python3.9/threading.py", line 980, in _bootstrap_inner
Using EntitySet persisted on the cluster as dataset EntitySet-a3d41f24f216a89dd794828f2871b580
self.run()
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name=''> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x17a50a0)
Current thread 0x00007f992d262280 (most recent call first):
Also, possibly related to the fatal error above, the log is full of DataFrame-fragmentation and unmanaged-memory warnings (a small standalone repro of the pandas warning follows the excerpt below):
/home/zzz/.conda/envs/test/lib/python3.9/site-packages/featuretools/computational_backends/feature_set_calculator.py:938: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
return data.assign(**new_cols)
2022-11-11 09:39:14,505 - distributed.worker_memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 3.15 GiB -- Worker memory limit: 3.79 GiB
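For completeness, the fragmentation warning itself is easy to reproduce outside featuretools; as far as I understand, pandas emits it whenever columns are inserted into a frame one at a time past its internal block threshold. Illustrative snippet only, not taken from my script:

```python
import warnings
import pandas as pd

# Illustrative only: pandas warns once a frame accumulates many separate
# column blocks, which happens when columns are added one at a time.
warnings.simplefilter("always", pd.errors.PerformanceWarning)

df = pd.DataFrame(index=range(10))
for i in range(150):
    df[f"col_{i}"] = i  # each assignment inserts a new block -> fragmentation warning

# The fix suggested by the warning text: build all columns at once.
df2 = pd.concat(
    [pd.Series(i, index=range(10), name=f"col_{i}") for i in range(150)],
    axis=1,
)
```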
Any ideas would be highly appreciated! Best regards!