Closed HiromuHota closed 4 years ago
https://github.com/HiromuHota/fonduer-tutorials/runs/883626478 demonstrates that the hardware tutorial stalls at featurizer.get_feature_matrices(train_cands) (right after featurizer.apply(split=0, train=True, parallelism=PARALLEL)) for about 20 minutes.
2020-07-17T22:46:57.5633398Z CPU times: user 21min 12s, sys: 3.23 s, total: 21min 16s
2020-07-17T22:46:57.5633521Z Wall time: 21min 24s
2020-07-17T22:46:57.5633638Z (28935, 27732)
As demonstrated by https://github.com/HiromuHota/fonduer-tutorials/runs/883723629, the proposed fix #484 reduces the elapsed time to about 1 min.
Fri, 17 Jul 2020 23:21:41 GMT CPU times: user 49.5 s, sys: 3.21 s, total: 52.8 s
Fri, 17 Jul 2020 23:21:41 GMT Wall time: 1min 2s
Fri, 17 Jul 2020 23:21:46 GMT (28935, 27735)
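The CPU and wall times in both logs have the format printed by Jupyter's %time magic. To take a comparable measurement outside a notebook, a minimal sketch using only the standard library could look like the following. The `timed` helper and the placeholder workload are hypothetical and not part of Fonduer; the real call under investigation appears only as a comment.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Hypothetical helper: report CPU and wall time for the enclosed block."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        # Optionally record the measurement so it can be compared later.
        if results is not None:
            results[label] = {"wall": wall, "cpu": cpu}
        print(f"{label}: CPU {cpu:.2f} s, wall {wall:.2f} s")


timings = {}
with timed("placeholder workload", timings):
    # In the tutorial, this would be the call under investigation, e.g.
    # F_train = featurizer.get_feature_matrices(train_cands)
    total = sum(i * i for i in range(100_000))
```

Note that `time.process_time()` reports user and system CPU time combined and covers only the current process, so the numbers are not split the way the logs above are.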
Description of the bug
featurizer.get_feature_matrices(train_cands) does not return within a reasonable amount of time.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
featurizer.get_feature_matrices(train_cands) returns in about 1 min for the hardware tutorial.
Error Logs/Screenshots
When I forcefully cancel the operation, the following stack trace is shown.
Environment (please complete the following information)
Additional context
I think this is a regression caused by #407. Also, this issue is not noticeable unless the feature vector is large (around 30,000 dimensions).