Running a profiler shows that the training process spends a lot of time normalizing the date column. It is possible to cache this function's values for the duration of the training process, which leads to a significant speed improvement.
Profiler command :
python3 -m cProfile tests/func/test_ozone.py
sample output :
```
INFO:pyaf.std:END_TRAINING_TIME_IN_SECONDS 'Ozone' 3.797365665435791
...
...
         1762583 function calls (1738911 primitive calls) in 5.604 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <decorator-gen-0>:1(<module>)
        1    0.000    0.000    0.000    0.000 <decorator-gen-10>:1(<module>)
     ....
        1    0.000    0.000    0.000    0.000 Time.py:14(cTimeInfo)
       14    0.000    0.000    0.012    0.001 Time.py:159(cutFrame)
        4    0.000    0.000    0.003    0.001 Time.py:16(__init__)
      216    0.000    0.000    0.000    0.000 Time.py:218(isOneRowDataset)
      216    0.012    0.000    0.013    0.000 Time.py:300(normalizeTime)
       25    0.000    0.000    0.000    0.000 Time.py:306(addMonths)
       25    0.001    0.000    0.002    0.000 Time.py:311(nextTime)
        1    0.000    0.000    0.000    0.000 Time.py:39(info)
        1    0.000    0.000    0.001    0.001 Time.py:51(to_json)
        1    0.000    0.000    0.004    0.004 Time.py:59(addVars)
       25    0.000    0.000    0.000    0.000 Time.py:66(get_time_dtype)
        1    0.000    0.000    0.000    0.000 Time.py:7(<module>)
       25    0.000    0.000    0.003    0.000 Time.py:76(checkDateAndSignalTypesForNewDataset)
```
Fix : add a memoization decorator around this function (see the sketch below).
No test impact is expected (except on training time ;)
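For illustration, here is a minimal sketch of the memoization idea. This is not the actual patch: the class name, attribute names and cache layout below are assumptions, and the real normalizeTime in Time.py may organize its cache differently.

```python
# Sketch: memoize normalizeTime() by caching results in a dict keyed by the
# raw time value, since the normalization only depends on the fixed training
# time range. Names (cTimeInfoSketch, mNormalizeCache) are illustrative.

class cTimeInfoSketch:
    def __init__(self, time_min, time_max):
        self.mTimeMin = time_min
        self.mTimeDelta = time_max - time_min
        self.mNormalizeCache = {}  # raw time value -> normalized value

    def normalizeTime(self, t):
        cached = self.mNormalizeCache.get(t)
        if cached is None:
            # map t into [0, 1] relative to the training time range
            cached = (t - self.mTimeMin) / self.mTimeDelta
            self.mNormalizeCache[t] = cached
        return cached
```

The cache here is per-instance, so it is discarded with the time-info object once training ends; an alternative would be functools.lru_cache on a standalone helper function.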
Fixed.