moorepants opened this issue 10 years ago
Calling .readlines() on a file is fast, as done for the event finding. I'm not sure where all of the .readline() calls are coming from.
In [7]: %time df = pandas.read_csv('/home/moorepants/Data/human-gait/gait-control-identification/T006/mocap-006.txt', delimiter='\t')
CPU times: user 1.49 s, sys: 60.8 ms, total: 1.55 s
Wall time: 1.55 s
Reading in the files is fast. And profiling doesn't show any readline() calls.
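For reference, the kind of profile shown below can be reproduced with cProfile sorted by internal time. A minimal sketch, with a stand-in workload since the actual DFlowData call isn't reproduced here:

```python
import cProfile
import io
import pstats


def work():
    # Stand-in workload; replace with the call being profiled,
    # e.g. DFlowData(...).clean_data().
    return sum(i * i for i in range(100000))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

stream = io.StringIO()
# sort_stats("tottime") orders by internal time, matching the output below.
pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(10)
print(stream.getvalue())
```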
It could be this readline() call in oct2py/session.py:evaluate (line 551):
line = self.proc.stdout.readline().rstrip().decode('utf-8')
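That loop reads the subprocess output one line at a time, so every line of Octave output costs a separate (potentially blocking) read on the pipe. A simplified sketch of the pattern, not oct2py's actual implementation, with a StringIO standing in for self.proc.stdout:

```python
import io


def read_until_marker(stream, marker):
    # One readline() per output line; with a real subprocess each call
    # waits on the pipe, which is where the time goes.
    lines = []
    while True:
        line = stream.readline().rstrip()
        if line == marker:
            break
        lines.append(line)
    return lines


# StringIO stands in for the Octave subprocess's stdout pipe.
fake = io.StringIO("ans = 1\nans = 2\nDONE\n")
print(read_until_marker(fake, "DONE"))  # → ['ans = 1', 'ans = 2']
```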
Here's profiling output of essentially just the DFlowData.clean_data() call:
1317864 function calls (1317260 primitive calls) in 62.622 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
128105 56.725 0.000 56.725 0.000 {method 'readline' of 'file' objects}
3 2.573 0.858 2.573 0.858 {method 'read' of 'pandas.parser.TextReader' objects}
120 0.560 0.005 0.562 0.005 fitpack2.py:441(__init__)
10 0.336 0.034 57.587 5.759 session.py:494(evaluate)
379 0.210 0.001 0.212 0.001 common.py:134(_isnull_ndarraylike)
128153 0.122 0.000 0.122 0.000 {_codecs.utf_8_decode}
128152 0.104 0.000 0.286 0.000 {method 'decode' of 'str' objects}
2 0.102 0.051 0.102 0.051 {scipy.signal.sigtools._linear_filter}
235 0.096 0.000 0.096 0.000 {method 'copy' of 'numpy.ndarray' objects}
835 0.091 0.000 0.091 0.000 {method 'reduce' of 'numpy.ufunc' objects}
182 0.088 0.000 0.088 0.000 {numpy.core.multiarray.putmask}
5 0.077 0.015 0.077 0.015 {pandas.algos.take_2d_axis1_float64_float64}
670 0.076 0.000 0.088 0.000 index.py:332(__getitem__)
272 0.072 0.000 0.168 0.001 series.py:709(_get_values)
120 0.070 0.001 0.070 0.001 {scipy.interpolate._fitpack._spl_}
128037 0.068 0.000 0.111 0.000 __init__.py:1343(isEnabledFor)
6 0.065 0.011 0.065 0.011 {pandas.algos.take_2d_axis0_float64_float64}
128150 0.061 0.000 0.183 0.000 utf_8.py:15(decode)
128037 0.057 0.000 0.168 0.000 __init__.py:1128(debug)
1 0.053 0.053 0.133 0.133 __init__.py:20(<module>)
2 0.048 0.024 0.054 0.027 indexing.py:100(_setitem_with_indexer)
13 0.044 0.003 0.044 0.003 {numpy.core.multiarray.concatenate}
128037 0.043 0.000 0.043 0.000 __init__.py:1329(getEffectiveLevel)
6 0.041 0.007 0.042 0.007 internals.py:2368(_stack_arrays)
90 0.039 0.000 0.072 0.001 series.py:719(where)
4 0.031 0.008 0.031 0.008 {pandas.algos.take_2d_axis0_bool_bool}
128106 0.029 0.000 0.029 0.000 {method 'rstrip' of 'str' objects}
61 0.028 0.000 0.028 0.000 {method 'write' of 'file' objects}
46 0.027 0.001 0.027 0.001 {method 'join' of 'str' objects}
1 0.025 0.025 0.035 0.035 table.py:13(<module>)
109 0.022 0.000 0.022 0.000 {method 'take' of 'numpy.ndarray' objects}
138790 0.020 0.000 0.020 0.000 {method 'append' of 'list' objects}
1 0.020 0.020 0.020 0.020 {pandas.algos.diff_2d_float64}
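Dividing the readline tottime by its call count suggests the cost is per-call pipe latency on the Octave subprocess, not Python-level string handling (note rstrip and decode together account for well under a second):

```python
# Numbers taken from the profile output above.
total_time = 56.725  # tottime of 'readline' of 'file' objects, in seconds
ncalls = 128105      # number of readline() calls

per_call_ms = total_time / ncalls * 1000
print(f"{per_call_ms:.3f} ms per readline() call")  # ≈ 0.443 ms
```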
Pretty sure all the slowdowns are from the Octave code.
See issue #30 for details on the rtfilter.
Here is a profile of essentially running DFlowData.clean_data() plus all of the methods in WalkingData (including the inverse dynamics computations).
I think using readline() to pull the events from the record files is a major slowdown. All of the calls to session.py are oct2py calls, which are slow. I believe the slow parts are the inverse dynamics real-time filter and the soder.m file.
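If the event extraction really is a readline() hot spot, reading each record file in one pass and scanning in memory avoids the per-line I/O. A hypothetical sketch; the '#'-prefixed event-marker format here is assumed, not DFlowData's actual record format:

```python
import os
import tempfile


def find_events(path, marker="#"):
    # One read() for the whole file instead of a readline() per line,
    # then scan in memory for event lines (hypothetical '#' prefix).
    with open(path) as f:
        text = f.read()
    return [line for line in text.splitlines() if line.startswith(marker)]


# Hypothetical record file with one embedded event line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("0.0\t1.2\n# EVENT heelstrike\n0.1\t1.3\n")
    path = f.name

events = find_events(path)
print(events)  # → ['# EVENT heelstrike']
os.remove(path)
```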