Open NimaSarajpoor opened 3 months ago
Attention: Patch coverage is 0% with 118 lines in your changes missing coverage. Please review.
Project coverage is 96.57%. Comparing base (79096a8) to head (f548907). Report is 3 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| utils.py | 0.00% | 72 Missing :warning: |
| njit_fastmath.py | 0.00% | 46 Missing :warning: |
@NimaSarajpoor Thank you for painstakingly going through all of the files/functions. I am wondering if we might be able to write a Python script to help notify us as to which functions are missing `fastmath=True` or should use some other `fastmath` variant? Or to warn us that an outer calling function has (incorrectly) `fastmath=True` but is calling an inner function that cannot take non-finite values (or produces non-finite output values)? Is it worth investigating?
@seanlaw
I think, as our first step, a script that just gives us the list above should be helpful. Basically, the script can search for `fastmath=True` and, for any of those functions, find all of their caller functions, and so on. I think the search (for each function) can stop once we reach an outer function whose own outer function does not have the `fastmath` flag.
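Just to make that search concrete, here is a minimal sketch of the transitive caller lookup, assuming we already have a reverse call graph (callee -> set of direct callers). The toy graph below uses made-up relationships purely for illustration:

```python
def find_transitive_callers(target, callers_of):
    """
    Given a reverse call graph (callee -> set of direct callers), return
    every function that directly or indirectly calls `target`.
    """
    seen = set()
    stack = [target]
    while stack:
        func = stack.pop()
        for caller in callers_of.get(func, set()):
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen


# Toy reverse call graph (relationships are illustrative only)
callers_of = {
    "core._sliding_dot_product": {"core._mass", "stump._compute_diagonal"},
    "stump._compute_diagonal": {"stump._stump"},
    "stump._stump": {"stump.stump"},
}
print(find_transitive_callers("core._sliding_dot_product", callers_of))
# e.g. {'core._mass', 'stump._compute_diagonal', 'stump._stump', 'stump.stump'} (order may vary)
```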
I was wondering how one can accomplish this:

> Or to warn us that an outer calling function has (incorrectly) `fastmath=True` but is calling an inner function that cannot take non-finite values (or produces non-finite output values)? Is it worth investigating?

This seems to be hard, right? I mean... detecting that just by going through the function's docstring / body may not be a feasible solution. What do you think?
@NimaSarajpoor I don't know, but I think we'll have to `import ast`, build up some abstract syntax tree, and then traverse the nodes accordingly.
[Update] After some research, I came up with the following code that can get the callers and the callee(s) in a script.
import ast


def _visit_all_child_nodes(node, lst):
    # Recursively collect the names of all functions called within `node`
    for n in ast.iter_child_nodes(node):
        if isinstance(n, ast.Call):
            if isinstance(n.func, ast.Attribute):
                name = n.func.attr
            else:
                name = n.func.id
            lst.append(name)
        _visit_all_child_nodes(n, lst)


def visit_all_child_nodes(node):
    lst = []
    _visit_all_child_nodes(node, lst)
    return lst


def find_caller_and_callees(fpath):
    with open(fpath) as f:
        tree = ast.parse(f.read())

    out = {}  # caller: callee(s)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            out[node.name] = set(visit_all_child_nodes(node))

    return out
Note1: The if-block `if isinstance(node, ast.FunctionDef):` needs to be revised to only capture the functions with `njit` decorators.
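As a rough sketch (not the final implementation, and the helper name is made up), that njit filter could inspect each `ast.FunctionDef`'s `decorator_list`, handling both a bare `@njit` and a parameterized `@njit(...)`:

```python
import ast


def has_njit_decorator(fd):
    # `fd` is an ast.FunctionDef; handle `@njit` (ast.Name), `@njit(...)`
    # (ast.Call), and `@numba.njit` (ast.Attribute)
    for decorator in fd.decorator_list:
        node = decorator.func if isinstance(decorator, ast.Call) else decorator
        if isinstance(node, ast.Attribute):
            name = node.attr
        else:
            name = getattr(node, "id", None)
        if name == "njit":
            return True
    return False


src = "from numba import njit\n\n@njit(parallel=True)\ndef f(x):\n    return x"
fd = ast.parse(src).body[1]
print(has_njit_decorator(fd))  # True
```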
Note2: I tried it on the following script:
# in file temp.py
def func1(x, y):
    return x + y


def func2(x, y, z):
    xy = func1(x, y)
    out = z + xy
    return z
And the output is correct:
out = find_caller_and_callees('./temp.py')

# out
{'func1': set(), 'func2': {'func1'}}
And when I tried it for `stumpy/stump.py`, I noticed that it captures ALL called functions:
{
    '_compute_diagonal': {'_shift_insert_at_index',
                          'dot',
                          'min',
                          'njit',
                          'range',
                          'searchsorted',
                          'uint64'},
    '_stump': {'_compute_diagonal',
               '_count_diagonal_ndist',
               '_get_array_ranges',
               '_merge_topk_ρI',
               'abs',
               'empty',
               'full',
               'njit',
               'prange',
               'range',
               'sqrt'},
    'stump': {'ValueError',
              '_check_P',
              '_stump',
              'arange',
              'ceil',
              'check_ignore_trivial',
              'check_window_size',
              'column_stack',
              'empty',
              'int',
              'min',
              'mparray',
              'non_normalized',
              'preprocess_diagonal'},
}
That's a huge set. It will be reduced once I check whether each callee is an njit-decorated function.
@seanlaw
Do you think I am on the right track here?
@NimaSarajpoor I think that looks great. Here are a few things to consider:
- Maybe instead of `lst = []`, call it something like `all_functions = []`
- What happens to your script when the caller and callee chain are in different/multiple `.py` files?
- There seems to be a little bit of overlap between this code and the `docstring.py` code. It might be worth considering if we should combine them into a single `.py` file somehow? It feels like we are starting to accumulate scripts and so maybe there is a refactoring opportunity to do a `utils.py`. Just thinking out loud here.
> - Maybe instead of `lst = []`, call it something like `all_functions = []`
> - What happens to your script when the caller and callee chain are in different/multiple `.py` files?
> - There seems to be a little bit of overlap between this code and the `docstring.py` code. It might be worth considering if we should combine them into a single `.py` file somehow? It feels like we are starting to accumulate scripts and so maybe there is a refactoring opportunity to do a `utils.py`. Just thinking out loud here.
Yes! Better name indeed.
The provided script captures the callee even if it is from a different script. But it does not create a chain; that part needs to be added. The problem with the output of the current script is that it does not show which `.py` file a callee comes from. For instance, `_stump` calls the function `core._merge_topk_ρI`, but the output only includes `_merge_topk_ρI`.

One solution is to create a "helper" dictionary where the key is a function name and the value is the name of its corresponding script. This can help us follow the chain by going from a caller to its callees, then seeing what functions each callee calls, and so on.
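For illustration, a minimal sketch of that helper dictionary, reusing the `find_caller_and_callees` function from above (the variable names here, like `func_to_module`, are made up):

```python
import pathlib

# Map each defined function name to the stumpy module (`.py` file) that
# defines it, so that a bare callee name like "_merge_topk_ρI" can be
# resolved to "core._merge_topk_ρI" while following the call chain
func_to_module = {}
for filepath in pathlib.Path("./stumpy").glob("*.py"):
    module = filepath.stem  # e.g. "core" for "core.py"
    for func_name in find_caller_and_callees(filepath):
        func_to_module[func_name] = module

# e.g. func_to_module["_merge_topk_ρI"] would map to "core"
```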
I am also going to take a look at `docstring.py` just to prep my mind for now. It might help me to borrow some ideas from there as well.

@seanlaw
I looked at `docstring.py` and borrowed the following parts to detect functions/methods:
https://github.com/TDAmeritrade/stumpy/blob/b7b355ce4a9450357ad207dd4f04fc8e8b4db100/docstring.py#L77-L85 https://github.com/TDAmeritrade/stumpy/blob/b7b355ce4a9450357ad207dd4f04fc8e8b4db100/docstring.py#L91-L96
The new version of my script (provided below) goes through each file and returns a dictionary with the following keys:

- `func_names`: list of function/method names defined in the file
- `is_njit`: list of boolean flags indicating whether each function is decorated with `@njit`
- `callees`: list of sets of function names that are called by each function

After obtaining such a dictionary for each file, the script combines the information and does some processing. The final output is a nested dictionary. One item of this dictionary is provided below as an example:
# key is filename
file_name: stump

# value is a dictionary with the following (key, value) pairs
func_names: ['_compute_diagonal', '_stump', 'stump']
is_njit: [True, True, False]
callees: [
    {'core._shift_insert_at_index'},
    {'stump._compute_diagonal', 'core._count_diagonal_ndist', 'core._merge_topk_ρI', 'core._get_array_ranges'},
    {'stump._stump', 'core.non_normalized', 'core.preprocess_diagonal', 'core.check_window_size', 'core._check_P', 'core.check_ignore_trivial'}
]
Note 1: The callees only contain the functions that are defined in one of the `.py` files in `./stumpy/`.

Note 2: In each set in `callees`, I used the format `<file_name>.<callee_function>` to help us easily track where each callee comes from.
#!/usr/bin/env python

import ast
import pathlib
import re


def _get_callees(node, all_functions):
    # Recursively collect the names of all functions called within `node`
    for n in ast.iter_child_nodes(node):
        if isinstance(n, ast.Call):
            obj = n.func
            if isinstance(obj, ast.Attribute):
                name = obj.attr
            elif isinstance(obj, ast.Subscript):
                name = obj.value.id
            elif isinstance(obj, ast.Name):
                name = obj.id
            else:
                msg = f"The type {type(obj)} is not supported"
                raise ValueError(msg)
            all_functions.append(name)
        _get_callees(n, all_functions)


def get_all_callees(node):
    """
    For a given node of type ast.FunctionDef, visit all of its child nodes,
    and return a list of all callees
    """
    all_functions = []
    _get_callees(node, all_functions)
    return all_functions


def get_functions_metadata(filepath):
    """
    For a given filepath, return a dictionary with the following keys:
    - func_names: list of function/method names defined in the file
    - is_njit: list of boolean values indicating whether the function is
      decorated with `@njit`
    - callees: list of sets of function names that are called by the function
    """
    file_contents = ""
    with open(filepath, encoding="utf8") as f:
        file_contents = f.read()
    module = ast.parse(file_contents)

    function_definitions = []

    # include functions
    function_definitions.extend(
        [node for node in module.body if isinstance(node, ast.FunctionDef)]
    )

    # include methods of classes
    all_methods = []
    class_definitions = [
        node for node in module.body if isinstance(node, ast.ClassDef)
    ]
    for cd in class_definitions:
        methods = [node for node in cd.body if isinstance(node, ast.FunctionDef)]
        all_methods.extend(methods)
    function_definitions.extend(all_methods)

    # Create list of function/method names
    func_names = [None] * len(function_definitions)
    for i, fd in enumerate(function_definitions):
        func_names[i] = fd.name

    # Create list of flags indicating whether the function is decorated with `@njit`
    is_njit = [False] * len(function_definitions)
    for i, fd in enumerate(function_definitions):
        for decorator in fd.decorator_list:
            if not isinstance(decorator, ast.Call):
                continue
            obj = decorator.func
            if isinstance(obj, ast.Attribute):
                name = obj.attr
            elif isinstance(obj, ast.Subscript):
                name = obj.value.id
            elif isinstance(obj, ast.Name):
                name = obj.id
            else:
                msg = f"The type {type(obj)} is not supported."
                raise ValueError(msg)

            if name == "njit":
                is_njit[i] = True
                break

    # Create list of sets of callees
    callees = [None] * len(function_definitions)
    for i, fd in enumerate(function_definitions):
        # walk all child nodes of `fd`
        callees[i] = set(get_all_callees(fd))

    out = {
        'func_names': func_names,
        'is_njit': is_njit,
        'callees': callees,
    }

    return out


ignore = ["__init__.py", "__pycache__"]

stumpy_path = pathlib.Path(__file__).parent / "stumpy"
filepaths = sorted(f for f in pathlib.Path(stumpy_path).iterdir() if f.is_file())

# Collect per-file metadata and the full list of defined functions,
# each qualified as `<module>.<function>`
out = {}
collect_defined_funcs = []
for filepath in filepaths:
    name = filepath.name
    if name not in ignore and str(filepath).endswith(".py"):
        functions_data = get_functions_metadata(filepath)
        out[name] = functions_data

        n = name.replace('.py', '')
        collect_defined_funcs.extend(
            [f"{n}.{f}" for f in functions_data['func_names']]
        )

collect_defined_funcs_nameonly = [item.split('.')[1] for item in collect_defined_funcs]

# Keep only the callees that are defined somewhere in `./stumpy/` and
# qualify each of them with the name of its module
d = {}
for fname, func_data in out.items():
    file_name = fname.replace('.py', '')

    d[file_name] = {}
    d[file_name]['func_names'] = func_data['func_names']
    d[file_name]['is_njit'] = func_data['is_njit']
    d[file_name]['callees'] = []
    for i, f in enumerate(func_data['callees']):
        s = set()
        for x in f:
            if x not in collect_defined_funcs_nameonly:
                continue

            if x in d[file_name]['func_names']:
                s.update([f"{file_name}.{x}"])
            else:
                idx = collect_defined_funcs_nameonly.index(x)
                s.update([collect_defined_funcs[idx]])

        d[file_name]['callees'].append(s)

# print content of the final output `d`
for fname, func_data in d.items():
    print('file_name: ', fname)
    for k, v in func_data.items():
        print(f'{k}: {v}')
    print('--------------------------------')
Next step is to revise the script above to also check for the `fastmath` flag when `is_njit` is True, and include that in the output.
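For reference, here is a minimal sketch of how the `fastmath` keyword could be read off the same `@njit(...)` decorator node that is already being inspected (the helper name `get_fastmath_value` is made up):

```python
import ast


def get_fastmath_value(decorator):
    """
    Given the `ast.Call` node of an `@njit(...)` decorator, return the value
    passed to its `fastmath` keyword argument (True/False, a set of flag
    strings, or None if the keyword is absent).
    """
    for kw in decorator.keywords:
        if kw.arg == "fastmath":
            # handles `fastmath=True` as well as literals like {"nsz", "arcp"}
            return ast.literal_eval(kw.value)
    return None


src = 'from numba import njit\n\n@njit(fastmath={"nsz", "arcp"})\ndef f(x):\n    return x'
fd = ast.parse(src).body[1]
print(get_fastmath_value(fd.decorator_list[0]))  # {'nsz', 'arcp'} (order may vary)
```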
> Next step is to revise the script above to also check for the `fastmath` flag when `is_njit` is True, and include that in the output.
@seanlaw Since we are in a PR, I was wondering if I should push my new script to this PR chunk by chunk so that you can review it more easily.

> Since we are in a PR, I was wondering if I should push my new script to this PR chunk by chunk so that you can review it more easily.
Sure. If there is sufficient overlap with `docstring.py` then we should also consolidate and refactor if it makes sense.
@seanlaw
> If there is sufficient overlap with docstring.py then we should also consolidate and refactor if it makes sense

> It feels like we are starting to accumulate scripts and so maybe there is a refactoring opportunity to do a utils.py. Just thinking out loud here.
So, are you thinking of having common functions in `utils.py` and using those in `docstring.py` and `<new-script>.py` after refactoring? One way is to start with the new script and refactor later. Another is to just start by adding some general functions to `utils.py`. I went with the latter approach. Not sure which approach is more reasonable though.
Let's check the output of `check_functions` in `utils.py`:
Example 1:
func_names, is_njit, fastmath_values = check_functions("stumpy/stump.py")

# values of the returned lists
# Function names: ['_compute_diagonal', '_stump', 'stump']
# Is njit: [True, True, False]
# Fastmath values: [True, True, None]
Example 2:
func_names, is_njit, fastmath_values = check_functions("stumpy/core.py")

# values of the returned lists at index 25
# Function names: _welford_nanvar
# Is njit: True
# Fastmath values: {'contract', 'reassoc', 'nsz', 'arcp', 'afn'}
@NimaSarajpoor Things look fine. However, for the two examples that you provided, I don't like that the outputs are allowed to be of different types. Specifically, I expect Example 2 to return lists as well.

> Things look fine. However, for the two examples that you provided, I don't like that the outputs are allowed to be of different types. Specifically, I expect Example 2 to return lists as well.
@seanlaw
Thanks to you, I am getting closer to your mindset :) The outputs in the second example are lists as well; I just showed the values at index 25. Otherwise, I would have had to show three lists, each with 95 elements corresponding to the 95 functions in `core.py`. I think I should have written something like `# func_names[25]: _welford_nanvar` to avoid confusion.
See #708. This PR should address #708 (and #1011)
Each main checkbox below has at least one indented checkbox. The main checkbox represents a callee function with `fastmath=True`, and an indented checkbox represents a caller function. If the caller function itself has the `fastmath=True` flag as well, a star (*) is written next to its name. In such a case, the callers of that function are represented in another checkbox set.

[ ] maamp._compute_multi_p_norm
[ ] core._sliding_dot_product
[ ] core._p_norm_distance_profile
[ ] core._calculate_squared_distance_profile
[ ] core.calculate_distance_profile
[ ] core._mass
[ ] core._apply_exclusion_zone
[ ] stump._stump
[ ] stump._compute_diagonal
[ ] * scraamp._compute_PI
[ ] scraamp._prescraamp
[ ] mstump._compute_multi_D
[ ] scrump._compute_PI
[ ] scrump._prescrump
[ ] aamp._compute_diagonal
[ ] aamp._aamp
NOTE: The functions `core._count_diagonal_ndist` and `core._total_diagonal_ndists` are ignored as their input parameters cannot have non-finite values.