Closed joehiggi1758 closed 4 months ago
I'm thinking that we can split out the generic regex part into its own function ():
def match_str_version(line, pkg_name):
matches = re.search(
rf"""
{pkg_name} # Package name
[=><:"\'\[\]]+ # Zero or more special characters
(\d+\.\d+[\.0-9]*) # Capture "version" in `matches`
""",
line,
re.VERBOSE, # Ignores all whitespace in pattern
)
return matches
add then we update our existing function:
def find_pkg_mismatches(pkg_name, pkg_version, fnames):
"""
Determine if any package version has mismatches
"""
pkg_mismatches = []
for fname in fnames:
with open(fname, "r") as file:
for line_num, line in enumerate(file, start=1):
l = line.strip().replace(" ", "").lower()
matches = match_str_version(l, pkg_name)
if matches is not None:
version = matches.groups()[0]
if version != pkg_version:
pkg_mismatches.append((pkg_name, version, fname, line_num))
return pkg_mismatches
Finally, we can add a new test function that looks like:
def test_pkg_mismatch_regex():
pkgs = {
"numpy": "0.0",
"scipy": "0.0",
"numba": "0.0",
"python": "2.7",
"python-version": "2.7",
}
lines = [
'Programming Language :: Python :: 3.8',
'STUMPY supports Python 3.8',
"python-version: ['3.8']",
'requires-python = ">=3.8"',
'numba>=0.55.2',
]
for pkg_name, pkg_version in pkgs.items():
for line in lines:
matches = match_str_version(line, pkg_name)
if matches is None:
# Raise an error
And so this test would/should be executed BEFORE calling find_pkg_mismatches
.
Sounds great! Please excuse my naïveté, but looking to learn! Why use a validation function as opposed to a unit test here? (Aka where does that line come in)
@joehiggi1758 There no hard rule (i.e., it's more by feeling) but my rationale for this case is that this min.py
script is a utility file that really has nothing to do with matrix profiles (i.e., if the "mismatch" test is in the tests/
directory and it fails then it will make it seem like STUMPY is failing, which isn't ever the message that I want to send). Thus, I am trying to limit our unit test suite to be for matrix profiles only. In reality, I should/could split out all of these utility scripts into a separate package but, alas, I'm too lazy and here we are. I hope that makes sense? At the same time, this is a perfectly good opportunity to argue/debate the merits of each option (there may be additional options that I have no considered too).
In some situations, I can/will be stubborn but, in this case, I'm not sure what the best thing to do is. To some extent, I'm just taking the easy way out :)
We need a way to quickly check whether or not our regex has stopped covering one of our identified/existing cases in
min.py
find_pkg_mismatches()
. This may not need to be a unit test, but we should implement some protection.The list of all edge case formats that we've identified so far...