Add tags to duel - Githubissues

DenjellBoone commented 2 years ago

As discussed: Base functionality for using duel with tags. It is possible to add more than one tag. Duels that use tags to choose the problem are declared inofficial.

DenjellBoone commented 2 years ago

Thanks for comments and finding the bug. I made the required changes. Please check again.

DenjellBoone commented 2 years ago

"matching_tags is probably a better name" I thought about it for a while and think "matches_tags" creates more readable code. "if prob.matches_tags(tags): do_something" feels more natural than "if prob.matching_tags(tags): do_something"

The whole function is a bit flawed since it does more than one thing. It tries to match tags to the problem tags but only returns something if all tags could be matched and then it also returns a list of the full names of the tags (for some display function).

Maybe its a better idea to create a function that returns how many of the given tags could be matched and maybe also returns a list of their names. Then a second function (matches_all_tags) can be defined that just checks if prob.matches_tags returns the correct number. This would allow for a second function "matches_any_tag" which can be used to negate tags.

algmyr commented 2 years ago

"matching_tags is probably a better name" I thought about it for a while and think "matches_tags" creates more readable code. "if prob.matches_tags(tags): do_something" feels more natural than "if prob.matching_tags(tags): do_something"

matches_tags makes it sound like it returns a boolean, which it doesn't. That was an issue with the original name as well

DenjellBoone commented 2 years ago

Ok, i named it like suggested and added the changes for the other remarks too.

algmyr commented 2 years ago

I guess you could add a helper like you mentioned, e.g. has_matching_tags which just returns len(...) > 0 or similar. It would make the call sites that only cares about the bool value nicer. Or do you think there is some nicer refactoring?

DenjellBoone commented 2 years ago

i think my best idea would be: def count_matching_tags(tags): // try to match all given tags and return number of successful matches

def matches_all_tags(tags): return self.count_matching_tags(tags) == len(tags)

def matches_any_tag(tags): #optional for later use return self.count_matching_tags(tags) > 0

and maybe another function to generate the list of tag strings. In ;gimme the list of matches is not saved from the first call anyways. So I think a different function can do the job as well and we maybe save some time for not generating the matched tag strings for all problems that are in the pool during problem selection.

But idk if 3 functions for doing the job are cluttering the whole thing too much. I think i will give it a try and we can decide which commit to use.

algmyr commented 2 years ago

What about matching_tags rather than the count one? The other functions matches_any/all_tags can be implemented in terms of that.

DenjellBoone commented 2 years ago

I am fine with the name. You can take a look now. After short testing it still looks functional.

algmyr commented 2 years ago

I don't really like the duplicated logic, and afaict the unfiltered behavior is not there anymore? I think the most generic thing is to have something that returns a dict, that should be enough for any of the functions. Example:

from collections import defaultdict

class A:

    def _matching_tags_dict(self, match_tags):
        """Returns a dict with matching tags,
        if match_tags is falsy, do not filter."""
        # Empty string matches everything.
        match_tags = match_tags or ['']
        tags = defaultdict(list)
        for match_tag in match_tags:
            for tag in self.tags:
                if match_tag in tag:
                    tags[match_tag].append(tag)
        return dict(tags)

    def matches_all_tags(self, match_tags):
        # >= to handle unfiltered case
        return len(self._matching_tags_dict(match_tags)) >= len(match_tags)

    def matches_any_tag(self, match_tags):
        return len(self._matching_tags_dict(match_tags)) > 0

    def get_matched_tags(self, match_tags):
        return [
            tag for tags in self._matching_tags_dict(match_tags).values()
            for tag in tags
        ]

a = A()
a.tags = ['a', 'abc', 'bc']

assert a.matches_all_tags([])
assert a.matches_all_tags(['a', 'abc'])
assert a.matches_all_tags(['a', 'abc'])
assert not a.matches_all_tags(['abc', 'z'])
assert a.matches_all_tags(['a', 'abc', 'bc'])
assert a.matches_any_tag([])
assert a.matches_any_tag(['abc', 'z'])
assert not a.matches_any_tag(['z'])
assert sorted(a.get_matched_tags([])) == ['a', 'abc', 'bc']
assert sorted(a.get_matched_tags(['a'])) == ['a', 'abc']
assert sorted(a.get_matched_tags(['bc'])) == ['abc', 'bc']
assert sorted(a.get_matched_tags(['abc'])) == ['abc']

DenjellBoone commented 2 years ago

I understand that the duplicated logic is annoying. What i like about this solution is that it does not need the extra case of "empty list matches all tags" to work. If u call it with empty list u get 0 matches which is interpreted as True in matches_all_tags. I also would consider matches_any_tag returning false if an empty list of tags is given as correct behaviour (from the perspective how i would call it from the code that uses the function). It allows for problem filtering like this:

        problems = [prob for prob in cf_common.cache2.problem_cache.problems
                    if prob.matches_all_tags(tags_wished_by_user) 
                    and not prob.matches_any_tag(tags_user_wants_to_avoid)]

If both lists are empty it will mark any problem as valid. With the implementation with treating [] as matcher for everything u need to have a different user code.

Also the dict approach fails if the same tag is added twice: assert a.matches_all_tags(['a', 'a']) The text file contains my test code: classA.txt

algmyr commented 2 years ago

Duplicates are easy enough to deal with anyway, and it's probably a good idea to dedupe the argument anyway. Dropping the empty list special case after that makes the approaches the same wrt the output, but without the duplication.

class C:

    def _matching_tags_dict(self, match_tags):
        """Returns a dict with matching tags."""
        tags = defaultdict(list)
        for match_tag in match_tags:
            for tag in self.tags:
                if match_tag in tag:
                    tags[match_tag].append(tag)
        return dict(tags)

    def matches_all_tags(self, match_tags):
        # >= to handle unfiltered case
        match_tags = set(match_tags)
        return len(self._matching_tags_dict(match_tags)) == len(match_tags)

    def matches_any_tag(self, match_tags):
        match_tags = set(match_tags)
        return len(self._matching_tags_dict(match_tags)) > 0

    def get_matched_tags(self, match_tags):
        return [
            tag for tags in self._matching_tags_dict(match_tags).values()
            for tag in tags
        ]

a = C()
a.tags = ['a', 'abc', 'bc']

assert a.matches_all_tags([])
assert a.matches_all_tags(['a', 'abc'])
assert a.matches_all_tags(['a', 'abc'])

assert not a.matches_all_tags(['abc', 'z'])
assert a.matches_all_tags(['a', 'a'])
assert a.matches_all_tags(['a', 'abc', 'bc'])
assert not a.matches_any_tag([]) #different convention
assert not a.matches_any_tag('d')
assert a.matches_any_tag(['abc', 'z'])
assert not a.matches_any_tag(['z'])
assert sorted(a.get_matched_tags([])) == [] #different convention
assert sorted(a.get_matched_tags(['a'])) == ['a', 'abc']
assert sorted(a.get_matched_tags(['bc'])) == ['abc', 'bc']
assert sorted(a.get_matched_tags(['abc'])) == ['abc']

I agree those semantics are more sensible overall, the one concern is that one of the main usages of this code is commands like ;gimme which for sure needs some "no tags, no filtering" behavior.

cheran-senthil / TLE

Add tags to duel #497