BrickSchema / py-brickschema

Python package for working with Brick (brickschema.org/)
Other
55 stars 15 forks source link

Tagset Inference misses some Classes #72

Open david-waterworth opened 3 years ago

david-waterworth commented 3 years ago

There's an issue with the order in which inference first computes the most likely target, then filters to remove points or equips which can result in missing classes. I noticed this when I tried to adapt the HaystackInferenceSession for my own use.

In particular, I believe Chilled Water Valve Command should map to ['Chilled_Water_Valve', 'Valve_Command'] since there's equipment and a point implied by the tags.

The issue is in infer_entity the method most_likely_tagsets is called twice, first with the set of tags minus {"point","sensor","command","setpoint","alarm","status","parameter","limit"} plus {"equip"} with the result filtered to retain only equipment, and then again with the original set of tags, with the result filtered to only retain points.

The problem is there's filtering inside most_likely_tagsets which results in it returning Chilled Water Valve in both cases, so the point is never inferred.

The easiest fix is to lift the filtering by class type from infer_entity into most_likely_tagsets so it only considers candidates of the correct type.

The code below seems to work, I added is_valid_class as an argument and use this to pre-filter. It may be better to simply pass a string representation, plus I couldn't be bothered thinking about how to make it a list comprehension.

    def most_likely_tagsets(self, orig_s, num=-1, is_valid_class=lambda x:True):
        """
        Returns the list of likely classes for a given set of tags,
        as well as the list of tags that were 'leftover', i.e. not
        used in the inference of a class

        Args:
            tagset (list of str): a list of tags
            num (int): number of likely tagsets to be returned; -1 returns all

        Returns:
            results (tuple): a 2-element tuple containing (1)
            most_likely_classes (list of str): list of Brick classes
            and (2) leftover (set of str): list of tags not used

        """
        s = set(map(_to_tag_case, orig_s))
        tagsets_ = self.lookup_tagset(s)

        # filter classes which aren't of required type
        tagsets = []
        for classes,tagset in tagsets_:
            classes = [c for c in classes if is_valid_class(c)]
            if len(classes) > 0:
                tagsets.append((classes,tagset))

        if len(tagsets) == 0:
            # no tags
            return [], orig_s
        # find the highest number of tags that overlap
        most_overlap = max(map(lambda x: len(s.intersection(x[1])), tagsets))

        # return the class with the fewest tags >= the overlap size
        candidates = list(
            filter(lambda x: len(s.intersection(x[1])) == most_overlap, tagsets)
        )

        # When calculating the minimum difference, we calculate it form the
        # perspective of the candidate tagsets because they will have more tags
        # We want to find the tag set(s) who has the fewest tags over what was
        # provided
        min_difference = min(map(lambda x: len(x[1].difference(s)), candidates))
        most_likely = list(
            filter(lambda x: len(x[1].difference(s)) == min_difference, candidates)
        )

        leftover = s.difference(most_likely[0][1])
        most_likely_classes = list(set([list(x[0])[0] for x in most_likely]))
        # return most likely classes (list) and leftover tags
        # (what of 'orig_s' wasn't used)
        if num < 0:
            return most_likely_classes, leftover
        else:
            return most_likely_classes[:num], leftover
gtfierro commented 3 years ago

Thanks for the detailed bug report @david-waterworth ! I'll try to take a look soon