cdisc-org / cdisc-rules-engine

Open source offering of the cdisc rules engine
MIT License

Error when using equal_to with all or any to validate an empty dataset #739

Open ASL-rmarshall opened 1 week ago

ASL-rmarshall commented 1 week ago

If the equal_to operator is used within either all or any and the rule is executed for an empty dataset (either initially empty or empty as the result of pre-processing an inner join specified in Match Datasets), one of the following errors will be reported:

The error occurs because the data type of an empty pd.Series defaults to float64, which cannot be combined with the prespecified boolean value used for the combined result of all (business_rules.engine.py line 42) or any (business_rules.engine.py line 53).

A possible fix might be to explicitly cast the result of the equal_to operator as bool, e.g. by changing line 162 of dataframe_operators.py:

        return self.value.apply(
            lambda row: self._check_equality(row, target, comparator, value_is_literal),
            axis=1,
        ).astype(bool)
         ^^^^^^^^^^^^^

Note that the same issue is likely to affect many other operators when used in combination with all/any and validating empty datasets, but a similar fix could be applied to those.
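A minimal sketch of the dtype problem and the proposed cast, outside the engine. It assumes the all condition starts from a literal True and combines each operator result with &; combine_all is a hypothetical stand-in, not the actual code in business_rules.engine.py. The empty Series is built with an explicit float64 dtype because the default dtype of an empty Series differs between pandas versions.

```python
import pandas as pd

# Stand-in for an operator result computed on an empty dataset.
# The dtype is set explicitly so the sketch does not depend on the
# pandas default for empty Series.
empty_result = pd.Series([], dtype="float64")


def combine_all(results):
    """Hypothetical stand-in for 'all': start from True, AND in each result."""
    combined = True
    for result in results:
        combined = combined & result
    return combined


# Combining the float64 result with a boolean is expected to fail.
try:
    combine_all([empty_result])
    float_combination_failed = False
except TypeError as exc:
    float_combination_failed = True
    print(f"float64 result could not be combined: {exc}")

# Casting to bool first (as in the proposed fix) lets the combination succeed.
fixed = combine_all([empty_result.astype(bool)])
print(fixed.dtype, len(fixed))
```

With the cast, the combined result is an empty boolean Series, so the rule reports no issues for the empty dataset instead of raising an error.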

ASL-rmarshall commented 1 week ago

I haven't (yet) reported this as blocking a rule because I have used a workaround to prevent pre-processing creating empty datasets - in DDF00037, I have used a left join (Join Type: left) in Match Datasets instead of an inner join that would be more logical for the rule.
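For illustration, the difference between the two join types can be sketched in plain pandas. The dataset and column names below are hypothetical and are not taken from DDF00037 or from the engine's Match Datasets implementation.

```python
import pandas as pd

# Hypothetical datasets with no matching keys between them.
parent = pd.DataFrame({"id": ["A1", "A2"], "name": ["first", "second"]})
child = pd.DataFrame({"parent_id": ["B9"], "value": [10]})

# An inner join keeps only matched rows, so the result is empty here.
inner = parent.merge(child, how="inner", left_on="id", right_on="parent_id")

# A left join keeps every parent row; unmatched columns are filled with NaN.
left = parent.merge(child, how="left", left_on="id", right_on="parent_id")

print(len(inner))  # 0
print(len(left))   # 2
```

The left join avoids the empty dataset, but the rule's conditions may then need to account for the unmatched rows whose joined columns are missing.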