data-to-insight / csc-validator-be-cin

8816 ValueError, attempt to get argmax of an empty sequence #294

Closed WillLP-code closed 1 year ago

WillLP-code commented 1 year ago

It looks like it's something to do with the refactor that allowed comparisons within children.

WillLP-code commented 1 year ago

This rule fails its pytest test when the sample data includes a child with multiple empty CINreferralDate values and no populated ones, which is the case the synthetic data is hitting. I've recreated this by adding two extra rows to the sample df:

            {
                "LAchildID": "child4",
                "CINclosureDate": pd.NA,
                "CINreferralDate": pd.NA,
                "CINdetailsID": "cinID3",
                "ReferralNFA": "0",
            },
            {
                "LAchildID": "child4",
                "CINclosureDate": pd.NA,
                "CINreferralDate": pd.NA,
                "CINdetailsID": "cinID4",
                "ReferralNFA": "0",
            },

The error comes from trying to find the max value of an empty sequence, which happens with the synthetic data and could equally happen with real data being validated. Options include filling the empty rows with a stand-in value (as some other rule types that check across groups already do) and having the error ID link those cells to the FE, or finding a different way to identify the most recent values.
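For reference, the message in the issue title matches the one NumPy raises when argmax is called on an empty array. A minimal sketch, assuming (not confirmed from the rule's code) that child4's group ends up as an empty array once its missing dates are skipped:

```python
import numpy as np

# Hypothetical stand-in for child4's referral dates after the missing
# values have been skipped: nothing is left to compare.
child4_dates = np.array([], dtype="datetime64[ns]")

try:
    np.argmax(child4_dates)
except ValueError as err:
    # Same wording as the error reported in the issue title.
    message = str(err)

print(message)
```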

tab1tha commented 1 year ago

This is so helpful, Will! Now that you say that, it can be fixed with dropna. We can safely drop the NaNs in CINreferralDate without affecting the rule's logic.
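A rough sketch of the dropna idea, not the rule's actual code: the column names come from the sample rows in this thread, while the child1 rows, their dates, and the variable names are made up for illustration. It assumes the rule finds each child's most recent referral with a grouped idxmax.

```python
import pandas as pd

# Illustrative sample: child1 has two dated referrals, child4 has only
# empty referral dates (the case that triggered the error).
df = pd.DataFrame(
    [
        {"LAchildID": "child1", "CINreferralDate": "2022-01-01", "CINdetailsID": "cinID1"},
        {"LAchildID": "child1", "CINreferralDate": "2022-06-01", "CINdetailsID": "cinID2"},
        {"LAchildID": "child4", "CINreferralDate": pd.NA, "CINdetailsID": "cinID3"},
        {"LAchildID": "child4", "CINreferralDate": pd.NA, "CINdetailsID": "cinID4"},
    ]
)
df["CINreferralDate"] = pd.to_datetime(
    df["CINreferralDate"], format="%Y-%m-%d", errors="coerce"
)

# Drop rows with no referral date before grouping, so no group is
# left with nothing to take the max of.
valid = df.dropna(subset=["CINreferralDate"])

# Index label of the most recent referral per child, then the matching rows.
latest_idx = valid.groupby("LAchildID")["CINreferralDate"].idxmax()
latest = valid.loc[latest_idx]

print(latest[["LAchildID", "CINdetailsID"]])
```

Because dropna keeps the original index labels, rows in `latest` can still be traced back to their positions in `df`. Children with no referral dates at all, such as child4 here, simply do not appear in the result.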