BiologicalRecordsCentre / iRecord

Repository to store and track enhancements, issues and tasks regarding the iRecord website.
http://irecord.org.uk
2 stars 1 forks source link

Check behaviour of record cleaner rules for dates #627

Closed kitenetter closed 4 years ago

kitenetter commented 5 years ago

A user (who happens to be recording spiders!) entered some records giving a full date range of 01/01/2018 to 31/12/2018. The records were being entered on behalf of someone else, and that person then said he didn't want the records shown as "01/01/2018 to 31/12/2018", he wanted them shown as "2018".

So the user who had entered the records edited them to give just the year rather than the full range. He then received notifications that some of the records were out of the usual date range for the species concerned, but he doesn't think he had had such notifications when he used the full date range method of data entry, only when he changed it to the year.

As far as I'm aware the database stores the same entries into date_from and date_to regardless of whether the user enters "01/01/2018 to 31/12/2018" or "2018".

Questions arising:

Not a major priority, but it seems sensible to avoid triggering notifications where they are not relevant, and also it would be good to ensure that behaviour is consistent when entering the same date range in different ways.

burkmarr commented 5 years ago

@johnvanbreda - trying to reproduce this on localhost. I have relevant record cleaner modules enabled and have imported rules for spiders. I can view these rules in the warehouse, for example here's the period within year rule for Nurseryweb Spider:

Tvk=NBNSYS0000008830 StartDate=0213 EndDate=1231 DataFieldName=Stage

When I enter a record with a date outside this range for this species, e.g. 01/02/2019, and then run scheduled tasks, I can see that the data cleaner has run:

Running data_cleaner Last run at 2019-07-09 17:37:51 Data cleaner generated 0 messages.

But it doesn't flag the record. I notice that the table cache_verification_rules_period_within_year in my warehouse is empty and I suspect that there is some more configuration to do to enable the checks? I can't see anything in the manual.

What did I miss?

burkmarr commented 4 years ago

@johnvanbreda - I'm revisiting this issue after a gap during which I've rebuilt my localhost warehouse. So I've got to load the data cleaner rules again. I have all the data_clearner modules enabled and this page displays fine: http://localhost/warehouse/index.php/verification_rule.

From there I can manually add a verifcation rule with the 'new verification_rule' button.

But when I attempt to load spider rules, either from the NBN server or from a CSV I downloaded from the server, Kohana errors with this message:

11:16:07 error Uncaught Kohana_404_Exception: The page you requested, verification_rule/upload, could not be found. in file C:/xampp/htdocs/warehouse/system/core/Kohana.php on line 841

I don't understand why verification_rule/upload endpoint fails to resolve. Any ideas?

johnvanbreda commented 4 years ago

Interesting! We obviously haven't used this function for a long time. The verification_rule controller's upload function was private (as a result of linting 2 years ago which presumably suggested making a public function private without realising it was being dynamically called). I've committed a fix to develop.

burkmarr commented 4 years ago

I should have clocked that it was private. Rule import worked now.

burkmarr commented 4 years ago

(Don't know how I got it to work last time - I can only imagine I changed it from private to public and forgot about it without updating GitHub.)

burkmarr commented 4 years ago

@kitenetter - I've had a look into this and it's thrown up a surprise.

The phenology (period within year) rules do not treat these two vague dates differently - '01/01/2018 to 31/12/2018' and '2018'. The representation in the database is shown below.

image Indicia uses the date_type field to differentiate between these two. The top one (DD) represents the full date range (I guess DD stands for day-day range) and the botton one (Y) the year only. However the phenology rules do not consider the data_type value. So I can't explain why the user only noticed the rules being fired in the second case - perhaps they were overlooked.

More surprising to me is that the phenology rule code does not deal with record date ranges - it only considers the date_start field. So in the example cases above, both records would be considered as records for 1st January.

The relevant code is on lines 118-135 of the rule code if you want to look. (Ignore the vr.reverse_rule bit - that just deals with the facility to reverse the logic of the rule.) It's a bit complex because it deals with cases where the rule start_date > end_date (i.e. period spans the year end) and also specification of rules with null start or end dates, but the important point to note is that that the rule (vr) dates are only compared agains the occurrence (co) start_date.

If we decide that this needs to be addressed, there is a conversation to be had about how the phenolgy rules should deal with date ranges. From the first of your original questions, you implied that you would want the rule to fire (i.e. trigger a notification) if the any part of a date range fell outside of the date range allowed by the rule. But that would mean that entering a record date as a year would always trigger a phenology rule (assuming a match with stage). But aren't years normally used for dates when the true date is really unknown? Would we want the rule to fire in those circumstances?

That this has rarely (if ever) been remarked upon before suggests that it's not a big problem. I just did a count of 2019 records by date type and this is the result:

image

So its clearly not something that has the potential to affect many records.

kitenetter commented 4 years ago

@burkmarr I can see two alternative approaches:

  1. Records with long date ranges should not be considered by the date rulechecks, because if they were they would always trigger a warning that is probably meaningless (but would highlight that it is a record that cannot be judged on the basis of phenology)
  2. Records with long date ranges should be considered by the date rulechecks, because if they were not they would acquire the "passes all rule checks" flag and could thus be verified without sufficient scrutiny. A downside is that recorders are presumably sent an 'incorrect phenology' notification every time they enter a long-date-range record.

I can see pros and cons to both options, but I think my preference is for option 2 (contrary to what I said at the top of this thread), which I think is what currently happens if I've understood you correctly.

Given that this is not causing widespread concern, and that I think on balance it is working the way it should, I'm going to close this (even though we're still unclear as to the cause of the problem as originally raised).