civicrm / org.civicrm.doctorwhen

Doctor When: Temporal cleanup agent

Breaks CiviMail scheduling #14

Open clarkac1 opened 6 years ago

clarkac1 commented 6 years ago

I ran Doctorwhen and then scheduled mailings ceased to run. The jobs just sat there as 'Scheduled'. In addition, the time shown on the 'Settings - Scheduled Jobs' screen was several hours wrong (4 hours back, I think). The scheduled jobs listing was correct but the first screen was wrong. So I reverted the following dates back to 'DateTime':

civicrm_job.last_run
civicrm_mailing_job.scheduled_date
civicrm_mailing_job.start_date
civicrm_mailing_job.end_date

... and then scheduled mailings started to work again (phew) and the 'Settings - Scheduled Jobs' screen is now correct. I hope that other dates that I didn't revert don't cause any similar damage. Do you advise I revert them? This installation runs at CiviHosting, is a UK install but CiviHosting's server is in Bulgaria which is 2hrs ahead of the UK. This time difference has never caused a problem before. I realise you have a warning about running this extension, but this problem is severe as this install uses scheduled mailings a lot. This is release 5.1.2 under Drupal 7 - downloaded this extension today.

khorporative commented 5 years ago

Does it break already existing schedules or also interfere with correctly creating new ones?

clarkac1 commented 5 years ago

It broke existing schedules

totten commented 5 years ago

Thanks for the report! doctorwhen has been sitting in "experimental" because of the nagging sense that it's hard to QA well.

Do you happen to recall... it broke the existing schedules, but would they run if you waited a few hours? There's a subtle difference between "mailings don't send" and "mailings send at the wrong effective time." The two problems may look the same for a while.

Generally, I expect new mailings (post-conversion) should work well in multi-TZ, but the symptoms for existing mailings could depend on a fairly complex mix of variables.

Trying to riff on this story, perhaps the situation for mail delivery could be described as three agents in different timezones:

We have basically three operative TZ's in play (Alice TZ, Bob TZ, cli.php TZ). They may or may not be aligned.

When Bob runs doctorwhen, it converts from DATETIME to TIMESTAMP, and this operation injects TZ info. (Ex: The old DATETIME value was 2019-01-02 3:04, but it was ambiguous about TZ. The conversion basically indicates "we meant 2019-01-02 3:04 US/Eastern" or "we meant 2019-01-02 3:04 Europe/Central".) IIRC, the conversion uses Bob's configured TZ. (It's also plausible that it's coerced to UTC or some server-configured TZ.) In any event, there is an effective conversion timezone, which I'll continue calling Bob's TZ for brevity.
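To make the "injects TZ info" point concrete, here is a minimal Python sketch (not doctorwhen's actual code) that reads the same TZ-less value under two different zones. The helper name `attach_tz` is hypothetical, and the fixed UTC offsets are an assumption standing in for named zones; they hold for January only and ignore DST.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for named zones (winter offsets, no DST).
US_EASTERN_WINTER = timezone(timedelta(hours=-5))     # EST
CENTRAL_EUROPE_WINTER = timezone(timedelta(hours=1))  # CET

def attach_tz(naive, tz):
    """Read a TZ-less wall-clock value as local time in `tz`; return the UTC instant."""
    return naive.replace(tzinfo=tz).astimezone(timezone.utc)

# The ambiguous DATETIME value from the example above.
old_value = datetime(2019, 1, 2, 3, 4)

as_eastern = attach_tz(old_value, US_EASTERN_WINTER)
as_central = attach_tz(old_value, CENTRAL_EUROPE_WINTER)

print(as_eastern.isoformat())   # 2019-01-02T08:04:00+00:00
print(as_central.isoformat())   # 2019-01-02T02:04:00+00:00
print(as_eastern - as_central)  # 6:00:00
```

The same stored text ends up six hours apart as an absolute instant, which is why the choice of effective conversion timezone matters for anything already scheduled.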

The overall experienced symptoms would depend on the mix of TZs. Here's what I expect for pre-existing scheduled records:

  1. If AliceTZ does not really exist (e.g. because "Alice" is really multiple people in different TZ's composing mail blasts with conflicted expectations about effective TZ), then the rest of these rules don't matter. The schedules are deeply conflicted. You probably need to re-assess them on a mailing-by-mailing basis.
  2. If all three TZs are aligned, then all existing schedules work, with or without doctorwhen's conversion.
  3. If all three TZs are misaligned, then existing schedules are boinked, with or without doctorwhen's conversion.
  4. If AliceTZ and BobTZ are aligned (but cli.php misaligned), then schedules were misbehaving before, but they should start to work as expected after conversion.
  5. If AliceTZ and cli.php TZ are aligned (but BobTZ misaligned), then existing schedules worked before, but they broke during conversion. Basically, doctorwhen converted to the wrong TZ.
  6. If BobTZ and cli.php TZ are aligned (but AliceTZ misaligned), then existing schedules were broken before. The conversion substantively preserves the bad timing, but it makes it more visible and fixable (i.e. Alice sees her mailing as rescheduled, but the new schedule is more accurate).
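As a rough way to apply the six rules above, the sketch below maps a combination of the three timezones to a case number. The function name `classify_tz_case` and the convention of passing `None` when no single AliceTZ exists are assumptions made for illustration; nothing like this exists in doctorwhen.

```python
def classify_tz_case(alice_tz, bob_tz, cli_tz):
    """Return the case number (1-6) from the list above.

    alice_tz is None when no single AliceTZ exists (case 1).
    Timezones are compared as plain strings, e.g. "Europe/London".
    """
    if alice_tz is None:
        return 1  # conflicted expectations; review mailing by mailing
    if alice_tz == bob_tz == cli_tz:
        return 2  # everything aligned; schedules work either way
    if alice_tz == bob_tz:
        return 4  # misbehaved before; should work after conversion
    if alice_tz == cli_tz:
        return 5  # worked before; conversion used the wrong TZ
    if bob_tz == cli_tz:
        return 6  # broken before; conversion makes it visible
    return 3      # all three differ; broken either way

# Hypothetical values, only to show the call shape:
print(classify_tz_case("Europe/London", "Europe/Sofia", "Europe/London"))  # 5
print(classify_tz_case("Europe/London", "Europe/Sofia", "Europe/Sofia"))   # 6
```

Comparing zone names as strings is a simplification: two differently named zones can share the same offset on a given date, so a real check would compare effective offsets at the scheduled time.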

It sounds to me like you're probably in situation (5) or (6). But all of this is basically educated speculation. If I were working on a system with the problem, I'd inspect a bit (maybe extra debug/log code) to figure out the actual values of Alice TZ, Bob TZ, and cli.php TZ - and test/validate whichever theory seemed most relevant.

Zooming out a little, I think there are basically three approaches to resolve those cases:

  1. (No development needed) Pause cron jobs. Run doctorwhen. Manually clean up existing schedules via SQL or API. Re-enable cron jobs.
  2. (Development needed) Make the effective conversion timezone explicit: Update doctorwhen UI to include a timezone field. Set the default to Bob TZ, but include help on how to pick. (If a singular AliceTZ exists, then that's the best one to choose. But if it doesn't, then cli.php TZ feels like better CYA -- because you're not actually making the data any worse; you're just making the old breakage more visible.)
  3. (Development needed) Update doctorwhen to include a specific reschedule UI (i.e. show a table of all scheduled mailings and prompt the user to edit/verify the schedule for each).
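For the manual cleanup in approach 1, the arithmetic for a single record could look like the following sketch. It assumes you already know which timezone the conversion effectively used and which one the schedule was meant in; `reinterpret` is a hypothetical helper, and the fixed offsets ignore DST.

```python
from datetime import datetime, timedelta, timezone

def reinterpret(stored_utc, converted_as, intended):
    """Undo a wrong-TZ conversion for one value.

    Recover the wall-clock reading that was converted under `converted_as`,
    then read that same wall-clock value under `intended` instead.
    """
    wall_clock = stored_utc.astimezone(converted_as).replace(tzinfo=None)
    return wall_clock.replace(tzinfo=intended).astimezone(timezone.utc)

# Assumption: a schedule meant as 03:04 at UTC+0 was converted as if it were UTC+2.
converted_as = timezone(timedelta(hours=2))
intended = timezone.utc
stored = datetime(2019, 1, 2, 1, 4, tzinfo=timezone.utc)  # 03:04 at UTC+2

print(reinterpret(stored, converted_as, intended).isoformat())
# 2019-01-02T03:04:00+00:00
```

Whether a given record needs this correction at all depends on which of the cases above applies, so it is worth validating on a copy of the data before touching live schedules.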

(Aside: it'd be pretty slick if doctorwhen could identify all these TZ's and give more pointed/automated options. But given the diverse environments and customizations which can affect TZ config, it's not worth the effort.)

clarkac1 commented 5 years ago

Thanks for your comprehensive piece on this! The client is slightly unusual in that the server is in Bulgaria which is 2 hours different from the UK; they have had Civi since 2013 so occasional 'legacy' problems; in the winter they use the system almost all the time (they run a nightshelter amongst other things). So really best not to dabble! I have another very different client also with many dates that need fixing, so will run Doctor When against them and let you know how it goes.

benrfairless commented 2 years ago

Does this remain an issue still? Sorry for resurrecting a very old issue.