Automattic / msm-sitemap

Comprehensive sitemaps for your WordPress VIP site. Joint collaboration between Metro.co.uk, WordPress VIP, Alley Interactive, Maker Media, 10up, and others.
73 stars 37 forks source link

Rethink how "msm_sitemap_update_last_run" works #77

Open tollmanz opened 8 years ago

tollmanz commented 8 years ago

I noticed something that is perhaps a non-issue, but want to draw attention in case it could become and issue. Along with my work in #76, I've been working on a way of initially deploying the plugin where I have more control over the sitemap generation. I got myself into a state where the cron events would no longer flag a site creation event. To reproduce:

  1. Run the plugin on a site that is not currently using the plugin (i.e., sitemaps should not already be generated)
  2. Disable cron generation using the methods in #76

    wp msm-sitemap cron --disable
  3. Start generating scripts manually, using something like (h/t @mjangda):

    for i in $(seq 1994 2015); do for j in $(seq 1 12); do wp msm-sitemap generate-sitemap-for-year-month --year=$i --month=$j; done; done;
  4. Kill the generation before it is done (ctrl + C)
  5. Reenable cron generation

    wp msm-sitemap cron --enable
  6. Trigger the main cron event

    wp cron event run msm_cron_update_sitemap
  7. Observe that no individual site map creation event have been registered:

    wp cron event list | grep msm

I find it interesting that when I try to generate the site maps manually, but don't finish, that the cron updater does not notice this. This is because msm_sitemap_update_last_run is set to a recent time and no new generation will happen.

I am only logging this issue because response has been pretty positive to #76. We need to make sure that if we implement a manual generation mode that we maybe rethink msm_sitemap_update_last_run to ensure that we don't get caught in this state where many sitemaps are missing, but there is no indication that this is the case. Yes, the develop is kind of taking things into her own hands, but we can do better to help the develop understand that things are left in an incomplete state.

pkevan commented 8 years ago

If we do switch to a cli driven initial generation as per #76 then it makes sense to indicate a level of progress perhaps per year completed in the dashboard.

It probably makes sense to keep the two process (update and initial generation) separate otherwise we could be in a situation where an initial generation fails and the update process then creates the large amount of cron events that we were trying to avoid.

@tollmanz is there a reason you use the script in 3 rather than wp msm-sitemap generate-sitemap

Was just thinking that this could be used to track the progress of initial generation even if it's only on each year completed

tollmanz commented 8 years ago

It probably makes sense to keep the two process (update and initial generation) separate otherwise we could be in a situation where an initial generation fails and the update process then creates the large amount of cron events that we were trying to avoid.

This makes sense. This is mostly a note to self that this will become and issue if we move forward with something like #76. We would need to make sure that the UX makes all of this clear.

@tollmanz is there a reason you use the script in 3 rather than wp msm-sitemap generate-sitemap

This was a @mjangda suggestion. It seems safe to complete smaller chunks of work at a time and if I need to stop the process, I can then manipulate the loop when starting again to ignore certain month/year combinations. Additionally, I was thinking I might put in some timeouts in the loop in order to reduce server load if needed.

pkevan commented 8 years ago

Probably a valid suggestion if @mjangda said so :smile: - I've never run it on large data set other than locally so can't really comment.

@mjangda do you think there is a way to achieve a complete set through CLI, maybe a stepped approach which prompts the user?