NaturalHistoryMuseum / scratchpads2

Scratchpads 2.0
http://scratchpads.org
GNU General Public License v2.0
199 stars 83 forks source link

ecoint/scratchpad_statistics module takes too much CPU on large sites #5777

Open PaulKiddle opened 5 years ago

PaulKiddle commented 5 years ago

For example on hosts.myspecies.info

More info from @edwbaker:

https://github.com/NaturalHistoryMuseum/scratchpads2/blob/master/sites/all/modules/custom/ecoint/ecoint.cron.inc The resource issue is that this goes through the Drupal Views API, which adds an awful lot of overhead It would be better in this case to just execute a single MySQL query for the whole thing instead You might get the same issue if the older DwC-A module is run on a big site too

The easiest fix, given that hosts is not really updated often would be to only run that action iof the data had been updated I see that has also been removed from the dwca_export module If you look at https://github.com/NaturalHistoryMuseum/scratchpads2/commit/c9e5a05dbb2a6ec08b8758c23440f99f0d50ed83#diff-fbbdadeb9949c581dc61cb4aaa9f48c5 at the start there are a couple of variables dwca_export_rebuild was set to T when any content was added, edited or deleted. The archive code only ran if there had been a change.

NB: I've temporarily disabled this function until this issue is solved

big sites also cause a similar problem for scratchpads_statistics -this has been disabled on the hosts scratchpad too through this PR and a variable set on the hosts.myspecies.info scratchpad

The sql generated by scratchpads_statistics that ends up making the server hang:

SELECT node.*, node.nid AS entity_id, node.vid AS revision_id, node.type AS bundle, f.field_taxonomic_name_tid AS field_taxonomic_name_tid, sst.id AS ss_term_id, ssu.id AS ss_user_id, nc.totalcount AS totalcount, nr.uid AS revision_uid, ssu_r.id AS ss_revision_user_id, 'node' AS entity_type
FROM 
node node
LEFT OUTER JOIN field_data_field_taxonomic_name f ON f.entity_id = node.nid
LEFT OUTER JOIN taxonomy_term_data t ON t.tid = f.field_taxonomic_name_tid
LEFT OUTER JOIN scratchpads_statistics_term sst ON sst.term LIKE t.name
INNER JOIN users u ON u.uid = node.uid
INNER JOIN scratchpads_statistics_user ssu ON ssu.email = u.mail
LEFT OUTER JOIN node_counter nc ON nc.nid = node.nid
INNER JOIN node_revision nr ON nr.nid = node.nid
INNER JOIN users ru ON ru.uid = nr.uid
INNER JOIN scratchpads_statistics_user ssu_r ON ssu_r.email = ru.mail
WHERE  (node.type = 'ecological_interactions') 
ORDER BY nid ASC 
edwbaker commented 5 years ago

Also, check if the ecoint view uses batch processing when executing, this might decrease the load at the expense of it taking longer.