cid-harvard / atlas-economic-complexity

[DEPRECATED] The Atlas online is a powerful interactive tool that enables users to visualize a country’s total trade, track how these dynamics change over time and explore growth opportunities for more than a hundred countries worldwide.
http://atlas.cid.harvard.edu
81 stars 40 forks source link

Sum(export_value) needs to go! #178

Open makmanalp opened 10 years ago

makmanalp commented 10 years ago

pt-query-digest --since 2014-05-15 /var/lib/mysql/slow-log reports:

# Query 1: 0.00 QPS, 0.01x concurrency, ID 0x2390345FC99535A7 at byte 47119879
# This item is included in the report because it matches --limit.
# Scores: Apdex = 0.11 [1.0], V/M = 14.37
# Query_time sparkline: |      _^|
# Time range: 2014-05-15 01:31:18 to 2014-06-10 16:11:23
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         26    1571
# Exec time     29  22213s      2s    157s     14s     23s     14s     14s
# Lock time     19   478ms    70us    42ms   304us   316us     1ms   260us
# Rows sent      2 276.79k      30     217  180.42  212.52   29.41  183.58
# Rows examine   8 201.27M   4.68k 349.93k 131.19k 312.96k  86.40k 101.89k
# Query size    29 580.72k     377     379  378.52  363.48       0  363.48
# String:
# Databases    atlas_db
# Hosts        localhost
# Users        root
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s  ######################
#  10s+  ################################################################
# Tables
#    SHOW TABLE STATUS FROM `atlas_db` LIKE 'new_observatory_hs4_ccpy'\G
#    SHOW CREATE TABLE `atlas_db`.`new_observatory_hs4_ccpy`\G
#    SHOW TABLE STATUS FROM `atlas_db` LIKE 'new_observatory_country'\G
#    SHOW CREATE TABLE `atlas_db`.`new_observatory_country`\G
EXPLAIN /*!50100 PARTITIONS*/
SELECT T3.`name_3char`, T3.`name_en`, SUM(`new_observatory_hs4_ccpy`.`export_value`) AS `value` FROM `new_observatory_hs4_ccpy` INNER JOIN `new_observatory_country` T3 ON (`new_observatory_hs4_ccpy`.`destination_id` = T3.`id`) WHERE (`new_observatory_hs4_ccpy`.`origin_id` = 64  AND `new_observatory_hs4_ccpy`.`year` = 2010 ) GROUP BY T3.`name_3char`, T3.`name_en` ORDER BY NULL\G

This is ridiculous. Median exec time of 14s?!

Culprit seems to be: https://github.com/cid-harvard/atlas-economic-complexity/blob/d86b950564e3b9240fbf2853ae2a88651d54dd25/django_files/observatory/models.py#L458

Explain further says:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: new_observatory_hs4_ccpy
   partitions: NULL
         type: index_merge
possible_keys: origin_id,year
          key: origin_id,year
      key_len: 4,2
          ref: NULL
         rows: 173551
        Extra: Using intersect(origin_id,year); Using where; Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: T3
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: atlas_db.new_observatory_hs4_ccpy.destination_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)

The using intersect() and temporary is very ominous indeed. Either preaggregate this or maybe create a compound index.

makmanalp commented 10 years ago

Meanwhile in api_csay:


  """Define parameters for query"""
  year_where = "AND year = %s" % (year,) if crawler == "" else " "
  rca_col = "null"
  if trade_flow == "net_export":
    val_col = "SUM(export_value - import_value) as val"
  elif trade_flow == "net_import":
    val_col = "SUM(import_value - export_value) as val"
  elif trade_flow == "export":
    val_col = "SUM(export_value) as val"
  else:
    val_col = "SUM(import_value) as val"
makmanalp commented 10 years ago

def get_countries(input, trade_flow): in views_overview.py

makmanalp commented 10 years ago

Looks like these are from the rankings pages.