cid-harvard / atlas-economic-complexity

[DEPRECATED] The Atlas online is a powerful interactive tool that enables users to visualize a country’s total trade, track how these dynamics change over time and explore growth opportunities for more than a hundred countries worldwide.
http://atlas.cid.harvard.edu

Space savings on API calls - don't return product names etc #332

Closed: makmanalp closed this issue 5 years ago

makmanalp commented 9 years ago

Looking into the API calls today, I noticed that some attr / attr_data fields are merged directly into each data record (rather than living under the attr / attr_data key). They could probably be removed from the records and kept separately:


{
  "abbrv": "5113",
  "distance": 0.731524,
  "opp_gain": 0.138264,
  "name": "Woven fabrics of coarse animal hair or of horsehair",
  "color": "#2ca02c",
  "pci": 1.10276,
  "share": 6.5741274691263425e-06,
  "community_id": 416,
  "value": 686291.2,
  "rca": 0.5424805,
  "community_name": "Textiles",
  "code": "5113",
  "year": 1996,
  "item_id": 595,
  "id": "5113"
}
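One way to picture the separation is a lookup table of per-product attributes, keyed by product code, next to slimmed-down data records. The sketch below is only an illustration and not the Atlas API's actual code: `split_records`, `ATTR_FIELDS` and `DUPLICATE_FIELDS` are hypothetical names, and which fields count as year-independent attributes is an assumption based on the sample record above.

```python
import json

# Assumption: these fields describe the product itself and do not change
# from year to year, so they can live in a separate lookup table.
ATTR_FIELDS = ("name", "abbrv", "color", "community_name")
# Assumption: "id" carries the same value as "code" in the sample record.
DUPLICATE_FIELDS = ("id",)


def split_records(records):
    """Split API records into slim data rows and a per-product attribute table."""
    data = []
    attrs = {}
    for rec in records:
        code = rec["code"]
        # Store the attributes once per product code.
        attrs.setdefault(code, {f: rec[f] for f in ATTR_FIELDS if f in rec})
        # Keep everything else, minus the duplicated identifier.
        data.append({
            k: v for k, v in rec.items()
            if k not in ATTR_FIELDS and k not in DUPLICATE_FIELDS
        })
    return data, attrs


sample = {
    "abbrv": "5113",
    "distance": 0.731524,
    "opp_gain": 0.138264,
    "name": "Woven fabrics of coarse animal hair or of horsehair",
    "color": "#2ca02c",
    "pci": 1.10276,
    "share": 6.5741274691263425e-06,
    "community_id": 416,
    "value": 686291.2,
    "rca": 0.5424805,
    "community_name": "Textiles",
    "code": "5113",
    "year": 1996,
    "item_id": 595,
    "id": "5113",
}

# Two rows for the same product (the second year is made up for illustration)
# share a single attribute entry.
data, attrs = split_records([sample, dict(sample, year=1997)])
print(json.dumps(attrs, indent=2))
print(json.dumps(data[0], indent=2))
```

With this layout the product name and color are sent once per product instead of once per product per year, and the client can join them back on `code`.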

A quick test:

jq '.[] | {code, year, distance, opp_gain, pci, share, community_id, rca}' data.json > out.json

Removing the names, colors, the duplicated product ID, etc. gives a pretty big space saving (10M down to 3.8M):

-rw-r--r--    1 makmana  HKS\Domain Users    10M Dec 17 14:31 data.json
-rw-r--r--    1 makmana  HKS\Domain Users   3.8M Dec 17 14:50 out.json

And the gzipped sizes (2.0M down to 731K):

-rw-r--r--    1 makmana  HKS\Domain Users   2.0M Dec 17 14:31 data.json.gz
-rw-r--r--    1 makmana  HKS\Domain Users   731K Dec 17 14:50 out.json.gz
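The same comparison can be reproduced in memory, which may be handy for checking other field selections. This is a rough sketch under stated assumptions: `json_sizes`, `slim` and `KEEP` are hypothetical helper names, the records are synthetic copies of the sample above rather than real API output, and the JSON is serialized compactly, so the byte counts will not match the `ls` figures above.

```python
import gzip
import json

# Fields kept by the jq filter in the quick test above.
KEEP = ("code", "year", "distance", "opp_gain", "pci", "share",
        "community_id", "rca")


def json_sizes(obj):
    """Return (raw_bytes, gzipped_bytes) for the compact JSON encoding of obj."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


def slim(rec):
    """Keep only the fields listed in KEEP."""
    return {k: rec[k] for k in KEEP if k in rec}


# Synthetic data: the sample record repeated over made-up years.
base = {
    "abbrv": "5113", "distance": 0.731524, "opp_gain": 0.138264,
    "name": "Woven fabrics of coarse animal hair or of horsehair",
    "color": "#2ca02c", "pci": 1.10276, "share": 6.5741274691263425e-06,
    "community_id": 416, "value": 686291.2, "rca": 0.5424805,
    "community_name": "Textiles", "code": "5113", "year": 1996,
    "item_id": 595, "id": "5113",
}
full = [dict(base, year=1995 + i) for i in range(100)]
slimmed = [slim(rec) for rec in full]

full_raw, full_gz = json_sizes(full)
slim_raw, slim_gz = json_sizes(slimmed)
print(f"full:    {full_raw} bytes raw, {full_gz} bytes gzipped")
print(f"slimmed: {slim_raw} bytes raw, {slim_gz} bytes gzipped")
```

Because gzip already compresses repeated names and colors well, the relative saving after compression depends heavily on the data, so it is worth measuring on a real response.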