georgetown-cset / unicorn-topics


Need comments #1

Closed Rahkovsky closed 3 years ago

Rahkovsky commented 4 years ago

https://github.com/georgetown-cset/unicorn-topics/blob/7406cffa7bfe7a34f5636dd32618003d94cddbae/sql/creating_top_organizations.sql#L1

Several things are not clear:

  1. Why do we limit the results to the top 100?
  2. Microsoft has 3000 AI publications, so why is it not picked up by the query: SELECT org_name, ai_pubs, Grid_ID FROM (SELECT org_name, count(distinct ds_id) as ai_pubs, ARRAY_AGG(distinct Grid_ID) as Grid_ID FROM `gcp-cset-projects.project_unicorn.grid_ai_pubs_052920` group by org_name order by ai_pubs desc LIMIT 100)

rggelles commented 3 years ago

  1. Because we're only interested in the top 100 organizations for the report.
  2. Because Microsoft has a bunch of GRID IDs and they aren't all linked to its name, so grouping by org_name splits its publications across several rows; you have to combine them yourself.
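
To illustrate the second point, here is a minimal sketch of how grouping by `org_name` can split one organization's publications across several GRID-linked names, and how a mapping table can recombine them. Everything in it is an assumption made for illustration: the rows are made-up toy data, the `grid_to_parent` table and its `parent_name` column are hypothetical and not part of this repo, and it runs on an in-memory SQLite database rather than BigQuery, so the syntax may need adjusting for the real table.

```python
import sqlite3

# In-memory toy database; all rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grid_ai_pubs (ds_id TEXT, org_name TEXT, Grid_ID TEXT);
INSERT INTO grid_ai_pubs VALUES
  ('p1', 'Example Corp (United States)', 'grid.0001'),
  ('p2', 'Example Corp (United States)', 'grid.0001'),
  ('p2', 'Example Corp Research (UK)',   'grid.0002'),
  ('p3', 'Example Corp Research (UK)',   'grid.0002'),
  ('p4', 'Example Corp Research Asia',   'grid.0003'),
  ('p5', 'Other University',             'grid.0009'),
  ('p6', 'Other University',             'grid.0009'),
  ('p7', 'Other University',             'grid.0009');

-- Hypothetical hand-built mapping from GRID IDs to one combined name.
CREATE TABLE grid_to_parent (Grid_ID TEXT, parent_name TEXT);
INSERT INTO grid_to_parent VALUES
  ('grid.0001', 'Example Corp'),
  ('grid.0002', 'Example Corp'),
  ('grid.0003', 'Example Corp');
""")

# Grouping by org_name alone: each GRID-linked name is counted separately.
by_name = conn.execute("""
SELECT org_name, COUNT(DISTINCT ds_id) AS ai_pubs
FROM grid_ai_pubs
GROUP BY org_name
ORDER BY ai_pubs DESC, org_name
""").fetchall()

# Grouping by the combined name: publications are pooled across GRID IDs,
# and a paper shared by two GRIDs (p2) is still counted once.
combined = conn.execute("""
SELECT COALESCE(m.parent_name, p.org_name) AS org,
       COUNT(DISTINCT p.ds_id) AS ai_pubs
FROM grid_ai_pubs AS p
LEFT JOIN grid_to_parent AS m ON p.Grid_ID = m.Grid_ID
GROUP BY COALESCE(m.parent_name, p.org_name)
ORDER BY ai_pubs DESC, org
""").fetchall()

print(by_name)
print(combined)
```

In this toy data no single "Example Corp" name has more publications than "Other University", but the combined entity does, which is the kind of effect that could push an organization out of a LIMIT 100 cut when its GRID IDs are not merged.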