covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

Need to get second page of locations for reports #321

Closed jzohrab closed 4 years ago

jzohrab commented 4 years ago

In #316, @zbraniecki noted that we're missing Poland. We currently process only the first ~2900 locations, which is the first page of records, but there's a second page of 299 items, which includes "pl". Get all records, please!

Sample bad code for getting all locations:

// Assumed import: the snippet calls arc.tables(), which @architect/functions provides.
const arc = require('@architect/functions')

/**
 * Debug handler: scans every page of the locations table and returns
 * a log of each scan call plus the total number of locations found.
 */
async function getAllLocations () {

  const messages = []

  // Recursively follows LastEvaluatedKey until the scan reports no further pages.
  async function get (data, lastEvaluatedKey = null, items = []) {
    const params = {}
    if (lastEvaluatedKey)
      params.ExclusiveStartKey = lastEvaluatedKey

    messages.push(`Called w/ params: ${JSON.stringify(params, null, 2)}`)

    const { Items, LastEvaluatedKey } = await data.locations.scan(params)
    messages.push(`Got ${Items.length} items`)
    // List slugs only for pages after the first, to keep the output short.
    if (lastEvaluatedKey !== null)
      messages.push(Items.map(i => i.slug).sort().join())

    items = items.concat(Items)

    // A LastEvaluatedKey in the response means there is another page to fetch.
    if (LastEvaluatedKey)
      return get(data, LastEvaluatedKey, items)

    return items
  }

  const data = await arc.tables()
  const locs = await get(data)

  return {
    json: { messages, count: locs.length },
    headers: {
      'cache-control': 'no-cache, no-store, must-revalidate, max-age=0, s-maxage=0'
    }
  }
}
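
For comparison, here is a minimal sketch of the same pagination written as a loop instead of recursion, without the debug messages. It is not the code that was merged. `scanAll` and `makeFakeScan` are hypothetical names used only for illustration, and the fake scan merely imitates the `{ Items, LastEvaluatedKey }` response shape that the snippet above relies on.

```javascript
/**
 * Collects every item from a paginated, DynamoDB-style scan.
 * `scan` is any async function that accepts { ExclusiveStartKey? } and
 * resolves to { Items, LastEvaluatedKey? } (hypothetical parameter).
 */
async function scanAll (scan) {
  const items = []
  let startKey
  do {
    // Only send ExclusiveStartKey once a previous page has handed one back.
    const params = startKey ? { ExclusiveStartKey: startKey } : {}
    const { Items = [], LastEvaluatedKey } = await scan(params)
    for (const item of Items) items.push(item)
    startKey = LastEvaluatedKey
  } while (startKey)
  return items
}

/**
 * In-memory stand-in for a table scan, for illustration only.
 * Returns `pageSize` records per call and a LastEvaluatedKey
 * whenever more records remain.
 */
function makeFakeScan (records, pageSize) {
  return async function scan ({ ExclusiveStartKey } = {}) {
    const start = ExclusiveStartKey ? ExclusiveStartKey.index : 0
    const end = start + pageSize
    const page = { Items: records.slice(start, end) }
    if (end < records.length) page.LastEvaluatedKey = { index: end }
    return page
  }
}
```

Assuming `data` comes from `arc.tables()` as in the snippet above, the call would look like `scanAll(params => data.locations.scan(params))`.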

jzohrab commented 4 years ago

Merged to master. I'll check the staging reports in a few hours and then promote to prod.

jzohrab commented 4 years ago

Launched to staging, and iso1:pl shows up correctly. Record counts for the locations returned by the paginated query match the counts in DynamoDB (3202 locations). All good; I'll close this and the other issue once prod is regenerated as well.
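
A check like the one described here could be scripted roughly as follows. This is only a sketch: `verifyLocations` is a hypothetical helper, and it assumes each location item carries a `slug` field, as in the sample code earlier in this issue. The expected count and the slugs to look for are supplied by the caller.

```javascript
/**
 * Returns a list of human-readable problems found in `locations`;
 * an empty list means the checks passed. Hypothetical helper.
 */
function verifyLocations (locations, { expectedCount, mustInclude = [] }) {
  const problems = []
  if (locations.length !== expectedCount) {
    problems.push(`expected ${expectedCount} locations, got ${locations.length}`)
  }
  // Assumes each item has a `slug` field, as in the sample scan code.
  const slugs = new Set(locations.map(l => l.slug))
  for (const slug of mustInclude) {
    if (!slugs.has(slug)) problems.push(`missing slug: ${slug}`)
  }
  return problems
}
```

For example, `verifyLocations(locs, { expectedCount: 3202, mustInclude: ['pl'] })` would return an empty array if all pages were read and Poland is present.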

jzohrab commented 4 years ago

Fixed in prod.