drivly / ctx.do

Context Enrichment API
https://ctx.do
MIT License
6 stars 1 forks source link

Isolate and remove performance issues (should be no blocking async/await in ctx.do) #32

Open nathanclevenger opened 1 year ago

nathanclevenger commented 1 year ago

I noticed many of our services had slowed WAY down vs normal:

{
  "target": "https://api.cf",
  "first": 219,
  "min": 153,
  "p25": 192,
  "med": 205,
  "avg": 205,
  "p75": 219,
  "max": 231
}

My hypothesis was that something in ctx.do was slowing it down, so I made a test in api.cf:

router.all('*', async (req, env) => {
  const { searchParams } = new URL(req.url)
  if (!searchParams.has('test')) {
    const { user } = await env.CTX.fetch(req).then(res => res.json())
    req.user = user
  }
})

and that increased performance by 2000% ... so we need to isolate and resolve whatever is introducing slowness. We should make sure we aren't doing anything async/await to block the thread ... ctx.do should be instant ... my initial suspicion is that this is related to the Analytics Engine stuff we added on Friday.

{
  "target": "https://api.cf?test",
  "first": 13,
  "min": 6,
  "p25": 7,
  "med": 8,
  "avg": 8.4,
  "p75": 10,
  "max": 13
}
nathanclevenger commented 1 year ago

Yah ... I'm sure this code is a key contributor ... we shouldn't be doing anything like this in ctx.do because it adds 200ms+ of latency to every request, many of which are normally < 10ms ... if there is something like this, we could do a ctx.waitUntil to fetch it in the background, and leave it in memory so it could be retrieved and optimistically incremented on future requests ... but yah, we can't have any blocking requests like this in ctx.do code ...

      const whereClause = apikey ? `blob2='${apikey}'` : profile?.id ?
        `index1='${profile?.id}'` :
        `index1='' AND (blob3='${ip}'${cf?.botManagement?.ja3Hash ? ` OR blob7='${cf.botManagement.ja3Hash}'` : ''})`
      const [totalCount, monthlyCount, dailyCount] = await Promise.all([
        getStats(env, whereClause),
        getStats(env, whereClause + ` AND timestamp > TODATETIME('${now.toISOString().substring(0, 7)}-01 06:00:00')`),
        getStats(env, whereClause + `  AND timestamp > TODATETIME('${now.toISOString().substring(0, 10)} 06:00:00')`)
      ])
async function getStats(env, whereClause) {
  const res = await fetch("https://api.cloudflare.com/client/v4/accounts/b6641681fe423910342b9ffa1364c76d/analytics_engine/sql", {
    method: 'POST',
    headers: { Authorization: `Bearer ${env.ANALYTICS_API_KEY}` },
    body: `SELECT COUNT() AS count FROM INTERACTIONS${whereClause ? `
WHERE ${whereClause}` : ''}`
  })
  const json = res.ok && await res.json()
  const count = json.data?.[0]?.count
  return count ? parseInt(count) : undefined
}
nathanclevenger commented 1 year ago

I temporarily commented out the above code and the stats object from the return payload, because the latency introduced makes many of the chained services unusable

nathanclevenger commented 1 year ago

Yes, I just re-ran the perf tests and they dramatically improved:

{
  "target": "https://api.cf",
  "first": 11,
  "min": 8,
  "p25": 9,
  "med": 11,
  "avg": 11.55,
  "p75": 14,
  "max": 16
}

and

{
  "target": "https://api.cf?test",
  "first": 8,
  "min": 5,
  "p25": 6,
  "med": 7,
  "avg": 7.4,
  "p75": 8,
  "max": 12
}

from:

{
  "target": "https://api.cf/",
  "first": 219,
  "min": 153,
  "p25": 192,
  "med": 205,
  "avg": 205,
  "p75": 219,
  "max": 231
}

This means that ctx.do is adding 3-4ms of latency to the request ... which isn't horrible, but we should see if we can get it > 1ms or at least 1-2ms if at all possible ...