The query to get the queue for u/transcribot is currently very inefficient and causes big latency spikes:
The query is defined here. It currently compiles to the following:
SELECT
"api_submission"."id",
"api_submission"."original_id",
"api_submission"."create_time",
"api_submission"."last_update_time",
"api_submission"."redis_id",
"api_submission"."claimed_by_id",
"api_submission"."completed_by_id",
"api_submission"."claim_time",
"api_submission"."complete_time",
"api_submission"."source_id",
"api_submission"."url",
"api_submission"."tor_url",
"api_submission"."archived",
"api_submission"."content_url",
"api_submission"."removed_from_queue",
"api_submission"."cannot_ocr"
FROM
"api_submission"
LEFT OUTER JOIN "api_transcription" ON (
"api_submission"."id" = "api_transcription"."submission_id"
)
WHERE
(
NOT "api_submission"."cannot_ocr"
AND "api_submission"."id" IN (
SELECT
U0."id"
FROM
"api_submission" U0
INNER JOIN "api_transcription" U1 ON (U0."id" = U1."submission_id")
WHERE
U1."author_id" = % s
)
AND NOT "api_submission"."removed_from_queue"
AND "api_submission"."source_id" = % s
AND "api_transcription"."original_id" IS NULL
)
LIMIT
10
This could be simplified to something like this:
SELECT
"api_submission"."id",
"api_submission"."original_id",
"api_submission"."create_time",
"api_submission"."last_update_time",
"api_submission"."redis_id",
"api_submission"."claimed_by_id",
"api_submission"."completed_by_id",
"api_submission"."claim_time",
"api_submission"."complete_time",
"api_submission"."source_id",
"api_submission"."url",
"api_submission"."tor_url",
"api_submission"."archived",
"api_submission"."content_url",
"api_submission"."removed_from_queue",
"api_submission"."cannot_ocr"
FROM
"api_submission"
INNER JOIN "api_transcription" ON (
"api_submission"."id" = "api_transcription"."submission_id"
)
WHERE
(
NOT "api_submission"."cannot_ocr"
AND "api_transcription"."author_id" = % s
AND NOT "api_submission"."removed_from_queue"
AND "api_submission"."source_id" = % s
AND "api_transcription"."original_id" IS NULL
)
LIMIT
10
The query to get the queue for u/transcribot is currently very inefficient and causes big latency spikes:
The query is defined here. It currently compiles to the following:
This could be simplified to something like this:
I think we can just replace
by
if I understand the syntax correctly.