WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License

Duration exceeds Postgres integer column maximum size #1361

Open krysal opened 1 year ago

krysal commented 1 year ago

Description

We got an exception from the wikimedia_reingestion_workflow DAG due to an extremely large value for the duration column of the audio table. The value is far beyond the maximum of Postgres's 4-byte integer type (2,147,483,647) and doesn't look like a valid duration, so we need to investigate how it was produced.

Exception: value "78630107343183267448505870018812960624938665116018082604580360689842960029334315372638861311998064302073362330576381905995321575214792453503270223818310281547107639141744048631140921565692244439924744392714942624323438445935444736418486393657265676881684504510464" is out of range for type integer
CONTEXT:  COPY provider_data_audio_wikimedia_reingestion_20221002t000000_1176, line 1, column duration: "7863010734318326744850587001881296062493866511601808260458036068984296002933431537263886131199806430..."
SQL statement "copy provider_data_audio_wikimedia_reingestion_20221002T000000_1176  from '/rdsdbdata/extensions/aws_s3/amazon-s3-fifo-1083-20221009T103341Z-0' with DELIMITER E'    '"

Exception Type: psycopg2.errors.NumericValueOutOfRange
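
To illustrate the failure condition, here is a minimal sketch of a range check that could run before a row is written out for loading. It is an assumption for illustration only: the helper names (`fits_pg_integer`, `sanitize_duration`) are hypothetical and are not taken from the Openverse codebase or from the fix referenced later in this thread. Treating negative durations as invalid is also an assumption.

```python
from typing import Optional

# PostgreSQL's `integer` type is a signed 4-byte value.
PG_INTEGER_MIN = -2_147_483_648
PG_INTEGER_MAX = 2_147_483_647


def fits_pg_integer(value: int) -> bool:
    """Return True if `value` can be stored in a Postgres integer column."""
    return PG_INTEGER_MIN <= value <= PG_INTEGER_MAX


def sanitize_duration(raw: object) -> Optional[int]:
    """Hypothetical helper: return the duration as an int if it is storable,
    otherwise None so the column can be left NULL instead of failing the COPY.
    """
    try:
        value = int(raw)
    except (TypeError, ValueError, OverflowError):
        # Not a number at all (None, "abc", NaN, infinity).
        return None
    # Assumption: a negative duration is never valid.
    if value < 0 or not fits_pg_integer(value):
        return None
    return value
```

With a check like this, a value such as the one in the exception above would be dropped to NULL rather than aborting the load, at the cost of hiding the upstream problem, so the rejected value should still be logged and investigated.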

Reproduction

  1. [WIP]
  2. See error.

Additional context

Logs in Airflow UI (staff only).

AetherUnbound commented 1 year ago

This is related/similar to WordPress/openverse#1583

AetherUnbound commented 1 year ago

Per the discussion in the post above, we have decided to use a stopgap solution in the form of WordPress/openverse#1358 and follow that up with a series of database migrations down the line.