WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License

`load_sample_data.sh` fails to load sample data #2362

Closed by sarayourfriend 1 year ago

sarayourfriend commented 1 year ago

Description

Loading sample data fails with this error:

DELETE 0
ERROR:  extra data after last expected column
CONTEXT:  COPY image, line 2: "0e3315c5-3328-4a99-80ab-567ac32f685f,2022-12-21 17:29:54.000000+00,2022-12-21 17:29:54.000000+00,pro..."
REFRESH MATERIALIZED VIEW
DELETE 0
ERROR:  extra data after last expected column
CONTEXT:  COPY audio, line 2: "8624ba61-57f1-4f98-8a85-ece206c319cf,2022-12-06 06:54:25.000000+00,2022-12-06 06:54:25.000000+00,pro..."
REFRESH MATERIALIZED VIEW

This appears to happen because the sample data now includes a standardized_popularity column, but the upstream_db tables do not have that column:

openledger> \d image;
+-------------------------+--------------------------+--------------------------------------+
| Column                  | Type                     | Modifiers                            |
|-------------------------+--------------------------+--------------------------------------|
| identifier              | uuid                     |  not null default uuid_generate_v4() |
| created_on              | timestamp with time zone |  not null                            |
| updated_on              | timestamp with time zone |  not null                            |
| ingestion_type          | character varying(80)    |                                      |
| provider                | character varying(80)    |                                      |
| source                  | character varying(80)    |                                      |
| foreign_identifier      | character varying(3000)  |                                      |
| foreign_landing_url     | character varying(1000)  |                                      |
| url                     | character varying(3000)  |  not null                            |
| thumbnail               | character varying(3000)  |                                      |
| width                   | integer                  |                                      |
| height                  | integer                  |                                      |
| filesize                | integer                  |                                      |
| license                 | character varying(50)    |  not null                            |
| license_version         | character varying(25)    |                                      |
| creator                 | character varying(2000)  |                                      |
| creator_url             | character varying(2000)  |                                      |
| title                   | character varying(5000)  |                                      |
| meta_data               | jsonb                    |                                      |
| tags                    | jsonb                    |                                      |
| watermarked             | boolean                  |                                      |
| last_synced_with_source | timestamp with time zone |                                      |
| removed_from_source     | boolean                  |  not null                            |
| filetype                | character varying(5)     |                                      |
| category                | character varying(80)    |                                      |
+-------------------------+--------------------------+--------------------------------------+
Indexes:
    "image_pkey" PRIMARY KEY, btree (identifier)
    "image_identifier_key" UNIQUE, btree (identifier)
    "image_provider_fid_idx" UNIQUE, btree (provider, md5(foreign_identifier::text))
    "image_url_key" UNIQUE, btree (url)

Time: 0.024s

https://github.com/WordPress/openverse/pull/2096 adds the new column and the sample data, but it looks like the local upstream_db never gets the new column for some reason.
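A quick way to confirm this kind of mismatch is to compare the sample file's header against the table's columns. The sketch below is illustrative only: it assumes the sample CSV has a header row (the error is reported on line 2, which suggests one), and the helper name `extra_csv_columns`, the shortened column list, and the data row are hypothetical, not taken from the repository.

```python
import csv
import io


def extra_csv_columns(csv_text, table_columns):
    """Return CSV header columns that do not exist in the table."""
    # Read only the first row, which is assumed to be the header.
    header = next(csv.reader(io.StringIO(csv_text)))
    known = set(table_columns)
    return [name for name in header if name not in known]


# Hypothetical, shortened column list; the real table has many more columns.
table_columns = ["identifier", "created_on", "updated_on", "provider"]

# Hypothetical sample data with one column the table does not have.
sample_csv = (
    "identifier,created_on,updated_on,provider,standardized_popularity\n"
    "00000000-0000-0000-0000-000000000000,2022-12-21,2022-12-21,example,0.5\n"
)

print(extra_csv_columns(sample_csv, table_columns))
# prints ['standardized_popularity']
```

Any name this returns is a column that COPY would treat as extra data, which matches the "extra data after last expected column" error above.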

Reproduction

Add `exit` after the two calls to `load_sample_data "<media type>"` in `load_sample_data.sh` to avoid needing to scroll.

  1. `just down -v`
  2. `just api/up`
  3. `just api/init`
  4. See the error shown in the description.

Additional context

Critical because this means tests are running with zero data and are therefore effectively useless.

sarayourfriend commented 1 year ago

False alarm: I just needed to rebuild upstream_db for it to get the new SQL schema files from #2096. No idea why, though, as I fully rebuilt my local environment after that PR was merged. After the rebuild I also had to run `just down -v` and bring everything back up to get the new schema.

https://github.com/WordPress/openverse/issues/2364 replaces this issue and tracks fixing the underlying problem, which affects developer experience and beyond.