LibrariesHacked / openlibrary-search

Searching Open Library by keywords to return ISBNs
147 stars 26 forks source link

null value in author_key when populating the authorship table #5

Closed kymahi closed 7 months ago

kymahi commented 1 year ago

Using the 2022/11/15 data dumps, and using

insert into authorship
select distinct 
    jsonb_array_elements(data->'authors')->'author'->>'key', 
    key 
from works
where 
    key is not null
    and data->'authors'->0->'author' is not null

I'm getting the following error:

ERROR:  null value in column "author_key" of relation "authorship" violates not-null constraint
DETAIL:  Failing row contains (null, /works/OL20655218W).
SQL state: 23502

PostgreSQL 14.6 (Homebrew)

DaveBathnes commented 1 year ago

Thanks for raising this. I get the same in the latest dumps.

It looks like there must be an author specified without a key so it's trying to enter a null value for the author_key table. You could get round this by changing the insert to just ensure it only selects where the underlying query returns an author key and a works key.

insert into authorship
select distinct author_key, work_key
from (
    select
        jsonb_array_elements(data->'authors')->'author'->>'key' as author_key, 
        key as work_key
    from works
    where key is not null
    and data->'authors'->0->'author' is not null) author_works
where author_key is not null 
and work_key is not null;
DaveBathnes commented 1 year ago

I'm working on a newer set of scripts and automation that will create and populate the database in one go, so will test the change along with those (preliminary testing looks good!)

kymahi commented 1 year ago

Awesome! Thanks a ton!