GlobalNamesArchitecture / gnresolver

MIT License
0 stars 0 forks source link

Remove names from `name_strings` that has no corresponding `name_string_index` #56

Open alexander-myltsev opened 8 years ago

alexander-myltsev commented 7 years ago

@dimus , orphans are listed at http://opendata.globalnames.org/staging/out.txt . File is too big (1.9Gb) to place it here. It has 4180611 records.

It is formed by SQL query:

select * from name_strings
  left join name_string_indices
    on name_strings.id = name_string_indices.name_string_id
where name_string_indices.name_string_id is NULL;

Please, verify everything so I can remove those orphans from database.

alexander-myltsev commented 7 years ago

done by query:

delete from name_strings
where id in (
 select id from name_strings
  left join name_string_indices
   on name_strings.id = name_string_indices.name_string_id
 where name_string_indices.name_string_id is NULL
);

DELETE 4180606

Tables are clean now:

select count(*) from name_string_indices
 left join name_strings
  on name_strings.id = name_string_indices.name_string_id
where name_strings.id is NULL;

 count 
-------
     0
(1 row)

select count(*) from name_strings
 left join name_string_indices
  on name_strings.id = name_string_indices.name_string_id
where name_string_indices.name_string_id is NULL;

 count 
-------
     0
(1 row)