GSA / data.gov

Main repository for the data.gov service
https://data.gov

Increase harvest speed by dropping some DB indexes #4211

Open FuhuXia opened 1 year ago

FuhuXia commented 1 year ago

Preliminary findings show that dropping some CKAN DB indexes can increase harvesting speed by 45%. We might need to do this in the future if more harvest sources are added to our system or agencies update their data more frequently.

Some indexes are obvious duplicates and can be dropped without negative impact. Others are not needed in our system, since Solr serves as the search engine.

Indexes to drop:

In table package_tag:

"idx_package_tag_id" btree (id)
"idx_package_tag_pkg_id_tag_id" btree (tag_id, package_id)
"idx_package_tag_tag_id" btree (tag_id)

In table resource:

"idx_package_resource_id" btree (id)
"idx_package_resource_url" btree (url)

In table member:

"idx_package_group_id" btree (id)
"idx_package_group_pkg_id" btree (table_id)
"idx_package_group_pkg_id_group_id" btree (group_id, table_id)