GSA / datagov-ckan-multi

Implement collections integration for CKAN search #263

Closed thejuliekramer closed 4 years ago

thejuliekramer commented 4 years ago

User Story

As a data.gov developer, I want to isolate the collections feature related to CKAN search so that we can upgrade to CKAN 2.8.

Acceptance Criteria

Create tickets for the items below:

Task-list local dev

Task-list sandbox

Once we have a functioning Catalog app running on CKAN 2.8 with the following extensions, we can do final UAT.

avdata99 commented 4 years ago

Starting TDD

avdata99 commented 4 years ago

isPartOf is only used for DCAT-US sources. We manage the collection relationship in different ways for each source type.
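
To illustrate the DCAT-US case only, here is a minimal sketch of grouping records by `isPartOf`, where a child record carries the identifier of its parent collection. The sample records and the helper name `group_by_collection` are hypothetical, and this is not the harvester's actual code.

```python
from collections import defaultdict

# Hypothetical, trimmed DCAT-US records; the identifiers are made up.
datasets = [
    {"identifier": "parent-1", "title": "Collection A"},
    {"identifier": "child-1", "title": "Part 1", "isPartOf": "parent-1"},
    {"identifier": "child-2", "title": "Part 2", "isPartOf": "parent-1"},
    {"identifier": "solo-1", "title": "Standalone"},
]


def group_by_collection(records):
    """Map each parent identifier to the identifiers of its children."""
    collections = defaultdict(list)
    for record in records:
        parent = record.get("isPartOf")
        if parent:  # records without isPartOf are not part of a collection
            collections[parent].append(record["identifier"])
    return dict(collections)


print(group_by_collection(datasets))
# {'parent-1': ['child-1', 'child-2']}
```

The other source types do not carry `isPartOf`, so they would need their own way of linking a dataset to its collection.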

[x] = upstreamed

| DataType | Collection notes | Harvester | Extension |
|---|---|---|---|
| Datajson | Used at import stage, here. | DataJsonHarvester inherits from DatasetHarvesterBase | datajson |
| Geo-dataportal | A harvester for CSW servers, with customizations for data.gov. | Inherits from CSWHarvester and GeoDataGovHarvester | geodatagov |
| WAF [x] | We use collection_package_id at the fork and upstream. WAF harvester here. It's also used upstream. | WAFHarvester inherits from SpatialHarvester | spatial |
| WAF-collections | We use collection_package_id at WAFCollectionHarvester.get_package_dict, here. | WAFCollectionHarvester inherits from GeoDataGovWAFHarvester, which inherits from WAFHarvester and GeoDataGovHarvester | spatial |
| CSW [x] | CSWHarvester: at the fork, we have a command in which we add the collection_package_id extra. This command exists upstream but doesn't add the extra. At load pycsw, here. It's an internal command. It doesn't exist upstream (for CSW). | GeoDataGovCSWHarvester inherits from CSWHarvester and GeoDataGovHarvester | spatial |
| Z3950 | It's covered by parent classes. | Z3950Harvester inherits from GeoDataGovHarvester -> SpatialHarvester | geodatagov |
| ArcGIS | It's covered by parent classes. | ArcGISHarvester inherits from SpatialHarvester | geodatagov |
| Doc | It's covered by parent classes. | GeoDataGovDocHarvester inherits from DocHarvester and GeoDataGovHarvester | geodatagov |
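
The inheritance chains in the table can be sketched with stub classes to see which parents each harvester picks up. The class names and parent relationships come from the table; the empty class bodies, the order of the base classes, and the helper `ancestors` are assumptions for illustration, not the real ckanext code.

```python
# Stub classes mirroring the table; the real classes live in the
# spatial and geodatagov extensions and have actual behavior.
class SpatialHarvester:
    pass


class GeoDataGovHarvester(SpatialHarvester):
    pass


class WAFHarvester(SpatialHarvester):
    pass


class GeoDataGovWAFHarvester(WAFHarvester, GeoDataGovHarvester):
    pass


class WAFCollectionHarvester(GeoDataGovWAFHarvester):
    pass


class Z3950Harvester(GeoDataGovHarvester):
    pass


class ArcGISHarvester(SpatialHarvester):
    pass


def ancestors(cls):
    """Return parent class names in method resolution order, without object."""
    return [base.__name__ for base in cls.__mro__[1:-1]]


print(ancestors(Z3950Harvester))
# ['GeoDataGovHarvester', 'SpatialHarvester']
print(ancestors(WAFCollectionHarvester))
# ['GeoDataGovWAFHarvester', 'WAFHarvester', 'GeoDataGovHarvester', 'SpatialHarvester']
```

Anything a parent class does for collections is inherited by the subclasses listed under it, which is what "covered by parent classes" refers to.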

Aaron's Harvest Source Report

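| source_type | total_datasets | count |
|---|---:|---:|
| waf-collection | 781731 | 379 |
| datajson | 72583 | 150 |
| waf | 32333 | 466 |
| ckan | 26021 | 2 |
| csw | 398 | 7 |
| z3950 | 177 | 3 |
| single-doc | 3 | 16 |
| geoportal | 0 | 5 |
| arcgis | 0 | 5 |
| Grand Total | 913246 | 1033 |
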
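
As a quick check on the report, the sketch below re-adds the rows and computes each source type's share of datasets. The numbers are copied from the report; the variable names are illustrative, and reading `count` as the number of harvest sources per type is an assumption.

```python
# Rows copied from the report: (source_type, total_datasets, count)
report = [
    ("waf-collection", 781731, 379),
    ("datajson", 72583, 150),
    ("waf", 32333, 466),
    ("ckan", 26021, 2),
    ("csw", 398, 7),
    ("z3950", 177, 3),
    ("single-doc", 3, 16),
    ("geoportal", 0, 5),
    ("arcgis", 0, 5),
]

total_datasets = sum(row[1] for row in report)
total_count = sum(row[2] for row in report)

# Both sums should match the "Grand Total" row of the report.
assert total_datasets == 913246
assert total_count == 1033

# Percentage of all datasets contributed by each source type.
shares = {name: 100 * n / total_datasets for name, n, _ in report}
for name, share in shares.items():
    print(f"{name:15s} {share:5.1f}%")
```

The totals match the Grand Total row, and waf-collection alone accounts for roughly 86% of all datasets.
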
thejuliekramer commented 4 years ago

Created separate issue for dataset count N+1 bug https://github.com/GSA/datagov-ckan-multi/issues/337

kimwdavidson commented 4 years ago

Duplicate. Closing.