datopian / ckanext-opendatani

GNU Affero General Public License v3.0

[TEST] DCTA- json-rpc harvester #28

Closed steveoni closed 3 months ago

steveoni commented 9 months ago

The client wants to add a new dataset catalog to the portal. The current harvester does not support this data catalog, so a new harvester implementation is needed.

Assumption

We are working on the assumption that we will harvest all the public datasets on the NISRA platform

Acceptance

Task

Analysis

Analysis done by ismail: https://hackmd.io/@susanbot/rJPIlAx7T

Estimate

Ismail's analysis estimates 3-4.5 days. I will estimate 1 week to include a little buffer; it might take less.

Setting up the environment locally and making sure the harvester works might take time, and mapping the user data to CKAN can also take time.

Expected completion date: 1st April 2024 at the latest.

steveoni commented 9 months ago

The new DCAT harvester is tested against this REST API: https://ws-data.nisra.gov.uk/public/api.restful/PxStat.Data.Cube_API.ReadDataset/CCMSOA/JSON-stat/2.0/en

Here is the harvester: https://ni.dev.datopian.com/harvest/edit/ws-data-rest

And the generated dataset: https://ni.dev.datopian.com/dataset/ccmsoa
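
For reference, a minimal sketch of how a response from this endpoint could be mapped to a CKAN-style package dict. The URL pattern is taken from the link above. The function names (`read_dataset_url`, `jsonstat_to_package`), the `source_last_updated` extra and the sample payload are hypothetical, and the sketch assumes the response is a standard JSON-stat 2.0 dataset with `label` and `updated` fields. It is not the code in the PR.

```python
import json

BASE = "https://ws-data.nisra.gov.uk/public/api.restful"
METHOD = "PxStat.Data.Cube_API.ReadDataset"


def read_dataset_url(matrix, fmt="JSON-stat", version="2.0", lang="en"):
    """Build the ReadDataset URL for one matrix code, e.g. CCMSOA."""
    return "/".join([BASE, METHOD, matrix, fmt, version, lang])


def jsonstat_to_package(matrix, doc):
    """Map an already-parsed JSON-stat 2.0 dataset to a CKAN-style dict."""
    url = read_dataset_url(matrix)
    title = doc.get("label", matrix)
    return {
        "name": matrix.lower(),
        "title": title,
        # Hypothetical extra key; keeps the source timestamp with the dataset.
        "extras": [{"key": "source_last_updated", "value": doc.get("updated", "")}],
        "resources": [{"name": title, "url": url, "format": "JSON"}],
    }


# Made-up sample payload, so the sketch runs without network access.
sample = json.loads(
    '{"version": "2.0", "class": "dataset", '
    '"label": "Example table", "updated": "2024-01-31T09:30:00Z"}'
)
package = jsonstat_to_package("CCMSOA", sample)
print(package["name"], package["resources"][0]["url"])
```

In a real harvester the `doc` argument would come from an HTTP request to `read_dataset_url(matrix)`; the sample string only stands in for that response.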

SusanBot85 commented 5 months ago

Hi @steveoni, we are missing the following metadata in the harvested resource. The source has a "last updated" date that is not showing in the harvested dataset.

[image] [image]

The source also has the following information:

Topic category: Economic and Labour Market Statistics (in ODNI: https://admin.opendatani.gov.uk/group/economy)
Frequency of update: Monthly
Contact: name, email and phone are in the source [image]

Is this metadata not captured in the API?

How can we populate this as well?

Also note that for this source, the harvester would need to run monthly.

SusanBot85 commented 5 months ago

The CSV resource looks good.

steveoni commented 5 months ago

@SusanBot85 the screenshot of the last updated date is for the dataset as a whole, not for a specific resource. So when a dataset is harvested, its created date is the day it was harvested, and the updated date will change the next time it is updated. That's my guess.

Also, if you want the last updated date attached to each resource, that's possible, but it can be noted for the next round of work.
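
A rough sketch of what attaching the last updated date to each resource could look like. It assumes the source exposes an ISO 8601 timestamp (JSON-stat 2.0 has an `updated` field for this) and that the value is written to the `last_modified` field of each CKAN resource. The helper names `parse_updated` and `stamp_resources` are hypothetical and nothing here is in the current harvester.

```python
from datetime import datetime, timezone


def parse_updated(value):
    """Parse an ISO 8601 timestamp such as 2024-01-31T09:30:00Z into naive UTC."""
    # Older Python versions do not accept a trailing "Z", so normalise it first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is not None:
        parsed = parsed.astimezone(timezone.utc).replace(tzinfo=None)
    return parsed


def stamp_resources(package_dict, source_updated):
    """Copy the source's last updated time onto every resource in the package."""
    stamp = parse_updated(source_updated).isoformat()
    for resource in package_dict.get("resources", []):
        resource["last_modified"] = stamp
    return package_dict


pkg = {"name": "ccmsoa", "resources": [{"name": "csv"}, {"name": "json"}]}
stamp_resources(pkg, "2024-01-31T09:30:00Z")
print([r["last_modified"] for r in pkg["resources"]])
```

With this approach every resource carries the same date as the source dataset, which matches what the source page shows today; per-resource dates would need a per-resource field in the API.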

SusanBot85 commented 5 months ago

"but this can be noted for the next round of work"

What does this mean? How will it change the scope?

Also, what is your feedback on pulling in the additional info that is present on the source?

Topic category: Economic and Labour Market Statistics (in ODNI: https://admin.opendatani.gov.uk/group/economy)
Frequency of update: Monthly
Contact: name, email and phone are in the source

cc @gabrielailieva @anuveyatsu

steveoni commented 5 months ago

@SusanBot85

"Topic category: Economic and Labour Market Statistics"

This is the contact name, not the Topic category.

steveoni commented 5 months ago

For Frequency and contact name:

These are already applied with a fixed value (OSNI Mapping Helpdesk). This follows the same pattern used for the other DCAT harvester we have.

https://github.com/datopian/ckanext-opendatani/pull/29/files#:~:text=else%20%27Updated%27-,package_dict,-%5B%27frequency%27

But for some reason it is not being reflected, so that's a bug to look into.

For fast feedback I will update this manually so we can send a message to the client. If they approve it, fixing the bug and the other updates will be part of the main work.
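
To illustrate the fixed-value pattern and a quick way to check whether it was reflected, here is a minimal sketch. `package_dict['frequency']` and the contact value come from the PR link and this thread; the `contact_name` key, the `"monthly"` value and the helper names `apply_fixed_metadata` and `missing_fields` are assumptions for illustration, not the harvester's actual code.

```python
FIXED_CONTACT_NAME = "OSNI Mapping Helpdesk"  # value mentioned in this thread


def apply_fixed_metadata(package_dict, frequency, contact_name=FIXED_CONTACT_NAME):
    """Return a copy with frequency and contact name filled in where empty or missing."""
    result = dict(package_dict)
    if not result.get("frequency"):
        result["frequency"] = frequency
    if not result.get("contact_name"):  # hypothetical field key
        result["contact_name"] = contact_name
    return result


def missing_fields(package_dict, expected=("frequency", "contact_name")):
    """List expected fields that are absent or empty, e.g. after a harvest run."""
    return [key for key in expected if not package_dict.get(key)]


harvested = {"name": "ccmsoa", "frequency": ""}
print(missing_fields(harvested))

fixed = apply_fixed_metadata(harvested, "monthly")  # "monthly" is a placeholder
print(missing_fields(fixed))
```

Running `missing_fields` on the dataset as stored after a harvest would show which of the fixed values did not make it through, which may help narrow down the bug.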

steveoni commented 5 months ago

Also, questions for the client:

1) What field in the API should we use for Topic category?
2) What field should we use for keyword?
3) Should we use the contact name and address provided in the API?

@SusanBot85