IATI / ckanext-iati

CKAN extension for the IATI Registry
http://iatiregistry.org

Create list of changed org and publisher ids (re-opened) #324

Open amy-silcock opened 3 years ago

amy-silcock commented 3 years ago

Work was done last year to create a mapping of publisher ids on the IATI Registry. It can be found here: https://iatiregistry.org/ckan-admin/iati-redirects (see issue https://github.com/IATI/ckanext-iati/issues/218).

However, this is only available for Registry Sysadmin accounts. It's not available to various IATI tool providers like the d-portal devs.

We need two public mappings

This is what the current sysadmin mapping for publisher ids looks like: [screenshot]

See the discussion on user need here: https://github.com/devinit/D-Portal/issues/503

markbrough commented 3 years ago

Just bumping this -- it would be great to make this mapping publicly available, e.g. for tracking the recent change from ec-devco to ec-intpa

andylolz commented 3 years ago

I understand this work is starting up soon, which is great.

@jodiegardiner Can I confirm that the plan is to create two public mappings – one for registry identifiers (e.g. ec-echo) and one for org IDs (e.g. XI-IATI-EC_ECHO)? Thanks

jodiegardiner commented 3 years ago

Hi @andylolz,

That's the understanding right now yes. We'll be on this one early next week.

Right now I'm thinking to just have a straightforward mappings.json available on a public url. Is that sufficient for the requirements?

andylolz commented 3 years ago

Right now I'm thinking to just have a straightforward mappings.json available on a public url. Is that sufficient for the requirements?

That sounds fine, yes. A couple of questions:

  1. Will this update automatically for future changes, or is there a manual step involved?
  2. For historic changes, does a log of org ID mappings already exist? I know there are the redirects for the registry IDs, but I don’t know if CKAN logs changes to metadata?
  3. What are you thinking for the data structure? Would something like the following work?
    {
        "registry IDs": [
            {
                "before": "agriprofocus",
                "after": "nfp",
                "date": "2021-09-03"
            },
            {
                "before": "lftwnl",
                "after": "syf",
                "date": "2021-06-09"
            },
            {
                "before": "ec-devco",
                "after": "ec-intpa",
                "date": "2021-04-30"
            }
        ],
        "organisation IDs": [
            {
                "before": "47111",
                "after": "XM-DAC-47111",
                "date": "2021-09-02"
            },
            {
                "before": "BE-BCE_KBO-0453475391",
                "after": "BE-BCE_KBO-0453975341",
                "date": "2021-06-26"
            },
            {
                "before": "XI-IATI-EC_DEVCO",
                "after": "XI-IATI-EC_INTPA",
                "date": "2021-04-30"
            },
            {
                "before": "46004",
                "after": "XM-DAC-46004",
                "date": "2021-04-29"
            },
            {
                "before": "30001",
                "after": "CH-FDJP-CHE-110347351",
                "date": "2021-02-20"
            }
        ]
    }
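
For illustration, a consumer of a file in the structure proposed above could resolve an old identifier to its current value roughly as follows. This is a minimal sketch, not an agreed design: the function name resolve_current_id is hypothetical, the sample data is a subset of the entries above, and it assumes each "before" value appears at most once per list.

```python
import json

# Sample data in the structure proposed above (a subset of the entries).
MAPPINGS_JSON = """
{
    "registry IDs": [
        {"before": "ec-devco", "after": "ec-intpa", "date": "2021-04-30"}
    ],
    "organisation IDs": [
        {"before": "47111", "after": "XM-DAC-47111", "date": "2021-09-02"},
        {"before": "XI-IATI-EC_DEVCO", "after": "XI-IATI-EC_INTPA", "date": "2021-04-30"}
    ]
}
"""


def resolve_current_id(mappings, id_type, identifier):
    """Follow before -> after entries until no further change is recorded.

    Hypothetical helper: id_type is one of the top-level keys of the file.
    Identifiers with no recorded change are returned unchanged.
    """
    changes = {entry["before"]: entry["after"] for entry in mappings[id_type]}
    seen = set()
    while identifier in changes and identifier not in seen:
        seen.add(identifier)  # guard against cycles such as A -> B -> A
        identifier = changes[identifier]
    return identifier


mappings = json.loads(MAPPINGS_JSON)
print(resolve_current_id(mappings, "organisation IDs", "XI-IATI-EC_DEVCO"))
print(resolve_current_id(mappings, "registry IDs", "ec-devco"))
```

Because the loop keeps following entries, an identifier that changed more than once would resolve to its latest recorded value.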

notshi commented 3 years ago

Right now I'm thinking to just have a straightforward mappings.json available on a public url. Is that sufficient for the requirements?

As mentioned in https://github.com/devinit/D-Portal/issues/503#issuecomment-448370692, it would be great for a publisher to have a list of previously used identifiers included in their registry data. This seems like something important that the registry should be used to keep track of centrally for the good of everyone using IATI data.

ss-bhat commented 3 years ago

@andreaszenasidi @jodiegardiner @andylolz @amy-silcock @notshi

we can include the details in the existing endpoint

Only for publisher members:

/api/action/organization_show?id=<your publisher id>

However, I don't think it's possible to get the changes in organization IDs, because the new version of CKAN activity doesn't store the organization extra changes (it tracks only package extra changes). New ideas are always welcome :)

Please let me know if you want me to go ahead with this.

andreaszenasidi commented 3 years ago

Adding the list of old publisher ids to the existing endpoint sounds good to me.

@andylolz @notshi please let Swaroop know if there are any concerns with this approach

andylolz commented 3 years ago

Great to hear this is progressing. If done correctly, this will be extremely useful.

Can I reiterate the questions I asked above:

andreaszenasidi commented 3 years ago

@gtkChop could you please address the above questions? Many thanks!

ss-bhat commented 3 years ago

@andreaszenasidi @andylolz

ss-bhat commented 3 years ago

@andylolz @andreaszenasidi

This is the sample:

[Screenshot of sample response, 2021-10-09]

Request URL:

/api/action/organization_show?id=cavwoc&show_historical_publisher_ids=true

andylolz commented 3 years ago

Great - thank you. Sample output and request looks good to me.

What do you think, @notshi?

notshi commented 3 years ago

Thanks, @andylolz for being on the ball on this!

This looks like changes of slug names, i.e. CKAN internal ids, and not publisher_iati_ids, which is what we need. Theoretically, I'm not sure how useful this is as the slug names are not used in IATI data. Am I missing something?

There needs to be some distinction between CKAN and IATI terms.

andreaszenasidi commented 3 years ago

@gtkChop could you please confirm whether the old_name field in the response is indeed the Publisher Id field from the registry account? I believe @notshi was referring to this in the above comment.

If it is, then please go ahead and implement this. Thanks!

andylolz commented 3 years ago

@notshi is correct – publisher_iati_id here refers to the publisher’s organisation identifier, or "org ID". (On the registry frontend, this is referred to as the “Identifier”.) I think it’s also what @gtkChop was referring to as the organization ID here:

However, I don't think it's possible to get the changes in organization IDs, because the new version of CKAN activity doesn't store the organization extra changes (it tracks only package extra changes). New ideas are always welcome :)

In the API response, name, old_name and historical_publisher_id all refer to the registry slug for the publisher (sometimes called the publisher ID).

ss-bhat commented 3 years ago

@andreaszenasidi @andylolz @notshi

Yes, the old_name is confusing, old_name => previous publisher_iati_id's

To be clearer, please see the changes below: [screenshot, 2021-10-16]

@andreaszenasidi I am pushing this to prod and we can improve iteratively if needed. Thanks

E.g: https://iatiregistry.org/api/action/organization_show?id=pstc_1102&show_historical_publisher_ids=true

andylolz commented 3 years ago

old_name => previous publisher_iati_id's

Apologies, I think that is incorrect. The publisher_iati_id is the org ID. In the example you posted, it’s MW-CNM-C057/1998. That’s an org ID using the MW-CNM prefix.

name and old_name refer to the registry slug (sometimes confusingly called the publisher ID). So I think renaming old_name to old_publisher_iati_id is actually more confusing than before.

ss-bhat commented 3 years ago

@andylolz Yes you are right. Changing back to old_name.

ss-bhat commented 3 years ago

@andreaszenasidi

Please see: https://iatiregistry.org/api/action/organization_show?id=pstc_1102&show_historical_publisher_names=true

To see historical publisher names - user should be a member of the given publisher or sysadmin
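
A request URL like the one above could be assembled as in the following sketch. Only the endpoint path and the show_historical_publisher_names parameter come from this thread; the helper name organization_show_url and the REGISTRY_BASE constant are illustrative assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode

# Base URL of the registry, as used in the example links in this thread.
REGISTRY_BASE = "https://iatiregistry.org"


def organization_show_url(publisher_name, include_history=True):
    """Build an organization_show URL for a publisher's registry name (slug).

    Hypothetical helper: it only constructs the URL and performs no request.
    """
    params = {"id": publisher_name}
    if include_history:
        # Parameter name taken from the example URL in this thread.
        params["show_historical_publisher_names"] = "true"
    return f"{REGISTRY_BASE}/api/action/organization_show?{urlencode(params)}"


print(organization_show_url("pstc_1102"))
```

Using urlencode rather than string concatenation keeps the URL valid if a publisher name ever contains characters that need escaping.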

andylolz commented 3 years ago

To see historical publisher names - user should be a member of the given publisher or sysadmin

Is the plan to make this publicly available? I.e. is it temporarily private while @andreaszenasidi reviews?

andreaszenasidi commented 3 years ago

@gtkChop I can now see the historical_publisher_names. Looks good to me. The only issue is that this information is only returned when I am logged in as a sysadmin to the Registry. This data should be available to the public, not only to the registry users.

ss-bhat commented 2 years ago

@andreaszenasidi

I have made historical_publisher_names public in staging. However, there are a few things I would like to bring to your notice -

If all is good, I will deploy this to production.

andreaszenasidi commented 2 years ago

@gtkChop looks good on staging. You can deploy it to production. Thank you!

ss-bhat commented 2 years ago

@andreaszenasidi Done, deployed.

andreaszenasidi commented 2 years ago

@gtkChop thank you!

@andylolz @notshi this was deployed. If you run into any issues please let us know.

notshi commented 2 years ago

Many thanks for the update and making this live. I can confirm that we have access to that page.

[Screenshot, 2022-03-16]

Just wondering - as per your example url, it looks like there are 4 old_name records for this publisher. However, 2 are repeated so technically there should only be 2 old_name records. Is this expected?

Also, will this url be updated and maintained (forever)?

notshi commented 2 years ago

Hi @andreaszenasidi @gtkChop,

Upon closer inspection, it looks like the old_names are CKAN slug names so this is not useful at all.

We need the publisher_iati_id as was mentioned in this previous comment and the comments thereafter.

As such, please re-open this issue.

andreaszenasidi commented 2 years ago

@notshi thanks for testing this! It seems the name change activity was recorded at different times, hence the names appear multiple times. If this is not useful, we could return only the earliest name change. For example, for pstc_bgd_24493 we could just return the first occurrence?
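
Until that is changed server-side, a client could drop the repeats itself. The sketch below keeps only the first occurrence of each name. It assumes the history arrives as a list of records that each carry an old_name key; the actual response layout is not shown in this thread, and both the function name unique_old_names and the sample values are made up for illustration.

```python
def unique_old_names(history):
    """Return each old_name once, in order of first appearance.

    Assumption: history is a list of dicts with an "old_name" key.
    """
    seen = set()
    names = []
    for record in history:
        name = record["old_name"]
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Illustrative records only: two distinct names, each recorded twice.
sample_history = [
    {"old_name": "example-old-a"},
    {"old_name": "example-old-b"},
    {"old_name": "example-old-a"},
    {"old_name": "example-old-b"},
]
print(unique_old_names(sample_history))
```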

Could you please clarify where you see the slug names? These were indeed the old publisher id's for this publisher. If you check another publisher, for example https://iatiregistry.org/api/action/organization_show?id=ec-intpa&show_historical_publisher_names=true you can see that the old_names include ec_devco, ec-devco etc.

notshi commented 2 years ago

@andreaszenasidi

As per the example you've provided, you will get the following.

publisher_iati_id: "XI-IATI-EC_INTPA"
name: "ec-intpa"
old_name: "ec_devco"
old_name: "ec-devco"

name and old_name are both CKAN ids aka the slug names of the publisher. They also form the url of the dataset, ie. https://iatiregistry.org/publisher/ec-intpa

As such, the historical publisher old names here only reflect changes when publishers change their Registry (CKAN id) url and not their IATI identifier.

We are only interested in the publisher_iati_id as these are organisation identifiers used in IATI data outside of CKAN.

The original issue was to list the historical changes of IATI publisher (org) identifiers so that we can redirect old pages to their new ones on d-portal.

So for our example, https://github.com/devinit/D-Portal/issues/503, there are no old_names found for this publisher because, although they have changed their IATI identifier, they have not changed their CKAN id.

https://iatiregistry.org/api/action/organization_show?id=wvi&show_historical_publisher_names=true

andreaszenasidi commented 2 years ago

@notshi thanks for clarifying!

andylolz commented 2 years ago

This ticket (and its predecessor, #218) is called:

Create list of changed org AND publisher ids

(emphasis mine). I.e. two lists for the two types of IDs. Both of these lists would be useful.

The lack of consistent naming for these two different identifiers is a cause of confusion. It would be great to also clear that up.

adrianoamaral commented 2 years ago

@andreaszenasidi Please describe the actions that you expect to be addressed in this issue

andreaszenasidi commented 2 years ago

@adrianoamaral the ticket had 2 requirements.

  1. Publicly available Publisher ID changes.
  2. Publicly available IATI Organisation Identifier changes.

Currently, only the 1st requirement has been implemented. Similar to the 1st list, which is now available under the organization_show endpoint, we need an additional section under this endpoint that will list all the historic IATI Organisation Identifier changes.
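
To make the second requirement concrete, here is a rough sketch of how a tool such as d-portal might consume such a section once it exists. The field name historical_publisher_iati_ids is purely hypothetical (no such field has been agreed or implemented), as are the function name build_org_id_redirects and the second sample record; the ec-intpa identifiers are taken from the examples earlier in this thread.

```python
def build_org_id_redirects(publishers):
    """Map each previous org ID to the publisher's current org ID.

    Assumption: each record has "publisher_iati_id" and, optionally, a
    hypothetical "historical_publisher_iati_ids" list of earlier org IDs.
    """
    redirects = {}
    for publisher in publishers:
        current = publisher["publisher_iati_id"]
        for old_id in publisher.get("historical_publisher_iati_ids", []):
            if old_id != current:  # skip entries that record no real change
                redirects[old_id] = current
    return redirects


# Illustrative records; the second publisher is invented and has no history.
publishers = [
    {
        "name": "ec-intpa",
        "publisher_iati_id": "XI-IATI-EC_INTPA",
        "historical_publisher_iati_ids": ["XI-IATI-EC_DEVCO"],
    },
    {"name": "example-publisher", "publisher_iati_id": "XX-EXAMPLE-1"},
]
print(build_org_id_redirects(publishers))
```

Keying the table by the old org ID rather than by the registry name keeps it independent of any changes to the CKAN slug.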