OpenAgFunding / development

For developing responses to current gaps in the availability and usability of open data on funding for agriculture and food security.

Improving organisation identification #11

Open timgdavies opened 8 years ago

timgdavies commented 8 years ago

We need to be able to consistently identify organisations so that it is possible:

Often organisation names are just available as free-text. This creates a challenge when:

This requires tooling to support a combination of:

This might be able to draw upon and improve the existing Organisation Identifier Registration Agency codelist to help select preferred identifiers.

timgdavies commented 8 years ago

@danmihaila we're just starting to explore this issue for the OpenAgFunding project, and wondering whether any of the work you have done in the past on Org IDs is still live anywhere, or is something you are up for developing further?

stevieflow commented 8 years ago

There's a related issue around this in terms of how the Organisation Registry codelist is updated within the IATI Standard - http://discuss.iatistandard.org/t/adding-to-the-organisation-registration-agency-code-list-push-rather-than-pull/414

danmihaila commented 8 years ago

@timgdavies we have the work on a server and I could make it available. Let's have a chat first to see what we need to do.

timgdavies commented 8 years ago

Dan has shared a great presentation of work mapped out last year which steps through many of the issues and describes a prototype which @danmihaila has kindly put live at http://happy-devs.atman.ro/solr/org_collection/browse to show how data aggregated from existing IATI data could be used by:

dwalker101 commented 8 years ago

I was also thinking that perhaps we could implement a small string-matching algorithm that measures differences in title strings, e.g. "World Bank" vs. "The World Bank." We could flag strings that are very similar and maybe even automatically equate strings that are above a certain threshold of similarity.
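A minimal sketch of the string-matching idea, using Python's standard-library `difflib`. The function names, the normalisation rules (lower-casing, dropping punctuation and a leading "the") and both thresholds are assumptions for illustration only; they are not taken from any existing OpenAgFunding tooling and would need tuning against real data.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces, drop a leading 'the'.

    Assumption: ASCII-only cleanup, which is a simplification. Names with
    accented or non-Latin characters would need Unicode-aware handling.
    """
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    words = cleaned.split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalised organisation names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def flag_similar(names, review_at=0.85, merge_at=0.97):
    """Compare every pair of names and label the close ones.

    Both thresholds are hypothetical starting points, not tested values.
    """
    flagged = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= merge_at:
            flagged.append((a, b, score, "auto-match"))
        elif score >= review_at:
            flagged.append((a, b, score, "review"))
    return flagged


if __name__ == "__main__":
    for row in flag_similar(["World Bank", "The World Bank.", "UNICEF"]):
        print(row)
```

Comparing every pair is quadratic in the number of names, so for a large set of publishers it would probably make sense to group names first (for example by first word) and only compare within groups.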

danmihaila commented 8 years ago

@dwalker101 you are right. We tried to do something similar. If you go to http://happy-devs.atman.ro/solr/org_collection/browse and you click on "More organizations with same code" or "More organizations with same name" you will arrive here: http://happy-devs.atman.ro/solr/org_collection/browse?q=&fq=code:%22GB-1%22&sort=count%20desc&show_chart=code or http://happy-devs.atman.ro/solr/org_collection/browse?q=&fq=name:%22Department%20for%20International%20Development%22&sort=count%20desc&show_chart=name
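For reference, the two links above are the same browse URL with a different filter query (`fq`) on either the `code` or the `name` field. A small sketch of how such a URL could be built in Python follows; the helper name `build_browse_url` is hypothetical, and the parameter set is simply copied from the links in this comment rather than from any documentation of the prototype.

```python
from urllib.parse import quote, urlencode

# Base URL of the prototype as given in this thread.
BROWSE_URL = "http://happy-devs.atman.ro/solr/org_collection/browse"


def build_browse_url(field: str, value: str) -> str:
    """Build a browse link filtered on one field ('code' or 'name').

    Hypothetical helper: it mirrors the query parameters seen in the
    links above. It does not escape double quotes inside `value`.
    """
    params = {
        "q": "",
        "fq": f'{field}:"{value}"',   # exact-phrase filter on the field
        "sort": "count desc",
        "show_chart": field,
    }
    # quote_via=quote encodes spaces as %20; safe=":" keeps the colon literal.
    return BROWSE_URL + "?" + urlencode(params, safe=":", quote_via=quote)


if __name__ == "__main__":
    print(build_browse_url("code", "GB-1"))
    print(build_browse_url("name", "Department for International Development"))
```

Called with `"code", "GB-1"` or `"name", "Department for International Development"`, this reproduces the two links quoted above.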