code4lib / shortimer

a webapp for code4lib jobs
http://jobs.code4lib.org

Freebase API to be retired #38

Open thatandromeda opened 9 years ago

thatandromeda commented 9 years ago

Google's retiring the Freebase API on 30 June 2015. Parts of this code depend on Freebase. What's the fallback?

edsu commented 9 years ago

Thanks @thatandromeda, it looks like this really is happening June 30th. Wikidata have it on their roadmap to provide a Wikidata Suggest type of service, but who knows if it will be ready in time. There is some work that needs to be done.

edsu commented 9 years ago

This API call is being used by Wikidata's search, and seems to have the basics of what we would need in the UI to select employers and tags.

https://www.wikidata.org/w/api.php?action=wbsearchentities&search=encyc&format=json&language=en&type=item&continue=0

There is also a JSON-P callback parameter, which could help get around cross-origin restrictions (JavaScript from jobs.code4lib.org that wants to talk to wikidata.org).

https://www.wikidata.org/w/api.php?action=wbsearchentities&search=encyc&format=json&language=en&type=item&continue=0&callback=foo
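A rough Python sketch (not shortimer code) of what a server-side call to wbsearchentities could look like, mirroring the parameters in the URLs above:

    # Minimal sketch of querying the Wikidata wbsearchentities API for
    # suggest-style matches.
    import requests

    def wikidata_suggest(text, language="en", limit=10):
        """Return (id, label, description) tuples for a search string."""
        resp = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={
                "action": "wbsearchentities",
                "search": text,
                "language": language,
                "type": "item",
                "limit": limit,
                "format": "json",
            },
            timeout=10,
        )
        resp.raise_for_status()
        return [
            (hit["id"], hit.get("label"), hit.get("description"))
            for hit in resp.json().get("search", [])
        ]

    if __name__ == "__main__":
        for qid, label, desc in wikidata_suggest("encyc"):
            print(qid, label, "-", desc)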

edsu commented 9 years ago

One possible way to map our Freebase ids to Wikidata ids: https://gist.github.com/edsu/c95c9ae9f60ecdf80077

tfmorris commented 9 years ago

Google has said that the shutdown will be delayed. I'm pretty sure it was mentioned on the Freebase mailing list, but I can't find the thread right now. If you look at the Wikidata Freebase project page, you'll see the same info:

  • In Q2 2015, a new KG-based Google API will be launched
  • At the earliest three months after that, the Freebase website will close (planned for Q3 2015)

Because we're already inside the three month window for June 30, the API retirement won't be happening then.

I'd suggest deferring planning of your migration strategy until things are a little clearer, but here are a few random thoughts:

The whole thing is kind of a mess, but it seems unlikely that Freebase will get shut down without a fair amount of notice, so I'd hold off committing to a transition plan until both Wikidata and Google firm up their plans.

If/when you need to map Freebase IDs to Wikidata IDs, this bulk dump might be easier to use than an API.

edsu commented 9 years ago

Thanks for those details @tfmorris; I didn't know that the announcement on the Freebase website was out of date. Still, I think it should be doable to use the wbsearchentities API call for the suggest portion, and to use WDQ as a temporary way to turn a few thousand Freebase IDs into Wikidata IDs. I'd like to rip this band-aid off now rather than wait, but we'll see: I'm the only person actually maintaining shortimer at this point, and I have other things contending for my attention.

tfmorris commented 9 years ago

@edsu - I think it's early days still for Wikidata and I have concerns about performance and stability of the API, but it's your call. I'd be happy to generate the ID mapping table for you, if that helps.

At a DPLA Hackathon a few years ago, we hacked up Freebase Suggest to work with the DPLA API. You might consider doing something similar for Wikidata. Suggest is actually one of the nicer autocomplete widgets out there (in my opinion).

https://github.com/scande3/dpla-discovery http://static.digitalcommonwealth.org/dpla-discovery/

I don't know if you constrain your Suggest searches by type, etc., but if you're using the Freebase schema at all (types or properties), mapping to the Wikidata schema is another task that needs to be added to the list.

edsu commented 9 years ago

The API may change, but it's hard to imagine it going away entirely after all the integration work that has gone on at Wikimedia. I'm ok with things changing -- in fact that's the best situation, because it means the service isn't dying, and people are working on it. Alas, the writing is definitely on the wall for Freebase.

The suggestions are constrained by type in a few places in shortimer: by employer and location. I see that wbsearchentities has a type parameter that could maybe be used similarly. If a mapping of types/properties is put together, that would be very useful. I think I will be OK with mapping the IDs, but I will be in touch if it gets tricky.

edsu commented 8 years ago

It looks like there may be a path forward using the Google Knowledge Graph, which now has an API; they are also planning to add a suggest widget similar to the one Freebase offers, which is so important to the workflow here in shortimer.

Apparently even the Freebase identifiers are being used, so there may not be a whole lot of cleanup work that needs to happen in the shortimer database. I think I would prefer to use Wikidata on principle, but it may be easier to transition to the Knowledge Graph.

tfmorris commented 8 years ago

I think using the KG Suggest is the right call. The KG Search API is much less powerful than the old Freebase Search API, but it should be fine for this application. The Wikidata Refine Reconciliation Service uses the wbsearchentities-followed-by-WDQ/SPARQL approach internally, and it doesn't appear to me that the search is very robust.

One of the things that I've got on my (long) list of spare time projects is to improve the coverage of matching for Freebase<->Wikidata mappings, which will help provide an escape path if it's needed in the future (plus having the Wikidata reconciliation service for OpenRefine should help with these types of mapping tasks).

BTW, the beta SPARQL endpoint is much faster than the experimental WDQ API, and the data is more current, if you ever have a need to query Wikidata.
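For example, here is a rough sketch of resolving a Freebase MID to a Wikidata item through the SPARQL endpoint; it assumes the item in question actually carries a P646 ("Freebase ID") statement, which is only true for a subset of entities:

    # Sketch only: find the Wikidata item that has a given Freebase MID
    # via the query.wikidata.org SPARQL endpoint and property P646.
    import requests

    def wikidata_for_mid(mid):
        """Return the Wikidata entity URI for a MID like '/m/076k0', or None."""
        query = 'SELECT ?item WHERE { ?item wdt:P646 "%s" } LIMIT 1' % mid
        resp = requests.get(
            "https://query.wikidata.org/sparql",
            params={"query": query, "format": "json"},
            headers={"User-Agent": "shortimer-mapping-sketch/0.1"},
            timeout=30,
        )
        resp.raise_for_status()
        bindings = resp.json()["results"]["bindings"]
        return bindings[0]["item"]["value"] if bindings else None

    if __name__ == "__main__":
        print(wikidata_for_mid("/m/076k0"))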

tfmorris commented 8 years ago

p.s. My interpretation is that the 3 month clock doesn't start until the KG Suggest API is available too, so there's still some time...

edsu commented 8 years ago

@tfmorris thanks for your comments. If you notice KG Suggest getting announced and remember this issue, it would be really helpful if you could add a note here. I feel like I only accidentally noticed the KG API announcement!

edsu commented 8 years ago

In preparation, the shortimer db should be updated to store the Freebase Machine ID (mid) instead of the id that comes back from the suggest API. This will involve looking them up again.

tfmorris commented 8 years ago

Freebase switched to MIDs for most purposes a while ago, so you may find that the IDs coming back from the Suggest API were MIDs already.

If you have historical /en/... IDs, you can look up the MID with this query:

https://www.googleapis.com/freebase/v1/mqlread/?lang=%2Flang%2Fen&query=%5B%7B+%22id%22%3A+%22%2Fen%2Fharvard_university%22%2C+%22mid%22%3A+null+%7D%5D

Replace the (encoded) /en/harvard_university with the link that you want to look up. If you've got a list of IDs, I'd be happy to look them up for you and generate a crosswalk.
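The same lookup from Python might look roughly like this (a sketch, assuming the legacy mqlread endpoint is still reachable without a key at low volume):

    # Sketch: resolve a historical /en/... Freebase id to its MID via mqlread.
    import json
    import requests

    def mid_for_freebase_id(freebase_id):
        """e.g. mid_for_freebase_id('/en/harvard_university') -> '/m/...'"""
        query = [{"id": freebase_id, "mid": None}]
        resp = requests.get(
            "https://www.googleapis.com/freebase/v1/mqlread/",
            params={"lang": "/lang/en", "query": json.dumps(query)},
            timeout=10,
        )
        resp.raise_for_status()
        result = resp.json().get("result", [])
        return result[0]["mid"] if result else None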

BTW, haven't heard anything additional on shutdown timeframes...

edsu commented 8 years ago

@tfmorris thanks for the update! I did get the database converted over to the mids. I looked them up by resolving URLs like:

https://www.googleapis.com/freebase/v1/topic/{freebase_id}

which seemed to work pretty well still...
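Roughly, the lookup amounted to something like this (a reconstruction rather than the actual script; it assumes the Topic API's top-level "id" field is the canonical MID):

    # Sketch: resolve an old Freebase id to its MID via the Topic API.
    import requests

    def mid_from_topic_api(freebase_id):
        resp = requests.get(
            "https://www.googleapis.com/freebase/v1/topic" + freebase_id,
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json().get("id")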

edsu commented 8 years ago

It looks like the new Knowledge Graph Search Widget is available. Also some of the old Freebase API calls are starting to fail now, for example getting the location for an organization.

edsu commented 8 years ago

Well, now the old Freebase APIs for looking up Employers and Locations are dead, so people can't enter new jobs. I guess it would be good to move over to the Knowledge Graph API now ;-)

edsu commented 8 years ago

@tfmorris @danbri do you happen to know (or know someone who might know) why topical things like "Semantic Web" don't show up in the Knowledge Graph Search Widget? I get lots of books but not the topic. I even tried a Search API call to see if I could find the topic in there, but I couldn't find it in 200 results.

Using the JSON-LD context I can see that Google has URIs for entities, which is cool. So I can easily turn the old Freebase IDs into Knowledge Graph URIs. For example, here's the URI for Semantic Web:

https://g.co/kg/m/076k0

So I can see the entity "Semantic Web" is in the Knowledge Graph, but how can I get the search widget to return it? Would one of the available entity types work?

edsu commented 8 years ago

Maybe this is the push I need to move over to using Wikidata....

danbri commented 8 years ago

I don't know but I'll see what I can find out

danbri commented 8 years ago

(and +1 for Wikidata, regardless)

danbri commented 8 years ago

From a quick guess, is it only returning entities whose types are in https://developers.google.com/knowledge-graph/ (and mapped there to schema.org)?

edsu commented 8 years ago

Hmm, that does seem to be the case? Here are the types returned in the first 200 results when searching for 'semantic web' from the search API:

% curl --silent 'https://kgsearch.googleapis.com/v1/entities:search?query=semantic+web&key=AIzaSyDnh2jo5mhnf1EyIs2VQwc9H_bq1_RAgsE&limit=200&indent=True' | jq -r '.itemListElement[].result["@type"][]' - | sort | uniq -c | sort -rn
 124 Thing
  64 Person
  26 Organization
  21 Corporation
  20 Book
   8 Place
   4 EducationalOrganization
   3 CollegeOrUniversity
   1 Movie
   1 CivicStructure
   1 BookSeries
   1 AdministrativeArea

Unfortunately it seems like a lot of the terms used to tag jobs in shortimer are rendered invisible in the KG Search API ...

edsu commented 8 years ago

I've been doing some preliminary work trying to migrate things to Wikidata. If you are interested you can track the work over on the wikidata branch.

edsu commented 8 years ago

Wikidata does offer an autosuggest API, but it doesn't allow you to limit results to particular entity types (locations, organizations, etc.). This leads to a lot of noise when looking things up. I also tried using the SPARQL endpoint with regex filters, but it seemed very unstable. There were lots of 502 errors. Perhaps that was just something else going on at the time, but it doesn't lend much confidence as a foundation to build on.

Actually, it does look like other people were experiencing problems.

edsu commented 8 years ago

So, even with the Wikidata SPARQL endpoint back to functioning normally, it can still take multiple seconds for regex queries (what is needed for autosuggest) to come back. Unfortunately that won't be good enough. The wbsearchentities API call is fast, but it doesn't return much information, and can't be limited to entities of a particular type (Locations, Organizations, etc.).

So, my current thinking is to use the entities that have already been collected in jobs.code4lib.org and run autosuggest against them, and let people enter new entities as needed. This will have the downside that they aren't mapped to Google Knowledge Graph or Wikidata, but I just don't have the cycles to do that at the moment...and the site risks dying completely if it's not possible to post new jobs.
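A hypothetical sketch of that local suggest, as a Django view; the model and field names here (Employer, name) are illustrative rather than shortimer's actual schema:

    # Hypothetical: autosuggest against employers already stored locally,
    # rather than calling out to Freebase/Wikidata/Knowledge Graph.
    from django.http import JsonResponse

    def employer_suggest(request):
        from jobs.models import Employer  # illustrative import path
        q = request.GET.get("q", "").strip()
        results = []
        if q:
            results = list(
                Employer.objects.filter(name__icontains=q)
                .order_by("name")
                .values("id", "name")[:10]
            )
        return JsonResponse({"results": results})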

sprater commented 8 years ago

Could the Geonames service be used to look up institutions and locations? It has a rich and snappy API, and support for linked data: http://www.geonames.org/

edsu commented 8 years ago

It could, but that's only part of the puzzle. Unfortunately I don't have the bandwidth to fully address this problem. I'm planning on shutting the site down on November 1st after making static snapshots of the data and website available on the Internet Archive.

darvid7 commented 6 years ago

Hi! Sorry to ping this thread. I came across this while trying to figure out how to map Freebase MIDs to their entities without downloading and searching the 200 GB data dump. Does anyone know if the Google Knowledge Graph API contains the Freebase MIDs and if it can be queried using them? Thanks!

danbri commented 6 years ago

It should have that, but query.wikidata.org might also have it...

tfmorris commented 6 years ago

A late reply to the late question (I apparently had this accidentally muted - yay, gmail keyboard shortcuts).

The Freebase MIDs were retained in the Google Knowledge Graph and can be used for lookups. The /g IDs (as opposed to the /m IDs, which are MIDs) post-date Freebase. As @danbri mentioned, some of them have been mapped to Wikidata entities, but only a small fraction. The Google Knowledge Graph will have many more (but the mapping to Wikidata is potentially more useful, if it exists).
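For the original question about querying by MID, here is a rough sketch against the Knowledge Graph Search API, which accepts Freebase MIDs via its ids parameter (you need your own API key):

    # Sketch: look up a Knowledge Graph entity directly by Freebase MID.
    import requests

    def kg_entity_for_mid(mid, api_key):
        resp = requests.get(
            "https://kgsearch.googleapis.com/v1/entities:search",
            params={"ids": mid, "key": api_key, "limit": 1},
            timeout=10,
        )
        resp.raise_for_status()
        elements = resp.json().get("itemListElement", [])
        return elements[0]["result"] if elements else None

    if __name__ == "__main__":
        entity = kg_entity_for_mid("/m/076k0", api_key="YOUR_KEY")
        if entity:
            print(entity.get("name"), entity.get("@id"))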