commonsense / conceptnet

ConceptNet: a semantic network of common sense knowledge
http://csc.media.mit.edu/conceptnet
GNU General Public License v2.0

Inconsistent web browse vs python API access #16

Closed by seeotter1 3 years ago

seeotter1 commented 3 years ago

On the browse page at ConceptNet.io, searching for "lead singer" returns a reasonable set of related terms, as below:

Related terms

en front man (n) ➜ en frontman ➜ en frontwoman ➜ en vocalist ➜ en band ➜

However, when I access it through the Python API in this way:

obj = requests.get('http://api.conceptnet.io/related/c/en/lead_singer?filter=/c/en').json()

the API seems unable to handle the phrase:

[{'@id': '/c/en/lead_poisoning', 'weight': 0.706}, {'@id': '/c/en/lead_to', 'weight': 0.649}, {'@id': '/c/en/lead_acid', 'weight': 0.634}, {'@id': '/c/en/result_in', 'weight': 0.556}, {'@id': '/c/en/nicad', 'weight': 0.527}, {'@id': '/c/en/nickel_cadmium', 'weight': 0.506}, {'@id': '/c/en/alkaline_battery', 'weight': 0.503}, {'@id': '/c/en/lead', 'weight': 0.496}, {'@id': '/c/en/leads', 'weight': 0.491}, {'@id': '/c/en/electrolytic', 'weight': 0.472}, {'@id': '/c/en/batteries', 'weight': 0.462}, {'@id': '/c/en/plumbum', 'weight': 0.441}, {'@id': '/c/en/battery', 'weight': 0.434}, {'@id': '/c/en/lithium_battery', 'weight': 0.432}, {'@id': '/c/en/leadeth', 'weight': 0.429}, {'@id': '/c/en/li_ion', 'weight': 0.428}, {'@id': '/c/en/come_from', 'weight': 0.427}, {'@id': '/c/en/lithium_ion_battery', 'weight': 0.425}, {'@id': '/c/en/rechargeable_battery', 'weight': 0.413}, {'@id': '/c/en/duracell', 'weight': 0.408}, {'@id': '/c/en/electrolyte', 'weight': 0.407}, {'@id': '/c/en/bring_on', 'weight': 0.402}, {'@id': '/c/en/anode', 'weight': 0.397}, {'@id': '/c/en/terne', 'weight': 0.397}, {'@id': '/c/en/be_made', 'weight': 0.394}, {'@id': '/c/en/litharge', 'weight': 0.381}, {'@id': '/c/en/electrolytically', 'weight': 0.379}, {'@id': '/c/en/come_to', 'weight': 0.379}, {'@id': '/c/en/comes_to', 'weight': 0.378}, {'@id': '/c/en/charge_battery', 'weight': 0.377}, {'@id': '/c/en/cadmium', 'weight': 0.375}, {'@id': '/c/en/get_to', 'weight': 0.364}, {'@id': '/c/en/turn_to', 'weight': 0.362}, {'@id': '/c/en/calin', 'weight': 0.361}, {'@id': '/c/en/expected_to', 'weight': 0.361}, {'@id': '/c/en/led', 'weight': 0.359}, {'@id': '/c/en/end_up', 'weight': 0.347}, {'@id': '/c/en/mercury_poisoning', 'weight': 0.344}, {'@id': '/c/en/come_in', 'weight': 0.343}, {'@id': '/c/en/leaded', 'weight': 0.342}, {'@id': '/c/en/recharger', 'weight': 0.341}, {'@id': '/c/en/electroplating', 'weight': 0.34}, {'@id': '/c/en/minamata_disease', 'weight': 0.339}, {'@id': '/c/en/electronic_devices', 'weight': 
0.339}, {'@id': '/c/en/overvoltage', 'weight': 0.338}, {'@id': '/c/en/rechargeable', 'weight': 0.338}, {'@id': '/c/en/deal_with', 'weight': 0.337}, {'@id': '/c/en/coming_in', 'weight': 0.335}, {'@id': '/c/en/charger', 'weight': 0.333}, {'@id': '/c/en/fall_in', 'weight': 0.333}]

However, other phrases seem to work ok:

obj = requests.get('http://api.conceptnet.io/related/c/en/kissing_cousin?filter=/c/en').json()

[{'@id': '/c/en/kissing_cousins', 'weight': 1.0}, {'@id': '/c/en/cousins', 'weight': 0.881}, {'@id': '/c/en/cousin', 'weight': 0.787}, {'@id': '/c/en/relative', 'weight': 0.66}, {'@id': '/c/en/kinswoman', 'weight': 0.626}, {'@id': '/c/en/niece', 'weight': 0.603}, {'@id': '/c/en/relatives', 'weight': 0.602}, {'@id': '/c/en/nephew', 'weight': 0.6}, {'@id': '/c/en/uncles', 'weight': 0.599}, {'@id': '/c/en/nephews', 'weight': 0.597}, {'@id': '/c/en/nieces', 'weight': 0.588}, {'@id': '/c/en/aunts', 'weight': 0.581}, {'@id': '/c/en/neice', 'weight': 0.578}, {'@id': '/c/en/siblings', 'weight': 0.568}, {'@id': '/c/en/younger_sibling', 'weight': 0.566}, {'@id': '/c/en/sister', 'weight': 0.546}, {'@id': '/c/en/brother', 'weight': 0.544}, {'@id': '/c/en/paternal_uncle', 'weight': 0.542}, {'@id': '/c/en/sibling', 'weight': 0.541}, {'@id': '/c/en/kinfolk', 'weight': 0.529}, {'@id': '/c/en/maternal_aunt', 'weight': 0.526}, {'@id': '/c/en/sisters', 'weight': 0.525}, {'@id': '/c/en/maternal_uncle', 'weight': 0.518}, {'@id': '/c/en/granduncle', 'weight': 0.515}, {'@id': '/c/en/stepbrother', 'weight': 0.511}, {'@id': '/c/en/kinsman', 'weight': 0.51}, {'@id': '/c/en/aunt', 'weight': 0.51}, {'@id': '/c/en/kid_sister', 'weight': 0.508}, {'@id': '/c/en/sistren', 'weight': 0.506}, {'@id': '/c/en/stepsister', 'weight': 0.5}, {'@id': '/c/en/uncle', 'weight': 0.499}, {'@id': '/c/en/paternal_aunt', 'weight': 0.498}, {'@id': '/c/en/grandniece', 'weight': 0.495}, {'@id': '/c/en/grandnephew', 'weight': 0.494}, {'@id': '/c/en/distantly_related', 'weight': 0.489}, {'@id': '/c/en/brothers', 'weight': 0.488}, {'@id': '/c/en/brethren', 'weight': 0.484}, {'@id': '/c/en/kinsfolk', 'weight': 0.483}, {'@id': '/c/en/younger_brother', 'weight': 0.477}, {'@id': '/c/en/kin', 'weight': 0.47}, {'@id': '/c/en/sista', 'weight': 0.463}, {'@id': '/c/en/kinfolks', 'weight': 0.455}, {'@id': '/c/en/aunty', 'weight': 0.451}, {'@id': '/c/en/elder_brother', 'weight': 0.447}, {'@id': '/c/en/family_reunion', 'weight': 
0.447}, {'@id': '/c/en/grandparents', 'weight': 0.445}, {'@id': '/c/en/aunties', 'weight': 0.444}, {'@id': '/c/en/granddaughters', 'weight': 0.442}, {'@id': '/c/en/congeneric', 'weight': 0.441}, {'@id': '/c/en/cognatic', 'weight': 0.434}]

Does anybody know what's happening here?

rspeer commented 3 years ago

What you're asking the web site for is a different query than what you're asking the API for.

You can get the results the Web interface uses from: http://api.conceptnet.io/c/en/lead_singer

and to get exactly the same results, including the grouping by relation type: http://api.conceptnet.io/c/en/lead_singer?grouped=true
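As a rough sketch of the two lookups above in Python: the helper names (`concept_uri`, `lookup_url`, `summarize_edges`, `fetch`) are made up for illustration, the phrase-to-URI step is a simplification of ConceptNet's real text normalization, and the code assumes the JSON response carries an `edges` list whose items have `start`, `rel`, and `end` objects with a `label` field. It uses only the standard library instead of `requests`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "http://api.conceptnet.io"


def concept_uri(phrase, lang="en"):
    """Simplified (assumed) normalization: 'lead singer' -> '/c/en/lead_singer'."""
    return f"/c/{lang}/{phrase.strip().lower().replace(' ', '_')}"


def lookup_url(phrase, grouped=False):
    """Build the concept lookup URL, optionally with ?grouped=true."""
    url = API_ROOT + concept_uri(phrase)
    if grouped:
        url += "?" + urlencode({"grouped": "true"})
    return url


def summarize_edges(payload):
    """Return (start label, relation label, end label) for each edge.

    Assumes the ungrouped response shape; a missing 'edges' key gives [].
    """
    return [
        (edge["start"]["label"], edge["rel"]["label"], edge["end"]["label"])
        for edge in payload.get("edges", [])
    ]


def fetch(url):
    """Fetch a URL and decode the JSON body (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)
```

With network access, `summarize_edges(fetch(lookup_url("lead singer")))` should list the edges attached to `/c/en/lead_singer`, which is the same data the browse page draws on.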

The /related/ query uses machine learning to suggest related concepts, including those that aren't connected by a single edge. It has a more limited vocabulary -- particularly in the smaller version that can be queried quickly enough to run a public API -- and when you ask it for a phrase that's outside of its vocabulary, one of the ways it deals with it is to try to reduce it to a prefix that it knows about. Unfortunately, what it found in its vocabulary here was the prefix /c/en/lead, which of course leads to the results about lead-acid batteries and such.
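To make the prefix behaviour concrete, here is a toy illustration. It is not ConceptNet's actual code: `prefix_fallback` and `toy_vocab` are hypothetical, and the real back-off strategy may differ in its details. It only shows how shortening an unknown term until something matches can land on `/c/en/lead`.

```python
def prefix_fallback(uri, vocab):
    """Return the longest prefix of `uri` that is in `vocab`, or None.

    Hypothetical stand-in for an out-of-vocabulary back-off: drop one
    character at a time from the end until a known term appears.
    """
    candidate = uri
    while candidate:
        if candidate in vocab:
            return candidate
        candidate = candidate[:-1]
    return None


# Toy vocabulary (an assumption): 'lead_singer' is deliberately missing.
toy_vocab = {"/c/en/lead", "/c/en/singer", "/c/en/kissing_cousin"}

print(prefix_fallback("/c/en/lead_singer", toy_vocab))     # /c/en/lead
print(prefix_fallback("/c/en/kissing_cousin", toy_vocab))  # /c/en/kissing_cousin
```

In this toy setup "kissing cousin" resolves to itself because the whole term is known, while "lead singer" collapses to "lead", which matches the pattern seen in the two result lists earlier in the thread.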

I don't know specifically if "lead singer" is in the vocabulary of ConceptNet Numberbatch, the larger downloadable version of the machine-learned embeddings, but it's worth a try: https://github.com/commonsense/conceptnet-numberbatch
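One way to check is to scan a downloaded Numberbatch text file for the term. The sketch below assumes the plain-text format (a header line with the entry count and dimension, then one `term v1 v2 ...` line per entry), that the English-only file uses bare terms such as `lead_singer` while the multilingual file uses full URIs such as `/c/en/lead_singer`, and that the file name in the usage note matches whichever release you downloaded. Both helper names are made up for illustration.

```python
import gzip


def open_vectors(path):
    """Open a Numberbatch text file, gzipped or not, as a text stream."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt", encoding="utf-8")


def term_in_numberbatch(term, lines):
    """Return True if `term` is the first field of any vector line.

    `lines` is any iterable of text lines; the first line is skipped
    because it is assumed to be the "<count> <dimension>" header.
    """
    target = term + " "
    for index, line in enumerate(lines):
        if index == 0:
            continue  # header line, not a vector
        if line.startswith(target):
            return True
    return False
```

For example, `with open_vectors("numberbatch-en-19.08.txt.gz") as f: print(term_in_numberbatch("lead_singer", f))` would answer the question for the English-only file, assuming that is the file you have.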

seeotter1 commented 3 years ago

Thank you so much for this super-excellent answer! Much appreciated :)