Open marfox opened 5 years ago
I may have a suggestion, at least for the IMDb data: in most cases we have the top movies/TV series by which someone is known. Do we have similar data coming from Wikidata?
@tupini07, I think it's a good idea, but the implementation is not trivial. In Wikidata, the person-to-work relation doesn't seem to be there, while the inverse (work-to-person) exists.
For instance, given the director Alex de la Iglesia (Q250627) and the movie El día de la bestia (Q1312929), we would only find the director (P57) property in the movie item, not in the person item.
This also holds for the music domain, see #80.
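To make the inverse lookup concrete, here is a minimal sketch that only builds a SPARQL query string listing the items pointing at a person through a given property. The function name `works_by_person_query` is hypothetical, and the query assumes it is sent to the Wikidata Query Service, where the `wd:` and `wdt:` prefixes are predefined. It does not run the query or say anything about how soweego would consume the results.

```python
def works_by_person_query(person_qid: str, property_pid: str = "P57") -> str:
    """Build a SPARQL query for items whose `property_pid` value is `person_qid`.

    Hypothetical helper: P57 (director) is the default because it is the
    property mentioned in this thread; other work-to-person properties
    could be passed in the same way.
    """
    return (
        "SELECT ?work WHERE { "
        f"?work wdt:{property_pid} wd:{person_qid} . "
        "}"
    )


# Works that have Alex de la Iglesia (Q250627) as director (P57).
query = works_by_person_query("Q250627")
print(query)
```

The cost is one extra lookup per person (or a batched query), which is part of why this is not trivial.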
Yes, you're right. It might be a bit too complex, and specific only to the IMDb dataset.
Other possible ideas for generic features would be to match the gender and the place of birth/place of death fields. gender is readily available in most data sets: a quick look at musicbrainz and imdb shows that 20% of people in musicbrainz have a gender, and 100% of those in imdb do. The occurrence of place of birth/death is much lower (none of the entries in imdb have a place of birth/death, and 4.5% of those in musicbrainz have one). However, it might be a powerful feature for those entries that do have it.
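As a rough sketch of what this could look like, the snippet below computes how often a field is filled in a list of records, and compares an optional field so that a missing value gives no signal instead of a mismatch. The names `coverage` and `optional_match`, the dict-based record layout, and the toy records are all assumptions for illustration; they are not the actual dump format or real figures.

```python
from typing import Iterable, Mapping, Optional


def coverage(records: Iterable[Mapping[str, Optional[str]]], field: str) -> float:
    """Return the fraction of records with a non-empty value for `field`."""
    records = list(records)
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(field))
    return filled / len(records)


def optional_match(left: Optional[str], right: Optional[str]) -> Optional[float]:
    """Compare two optional values, ignoring case and surrounding spaces.

    Returns None when either side is missing, so sparse fields such as
    place of birth/death do not count as a mismatch.
    """
    if not left or not right:
        return None
    return 1.0 if left.strip().lower() == right.strip().lower() else 0.0


# Toy records, made up for illustration only.
toy = [{"gender": "male"}, {"gender": None}, {}, {"gender": "female"}]
print(coverage(toy, "gender"))
print(optional_match("Male", "male"))
print(optional_match(None, "Bilbao"))
```

Returning None for missing values leaves it to the classifier (or a later imputation step) to decide how to treat entries without the field.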
Another idea would be to leverage the information we currently have in the IMDb dataset about the main occupations of a person. During the import process these occupations are already converted to their respective QIDs (as decided in #165), so for each person we basically have a list of QIDs representing their professions.
We could even compare them directly; the recordlinkage library probably provides some functionality to do this.
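One simple way to compare the two occupation lists directly is a set overlap score. The sketch below uses plain Python and Jaccard similarity; it is not the recordlinkage API, and the function name `occupation_overlap` and the example QIDs are only illustrative. Whether recordlinkage ships something equivalent out of the box would still need checking.

```python
from typing import Iterable


def occupation_overlap(left: Iterable[str], right: Iterable[str]) -> float:
    """Jaccard similarity between two collections of occupation QIDs.

    Returns 0.0 when either side is empty, i.e. no evidence of a match.
    """
    left_set, right_set = set(left), set(right)
    if not left_set or not right_set:
        return 0.0
    return len(left_set & right_set) / len(left_set | right_set)


# Illustrative QIDs: one shared occupation out of three distinct ones.
imdb_occupations = ["Q2526255", "Q28389"]
wikidata_occupations = ["Q2526255", "Q33999"]
score = occupation_overlap(imdb_occupations, wikidata_occupations)
print(score)
```

A score like this could be fed to the classifier as one more numeric feature alongside the name and date comparisons.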