Open dev-zero opened 9 years ago
While doing that, we have to define a limited set of properties.
Note that it would be nice to already perform binning on the numeric properties.
We should keep the number of bins as a parameter, since it defines the depth of the search.
Should we then have both real raw data and binned data? Or implement binning only when importing into our format?
At the moment I'd try to build a chain since it gives us more control over the process and makes the components simpler:
wikipedia -> raw data -> pre-processed data (binning) -> custom format/datastructure
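A minimal sketch of how such a chain could be wired up, assuming each stage is a plain function that consumes the previous stage's output. All names here (`extract_raw`, `preprocess`, `to_custom_format`, `run_chain`), the in-memory `pages` input, and the bin-indexed dictionary used as the "custom format" are hypothetical placeholders, not something decided in this issue:

```python
from typing import Dict, List

# Hypothetical stand-in for data already pulled from Wikipedia:
# page name -> {property name -> numeric value}
Pages = Dict[str, Dict[str, float]]


def extract_raw(pages: Pages) -> List[dict]:
    """Stage 1: flatten the per-page data into raw rows."""
    return [{"name": name, **props} for name, props in pages.items()]


def preprocess(rows: List[dict], prop: str, n_bins: int) -> List[dict]:
    """Stage 2: add an equal-width bin index for one numeric property."""
    values = [r[prop] for r in rows]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero if all values are equal
    return [
        {**r, prop + "_bin": min(int((r[prop] - lo) / width), n_bins - 1)}
        for r in rows
    ]


def to_custom_format(rows: List[dict], prop: str) -> Dict[int, List[str]]:
    """Stage 3: build a (placeholder) structure mapping bin index -> names."""
    index: Dict[int, List[str]] = {}
    for r in rows:
        index.setdefault(r[prop + "_bin"], []).append(r["name"])
    return index


def run_chain(pages: Pages, prop: str, n_bins: int) -> Dict[int, List[str]]:
    """Run the three stages in order; each one can be tested on its own."""
    return to_custom_format(preprocess(extract_raw(pages), prop, n_bins), prop)
```

Keeping the raw rows as their own stage would also answer the raw-versus-binned question: the raw data stays available, and binning can be re-run with different settings.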
Let's start with a fixed binning level and add the ability to change it later.
According to @Delaya and @dev-zero, http://dbpedia.org seems to be quite incomplete: at least the Infobox entries, for example those for Iron, seem to be completely missing.
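As a rough illustration of what "fixed now, configurable later" might look like, here is a sketch that assumes equal-width binning (the issue does not say which binning scheme is intended). The function name `bin_values` and the default of 4 bins are made up for the example:

```python
from typing import List, Sequence


def bin_values(values: Sequence[float], n_bins: int = 4) -> List[int]:
    """Assign each value to one of n_bins equal-width bins (0 .. n_bins - 1).

    n_bins has a fixed default for now but stays a parameter so the
    binning level can be changed later without touching the callers.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

For instance, `bin_values([0, 1, 2, 3, 4, 5, 6, 7, 8], n_bins=3)` splits the nine values into three groups of three.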
The best course of action is therefore probably to use the Wikipedia API Library to parse the data directly from the respective Wikipedia pages.
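Once the wikitext of a page is available (through whichever API library ends up being used), pulling the Infobox parameters out could look roughly like the sketch below. It only handles the simple one-parameter-per-line `| key = value` layout and ignores nested templates; the function name and the sample wikitext are invented for illustration and are not real page content:

```python
from typing import Dict


def parse_infobox_params(wikitext: str) -> Dict[str, str]:
    """Collect '| key = value' lines from a block of wikitext.

    Simplified on purpose: multi-line values and nested templates
    are not handled.
    """
    params: Dict[str, str] = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            # Split on the first '=' only, so values may contain '='.
            key, _, value = line[1:].partition("=")
            params[key.strip()] = value.strip()
    return params


# Made-up sample, not taken from an actual Wikipedia page.
sample = "{{Infobox element\n| name = Example\n| density = 1.23\n}}"
params = parse_infobox_params(sample)
```

The values come back as strings, so converting the numeric properties would still be a separate step before binning.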
Other options are:
This needs #1 to be implemented.