creativeishu / AIM_IBM_KGO


Fetch data from Wikipedia, Periodic Table or Freebase #2

Open dev-zero opened 9 years ago

dev-zero commented 9 years ago

According to @Delaya and @dev-zero, http://dbpedia.org seems to be quite incomplete: at least the Infobox entries, for example those for Iron, seem to be completely missing.

The best course of action is therefore probably to use the Wikipedia API Library to parse the data directly from the respective Wikipedia pages.
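
As a rough illustration of parsing Infobox data directly, here is a minimal sketch that pulls `| key = value` fields out of raw wikitext. It assumes the page source has already been fetched by some other means; the function name `parse_infobox_fields` and the sample text are made up for this example (the sample is hand-written, not the actual content of the Iron page), and real Infoboxes with nested templates or multi-line values would need more care.

```python
import re

# Hand-written sample for illustration only; NOT the real wikitext of any page.
SAMPLE_WIKITEXT = """{{Infobox element
| name = Iron
| symbol = Fe
| number = 26
}}"""

# One field per line: a leading '|', a key without '=' or '|', then '=' and the value.
FIELD_RE = re.compile(
    r"^\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)


def parse_infobox_fields(wikitext):
    """Return {key: value} for every simple '| key = value' line (hypothetical helper)."""
    return {key: value for key, value in FIELD_RE.findall(wikitext)}


if __name__ == "__main__":
    print(parse_infobox_fields(SAMPLE_WIKITEXT))
```

All values come back as strings, so converting them to numbers would be a separate step.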

Other options are:

This needs #1 to be implemented.

dev-zero commented 9 years ago

While doing that, we have to define a limited set of properties.

dolfim commented 9 years ago

For a list of predicates, look at the wiki page.

dolfim commented 9 years ago

Note that it would be nice to already perform binning on the numeric properties.

iliazintchenko commented 9 years ago

We should keep the number of bins as a parameter, since it defines the depth of the search.
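
A minimal sketch of what binning with a configurable number of bins could look like. It assumes equal-width bins over the observed value range, which is only one possible choice and is not settled in this thread; `bin_values` and `num_bins` are hypothetical names.

```python
def bin_values(values, num_bins):
    """Map each numeric value to an equal-width bin index in [0, num_bins - 1]."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / num_bins
    # The maximum would land in bin num_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


if __name__ == "__main__":
    samples = [0.0, 2.5, 5.0, 7.5, 10.0]  # placeholder numbers, not real data
    print(bin_values(samples, 4))  # [0, 1, 2, 3, 3]
    print(bin_values(samples, 2))  # [0, 0, 1, 1, 1]
```

Fewer bins merge more values into the same bucket, which is the knob described above.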

dolfim commented 9 years ago

Should we then keep both the real raw data and the binned data? Or apply binning only when importing into our format?

dev-zero commented 9 years ago

At the moment I'd try to build a chain since it gives us more control over the process and makes the components simpler:

wikipedia -> raw data -> pre-processed data (binning) -> custom format/datastructure
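
The chain above could be sketched as a sequence of small, independent stages. Everything here is an assumption for illustration: the stage names, the fixed `bin_width`, the triple-based output format, and the placeholder input are all hypothetical.

```python
from functools import reduce


def run_chain(data, stages):
    """Feed data through each stage in order and return the final result."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


def to_raw(pages):
    """Parsed page fields (strings) -> raw numeric data."""
    return {el: {k: float(v) for k, v in props.items()} for el, props in pages.items()}


def make_binning_stage(bin_width):
    """Build a pre-processing stage that bins values with a fixed bin width."""
    def stage(raw):
        return {
            el: {k: int(v // bin_width) for k, v in props.items()}
            for el, props in raw.items()
        }
    return stage


def to_triples(binned):
    """Binned data -> sorted (subject, predicate, bin) triples as a stand-in custom format."""
    return sorted((el, k, b) for el, props in binned.items() for k, b in props.items())


if __name__ == "__main__":
    # Placeholder input, not real element data.
    pages = {"A": {"x": "1.5"}, "B": {"x": "7.0"}}
    print(run_chain(pages, [to_raw, make_binning_stage(2.0), to_triples]))
    # [('A', 'x', 0), ('B', 'x', 3)]
```

Because each stage only depends on the output of the previous one, the raw data stays available and the binning stage can be swapped or re-parameterised on its own.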

iliazintchenko commented 9 years ago

Let's start with a fixed binning level and add functionality to change the binning later.