creativeishu / AIM_IBM_KGO


Implement/decide on storage #1

Closed dev-zero closed 10 years ago

dev-zero commented 10 years ago

We need to store the data fetched from Wikipedia somewhere. This does not have to be the final store, just something that gives us a dataset we can work on.

ralphk86 commented 10 years ago

I'd suggest the N-Triples format: http://en.wikipedia.org/wiki/N-Triples

Ultra-simple: each line is just subject, predicate, object. Iron melting-point 200. Copper phase solid.

rdflib can read it automatically.

dev-zero commented 10 years ago

Can you please give a concrete example? The cited format looks rather complicated, and the examples above don't make it clear whether there is a separator between subject/predicate/value (i.e. node/edge/node).
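
For a concrete picture of the format: in N-Triples each statement sits on its own line, the three terms are separated by whitespace, and the line ends with ` .`. Below is a minimal sketch of writing one such line in plain Python. The `http://example.org/` namespace and the helper name `to_ntriples_line` are made up for illustration and are not part of this project; only the most common literal escapes are handled.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from the project


def to_ntriples_line(subject, predicate, value):
    """Format one (subject, predicate, value) triple as an N-Triples line."""

    def iri(name):
        # Percent-encode spaces and other unsafe characters inside <...>
        return "<" + BASE + quote(name, safe="") + ">"

    # Escape backslashes, double quotes and newlines inside the literal
    literal = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'{iri(subject)} {iri(predicate)} "{literal}" .'


line = to_ntriples_line("Iron", "Melting point", "1811")
print(line)
```

So the separator is plain whitespace, with subject and predicate wrapped in `<...>` and the value in double quotes.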

dev-zero commented 10 years ago

For the initial import from Wikipedia: use JSON, since there are efficient parsers around for Python and C/C++.

ralphk86 commented 10 years ago

Did we decide on JSON now?

dolfim commented 10 years ago

Yes, JSON for the N-triple format seems ok.

dolfim commented 10 years ago

It could still be useful to have some format that can be read by other libraries (like rdflib).

dev-zero commented 10 years ago

so, something like this?

[
    { "subject": "Iron", "property": "Melting point", "value": "1811" },
    { "subject": "Copper", "property": "Melting point", "value": "1357.77" }
]

or

[
    [ "Iron", "Melting point", "1811" ],
    [ "Copper", "Melting point", "1357.77" ]
]

... unless we decide to use a full-blown RDF storage/data dump after all, of course ;-)

dolfim commented 10 years ago

I personally like the second one. I'm trying some experiments with rdflib, and the format is really ugly!

ralphk86 commented 10 years ago

The second one is nice and easy. Should be enough, I think, and easy to parse.
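
To illustrate how little parsing the second (list-of-lists) layout needs, here is a minimal sketch using only Python's standard `json` module. The function name `load_triples` and the length check are assumptions for illustration, not something agreed on in this thread.

```python
import json

# Sample data in the second layout proposed above
raw = """
[
    [ "Iron", "Melting point", "1811" ],
    [ "Copper", "Melting point", "1357.77" ]
]
"""


def load_triples(text):
    """Parse a JSON list of [subject, property, value] rows into tuples."""
    triples = []
    for row in json.loads(text):
        if len(row) != 3:
            raise ValueError(f"expected 3 entries per row, got {len(row)}: {row!r}")
        triples.append(tuple(row))
    return triples


triples = load_triples(raw)
for subject, prop, value in triples:
    print(subject, prop, value)
```

Note that the values stay strings here, exactly as in the sample; converting them to numbers would be a separate step.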

dev-zero commented 10 years ago

The above is the edge file. I would propose the following for the properties:

{
  "Iron": {
    "Melting point": 1234,
    "Boiling point": 2000
  },
  "Copper": {
    "Melting point": 1234,
    "Boiling point": 2000
  }
}
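
A minimal sketch of how such a properties file could be read and queried with Python's standard `json` module. The helper `get_property` is hypothetical, and the numbers are the placeholder values from the proposal above, not real data.

```python
import json

# Placeholder values copied from the proposed layout
raw = """
{
  "Iron": {
    "Melting point": 1234,
    "Boiling point": 2000
  },
  "Copper": {
    "Melting point": 1234,
    "Boiling point": 2000
  }
}
"""

properties = json.loads(raw)


def get_property(properties, subject, name, default=None):
    """Look up one property of one subject; return default if either is missing."""
    return properties.get(subject, {}).get(name, default)


print(get_property(properties, "Iron", "Melting point"))
print(get_property(properties, "Zinc", "Melting point"))
```

Keying by subject first makes "all properties of X" a single dictionary lookup, which the flat triple list does not give directly.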

dev-zero commented 10 years ago

Closing as fixed.