Closed dev-zero closed 10 years ago
I'd suggest the N-Triples format: http://en.wikipedia.org/wiki/N-Triples
Ultra-simple: each line is just subject, predicate, object. For example: Iron melting-point 200. Copper phase solid.
rdflib can automatically read it.
Can you please give a concrete example? The cited format looks rather complicated and the examples above don't make it clear whether there is a separator between subject/predicate/value (resp. node/edge/node)
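For reference, a minimal sketch of what N-Triples lines look like. Each line holds subject, predicate and object separated by whitespace and is terminated by ` .`; IRIs go in angle brackets and plain literals in double quotes. The `http://example.org/` namespace and the `to_ntriple` helper are hypothetical, chosen only for illustration.

```python
BASE = "http://example.org/"  # hypothetical namespace, not from this thread

def to_ntriple(subject, predicate, value):
    """Format one (subject, predicate, value) triple as an N-Triples line."""
    def iri(name):
        # Replace spaces so the name is usable inside an IRI.
        return "<" + BASE + name.replace(" ", "_") + ">"
    # Escape backslashes and double quotes inside the literal.
    literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'{iri(subject)} {iri(predicate)} "{literal}" .'

lines = [
    to_ntriple("Iron", "melting-point", "200"),
    to_ntriple("Copper", "phase", "solid"),
]
print("\n".join(lines))
```

So the separator between the three parts is plain whitespace, and the trailing dot ends the statement.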
For the initial import from Wikipedia: use JSON, since there are efficient parsers around for Python and C/C++.
did we decide now on json?
Yes, JSON for the N-triple format seems ok.
it could still be useful to have some format that can be read by other libraries (like rdflib)
so, something like this?
[
  { "subject": "Iron", "property": "Melting point", "value": "1811" },
  { "subject": "Copper", "property": "Melting point", "value": "1357.77" }
]
or
[
  [ "Iron", "Melting point", "1811" ],
  [ "Copper", "Melting point", "1357.77" ]
]
... unless we decide to use a full-blown rdf-storage/data-dump after all, of course ;-)
I personally like the second one. I'm trying some experiments with rdflib, the format is really ugly!
the second is nice and easy. should be enough, i think. and easy to parse.
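To illustrate how little parsing the second layout needs, here is a rough sketch using only Python's standard `json` module. The `load_triples` helper and the inline sample string are assumptions made for this example; in practice the text would come from whatever file the import writes.

```python
import json

# Sample data in the second (list-of-lists) layout proposed above.
raw = '''
[
  [ "Iron", "Melting point", "1811" ],
  [ "Copper", "Melting point", "1357.77" ]
]
'''

def load_triples(text):
    """Parse JSON text into a list of (subject, property, value) tuples."""
    triples = []
    for entry in json.loads(text):
        # Each entry must be exactly [subject, property, value].
        if len(entry) != 3:
            raise ValueError(f"expected 3 items per triple, got {entry!r}")
        subject, prop, value = entry
        triples.append((subject, prop, value))
    return triples

triples = load_triples(raw)
for subject, prop, value in triples:
    print(subject, prop, value)
```

The length check is the only validation the format really needs, which is part of why the list form is attractive compared to the keyed objects.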
The above is the edge-file, I would propose the following for the properties:
{
  "Iron": {
    "Melting point": 1234,
    "Boiling point": 2000
  },
  "Copper": {
    "Melting point": 1234,
    "Boiling point": 2000
  }
}
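If property data first arrives as triples in the list layout discussed earlier, it could be folded into this nested form along the following lines. This is only a sketch: the `group_properties` helper is hypothetical, and converting every value to `float` assumes all properties are numeric, which may not hold for the real data.

```python
import json

def group_properties(triples):
    """Group (subject, property, value) triples into {subject: {property: value}}."""
    props = {}
    for subject, prop, value in triples:
        # Assumption: values are numeric strings; non-numeric values would raise here.
        props.setdefault(subject, {})[prop] = float(value)
    return props

triples = [
    ["Iron", "Melting point", "1811"],
    ["Copper", "Melting point", "1357.77"],
]

properties = group_properties(triples)
print(json.dumps(properties, indent=2))
```

Keying by subject first makes a lookup such as `properties["Iron"]["Melting point"]` a plain dictionary access.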
Closing as fixed.
We need to store the data fetched from Wikipedia somewhere. This does not have to be the final store, just something that gives us a dataset we can work on.