Open craig552uk opened 11 years ago
I really like the idea of incorporating nested data in a single file. I'm concerned that the mechanism above is a bit hard to use. I wonder about something like this:
# ,$id ,department ,courses,course_name ,location ,requirements,level ,grades
about, , , ,courses ,courses ,courses ,requirements,requirements
,#AMS,American Studies,#T700 ,American Studies BA,On Campus , ,A ,BBB
,#AMS, ,#T700 , ,Distance Learning, ,IB ,Pass diploma with 30 points
,#AMS, ,#T700 , , , ,Access ,Pass diploma with 30 level 3 credits
,#AMS, ,#T700 , , , ,BTEC ,Pass diploma with DDM
,#AMS, ,#T701 ,American Studies MA,On Campus , ,A ,ABB
,#AMS, ,#T701 , ,Distance Learning, ,IB ,Pass diploma with 32 points
,#AMS, ,#T701 , , , ,Access ,Pass diploma with 30 level 3 credits
,#AMS, ,#T701 , , , ,BTEC ,Pass diploma with DDM
This keeps the properties in the header line, and then references the properties from the (proposed) `about` line. It means adding a new column for the (identifiers for the) requirements, which is blank.
We'll have to assume that any column that is mentioned in the `about` line holds URLs, and that a blank value means a blank node (as with the `$id` column).
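To make the proposal concrete, here is a minimal Python sketch of how a reader might interpret the header and `about` lines together. The function name `column_parents` and the returned dictionary shape are assumptions for illustration only and are not part of Linked CSV; the sample data is a trimmed, unpadded copy of the table above.

```python
import csv
import io

# Trimmed copy of the example table (padding removed).
SAMPLE = """\
#,$id,department,courses,course_name,location,requirements,level,grades
about,,,,courses,courses,courses,requirements,requirements
,#AMS,American Studies,#T700,American Studies BA,On Campus,,A,BBB
,#AMS,,#T700,,Distance Learning,,IB,Pass diploma with 30 points
"""


def column_parents(text):
    """Map each property column to the column named in the `about` line.

    Assumption: a column with an empty `about` cell describes the entity
    identified by the row's `$id` column.
    """
    rows = [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(text))]
    header = rows[0]
    # The prolog line is recognised by the keyword in the first column.
    about = next(row for row in rows[1:] if row[0] == "about")
    parents = {}
    for name, parent in zip(header[1:], about[1:]):
        if name == "$id":
            continue  # the identifier column is not itself a property
        parents[name] = parent or "$id"
    return parents


parents = column_parents(SAMPLE)
for name, parent in parents.items():
    print(f"{name} about {parent}")
```

With this reading, `level` and `grades` hang off `requirements`, which in turn hangs off `courses`, so the nesting depth falls out of following the `about` references.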
This is much clearer and less open to mistakes - I like it.
I'd like to suggest `in` as an alternative preposition to `about`.
So the fields could be read like...
"course_name in courses"
"level in requirements"
Also, for readability, could the specification support an optional `$` character prefixed to any field that is to be treated as an ID?
The Linked CSV standard allows for multiple rows to be joined on the `$id` field. However, this can only represent list structures. Extending this and introducing a new prolog line, `join`, would allow recursive structures of any depth to be represented in tabular format.
The examples below illustrate how this might work with university course data represented as a three-layered hierarchical data structure: Department > Course > Requirement
courses.csv
courses.json
The `$id0` field is used to join table rows as a single record, in much the same way as `$id` is used in Linked CSV. Subsequent `$id*` fields are used to join rows at lower levels of the data structure.
`$id*` fields must use incremental integers specifying the level of the structure that they apply to. The exception is `$id`, which is an alias for `$id0`.
The scope (across fields) of joins at each level is specified by `join` prolog lines.
`join` statements must be listed in increasing order of specificity of the structure. In the attached example, all fields set to join under the `#courses` identifier are joined into an object on the second tier of the structure.
A `$id*` field must have an associated `join` statement if it is greater than `$id0`. No `join` statement is needed to specify the scope of the top level (`$id0`), as it is assumed to be composed of all fields.
If a `join` statement is provided across multiple fields (e.g. `#requirements`), those fields are joined into an object. In this case, if no `$id*` field is provided, each row of the table is considered to be a separate object.
If a `join` statement is provided across a single field (e.g. `#location`), that field is joined into a list. No `$id*` field can be used in this case.
Multiple `join` scopes can be specified in a single statement, but all scopes in a statement must be contained within the scope of the preceding `join` statement (except the first).
I don't know if you think this enhancement is necessary in the Linked CSV spec, as deep hierarchical structures can be created by linking together multiple documents. But I think it would be nice to be able to accommodate this in a single file. What do you think?
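For anyone who wants to experiment with the rules above, here is a rough Python sketch of the nesting they describe. Since the `join` prolog syntax lives in the attached courses.csv, which is not reproduced here, the scopes are passed in as a plain dictionary; the names `SCOPES`, `build`, and `nest`, the output key `id`, and the "first non-empty value wins" rule for merged rows are all assumptions for illustration, not part of the proposal.

```python
import json

# Rows as they might look after parsing; empty strings are blank cells.
ROWS = [
    {"$id0": "#AMS", "department": "American Studies", "$id1": "#T700",
     "course_name": "American Studies BA", "location": "On Campus",
     "level": "A", "grades": "BBB"},
    {"$id0": "#AMS", "department": "", "$id1": "#T700",
     "course_name": "", "location": "Distance Learning",
     "level": "IB", "grades": "Pass diploma with 30 points"},
    {"$id0": "#AMS", "department": "", "$id1": "#T701",
     "course_name": "American Studies MA", "location": "On Campus",
     "level": "A", "grades": "ABB"},
]

# Hypothetical stand-in for the `join` prolog lines.
SCOPES = {
    "courses": {
        "id": "$id1",                  # rows joined on $id1 form one course
        "fields": ["course_name"],
        "children": {
            "location": {"fields": ["location"]},             # single field: list
            "requirements": {"fields": ["level", "grades"]},  # no id: object per row
        },
    },
}


def build(rows, id_field, fields, children):
    """Group rows on id_field, then nest each child scope inside the group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[id_field], []).append(row)
    result = []
    for key, members in groups.items():
        obj = {"id": key}
        for field in fields:
            # Assumption: the first non-empty value in the group wins.
            obj[field] = next((r[field] for r in members if r[field]), "")
        for name, scope in children.items():
            obj[name] = nest(members, scope)
        result.append(obj)
    return result


def nest(rows, scope):
    """Apply one join scope to the rows of its parent record."""
    fields = scope["fields"]
    if "id" in scope:
        return build(rows, scope["id"], fields, scope.get("children", {}))
    if len(fields) == 1:
        # A join across a single field yields a list of values.
        return [r[fields[0]] for r in rows if r[fields[0]]]
    # A join across several fields without an id yields one object per row.
    return [{f: r[f] for f in fields} for r in rows if any(r[f] for f in fields)]


records = build(ROWS, "$id0", ["department"], SCOPES)
print(json.dumps(records, indent=2))
```

The top level needs no scope entry of its own, mirroring the rule that `$id0` is assumed to cover all fields.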