Closed by RickMoynihan 11 months ago
This would contradict the spec by default - I'd be inclined to invert the behaviour of that option.
I wonder if it'd be useful to surface the result of the steps taken to locate the metadata (though I don't know how easy that would be to work with via the CLI).
OK, it looks like @Robsteranium is correct, and the spec results in us using "embedded metadata", which is all optional and undefined. However, that section says the following (for the case where no explicit embedded metadata is used):
Parsing based on the default dialect for CSV, as described in 8. Parsing Tabular Data, will extract column titles from the first row of a CSV file.
So this then becomes our fallback "metadata document", which results in the useless RDF.
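To make that fallback concrete, here is a rough Python sketch of what header-only embedded metadata might look like. It is an illustration only, not csv2rdf's implementation: the function name `embedded_metadata` is hypothetical, and the output shape (a `tableSchema` whose columns carry only `titles`) is my reading of the spec's parsing section.

```python
import csv
import io


def embedded_metadata(csv_text, url=None):
    """Build a minimal metadata structure using only the CSV header row.

    Hypothetical helper for illustration: the only information available
    without a metadata document is the list of column titles.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty input yields no columns
    metadata = {
        "@context": "http://www.w3.org/ns/csvw",
        "tableSchema": {
            # One column description per header cell, titles only:
            # no datatypes, no aboutUrl, no propertyUrl.
            "columns": [{"titles": [title]} for title in header]
        },
    }
    if url is not None:
        metadata["url"] = url
    return metadata


if __name__ == "__main__":
    print(embedded_metadata("name,age\nAda,36\n", url="people.csv"))
```

With nothing but titles to go on, every cell can only be emitted as a plain literal against a predicate derived from the column, which is why the resulting RDF is of so little use.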
If we're to be spec conformant we would need to […] -t is supplied and we have fallen back to using embedded metadata.
However, after some more reflection I think it may be better to deviate from the spec in this regard, and fail fast on the RDFization in this case.
I just don't think the output data is useful at all, or ever what anyone would want or expect. This feels very much like an accidental outcome of the spec.
I think we should just change the behaviour. We could add an option in the future to be spec compliant in this regard; but I honestly think nobody would ever want to enable it :-)
While the 'embedded' output is rarely useful, it's not clear what benefit there would be to deviating from the spec here? If it's to guard against accidentally failing to supply a metadata document, this would be obvious in the output.
We've agreed to close this, because you should normally only be RDFizing and expecting meaningful output if you have a metadata document; and if you have one, in an automated context it's always better to start explicitly from there rather than from the CSV.
Related to issue #186: when running csv2rdf with just a -t table, csv2rdf does not locate the metadata document, and instead performs the default conversion.
The default conversion generates a literal RDF representation of the CSV, which is of little use to us in most cases. In most cases it would be better to fail with an explicit error, rather than proceeding to generate data of little practical value.
I'd suggest we:
- Fail with an explicit error on stderr.
- Add a --proceed-without-metadata option to engage the current behaviour (generating the default RDFization of the literal CSV where there is no metadata document).
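The suggested behaviour could be sketched roughly as follows, in Python for illustration rather than in csv2rdf's own code. Only -t and --proceed-without-metadata come from this issue; the parser name, `MissingMetadataError`, `choose_conversion`, and the `metadata_found` parameter (standing in for the outcome of whatever metadata lookup runs first) are all hypothetical.

```python
import argparse
import sys


class MissingMetadataError(Exception):
    """Raised when no metadata document was located and no opt-in was given."""


def build_parser():
    # Hypothetical parser: only -t and --proceed-without-metadata are
    # taken from the issue text; everything else is illustrative.
    parser = argparse.ArgumentParser(prog="csv2rdf-sketch")
    parser.add_argument("-t", dest="tabular", required=True,
                        help="CSV file to convert")
    parser.add_argument("--proceed-without-metadata", action="store_true",
                        help="run the default conversion when no metadata is found")
    return parser


def choose_conversion(argv, metadata_found):
    """Return 'metadata' or 'default', or raise if neither is allowed."""
    args = build_parser().parse_args(argv)
    if metadata_found:
        return "metadata"
    if args.proceed_without_metadata:
        return "default"
    raise MissingMetadataError(
        f"no metadata document located for {args.tabular}; "
        "pass --proceed-without-metadata to run the default conversion"
    )


def main(argv, metadata_found):
    try:
        mode = choose_conversion(argv, metadata_found)
    except MissingMetadataError as err:
        # Fail fast: report on stderr and exit non-zero instead of
        # emitting RDF of little practical value.
        print(f"error: {err}", file=sys.stderr)
        return 1
    print(f"running {mode} conversion")
    return 0
```

Keeping the decision in a small function that raises, separate from the code that prints and picks the exit status, means the fail-fast rule can be tested without capturing stderr.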