Closed by pietercolpaert 7 months ago
I strongly disagree here for several reasons.
First, the DCAT-3 specification lists dcat:Distribution as one of seven main classes.
Second, handling updates of distributions with blank nodes is tricky for implementors. I see three main approaches:
Clearly, option 2 is the least desirable, as it leads to complicated harvesting and also complicated requests in the frontend. Option 1 sounds good on paper, but the risk is that the problem spreads. (I think this has happened on data.europe.eu, where even the data services are stored in the same graph because they are reachable from a distribution. This leads to a lot of duplicated triples and consequently makes it much harder to provide a view of data services that have a more independent character. I am unsure whether it has also spread to contact points and publishers.)
Hence, option 1 risks spreading its bad influence and causing other problems. What if a data service is represented as a blank node and referenced from many datasets?
I think option 3 is the best option, as it treats the blank nodes as wrong and mints new URIs in the harvesting step, using a mechanism that keeps the minted URIs at least semi-stable. The intent of option 3 is to push back on data publishers that use blank nodes, in the hope that over time we can be stricter about what we accept in the harvesting step.
Basically what I am saying is: when we have the chance to define a new protocol, let's design it in a way that forces people to solve problems earlier in the chain rather than etching the problems into the protocol itself.
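To make option 3 a bit more concrete, here is a minimal sketch of how a harvester could mint semi-stable URIs for blank-node distributions. It is not taken from any existing harvester: the namespace `BASE`, the function name `mint_distribution_uri`, and the choice of identifying properties in `KEY_PREDICATES` are all assumptions made for illustration.

```python
import hashlib

# Hypothetical namespace for minted URIs; a real harvester would use its own.
BASE = "https://example.org/minted/distribution/"

# Assumed choice of identifying properties; not prescribed by any spec.
KEY_PREDICATES = ("dcat:accessURL", "dcat:downloadURL", "dcterms:format")


def mint_distribution_uri(dataset_uri, description):
    """Derive a URI for a blank-node distribution.

    dataset_uri: URI of the dataset that points to the distribution.
    description: iterable of (predicate, object) string pairs found on the blank node.
    """
    # Keep only the identifying properties and sort them, so that triple order
    # and blank node labels (which can change between harvests) do not matter.
    key = sorted((p, o) for p, o in description if p in KEY_PREDICATES)
    canonical = dataset_uri + "\n" + "\n".join(f"{p} {o}" for p, o in key)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return BASE + digest[:16]


# Two harvests of the same distribution, with triples in a different order
# and a changed title, yield the same minted URI.
first = mint_distribution_uri(
    "https://example.org/dataset/1",
    [("dcterms:title", "CSV dump"),
     ("dcat:downloadURL", "https://example.org/files/1.csv")],
)
second = mint_distribution_uri(
    "https://example.org/dataset/1",
    [("dcat:downloadURL", "https://example.org/files/1.csv"),
     ("dcterms:title", "CSV dump (updated)")],
)
```

The URIs are only semi-stable: if the publisher changes the download URL, the minted URI changes too, and two distributions of one dataset with identical key properties would collide. Both are reasons to keep pushing publishers towards providing their own URIs.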
I would still strongly discourage it, but it remains possible, hence we need to make sure it works. Having it as a fallback might be useful.
But where do we draw the line? Which standalone entities should be allowed to be provided as blank nodes? Why only Distributions?
I would suggest a motivating underlying rule that says that standalone entities that might be reused (pointed to by more than one triple) should always be required to appear with URIs in separate named graphs.
From this rule I think distributions are the only standalone entities that would be allowed (although discouraged) as blank nodes.
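The rule could be checked in the harvesting step roughly as follows. This is a sketch under simplifying assumptions: triples are plain `(subject, predicate, object)` string tuples, blank nodes are recognised by a `_:` prefix, and the helper name `shared_blank_nodes` is made up for this example.

```python
from collections import Counter


def shared_blank_nodes(triples):
    """Return blank nodes that are the object of more than one triple.

    Under the proposed rule, such nodes are reused entities and should
    have been published with a URI instead.
    """
    counts = Counter(o for _, _, o in triples if o.startswith("_:"))
    return sorted(node for node, n in counts.items() if n > 1)


triples = [
    ("https://example.org/dataset/1", "dcat:distribution", "_:dist1"),
    ("https://example.org/dataset/1", "dcat:distribution", "_:dist2"),
    # The same blank-node data service is referenced from two distributions.
    ("_:dist1", "dcat:accessService", "_:service"),
    ("_:dist2", "dcat:accessService", "_:service"),
]
violations = shared_blank_nodes(triples)
```

Here `_:dist1` and `_:dist2` pass, because each is pointed to by exactly one triple, while `_:service` is flagged as a reused entity that needs a URI.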
I like this wording and agree!
I’ll close this discussion as final now: the spec now points out in a note that using a dcat:Distribution like this won’t break anything, but we don’t see it as a good practice.
After discussion with the people behind Piveau, it appears distributions are oftentimes blank nodes. In that case, the distribution cannot be a standalone entity, because for standalone entities a named node is required.
I propose to add to the spec that distributions can also be blank nodes, on the condition that they become embedded entities in the dcat:Dataset.