EKGF / dprod

Artifacts of the EKGF Data Product Workgroup (DPROD)
https://ekgf.github.io/dprod
MIT License

Comments about DPROD 11/24/2023 #24

Open matthiasautrata opened 11 months ago

matthiasautrata commented 11 months ago

Please don't take offense. My Black Friday search hasn't been successful yet, so I need to vent a little.

I'm writing and commenting as I read. In particular, this means I may comment on something that you define much further down without a forward reference. That is somewhat intentional.

DCAT is a catalog. DPROD is for products listed in the catalog. Why would one be a profile/extension of the other?

You use the term “Semantic Data…” What is the semantic of “semantic” in this context?

As pointed out elsewhere: why emphasize, or limit this to, “Mesh”? It is a cool word and very popular. But what exactly makes products suitable only for a Mesh? Imagine a data product that lists the geo-coordinates of fire hydrants around the country. Couldn’t I just download the file and move on? What would be meshy or semantic about that?

I disagree with your notion of shift left/right. The data owner (maker) should also own the metadata. Granted, not every publisher can or should handle data integration. But they should be responsible for explaining what they are offering. Likewise, I really don’t want a central team to manage my ontology. They are just going to make a mess of it. It is my ontology.

“...and ontological classes…” what does this mean?

“... input and output ports…” “Ports” is undefined.

If memory serves, in DCAT a data service is a means to access the dataset described in the catalog. Are you suggesting that data services are extended to describe meta-data?

“...semantic meaning…” This basically says: “meaning meaning.” You sure that is what you mean? ;-)

“...This allows for…” Of course it allows. Its absence also would not prohibit it. It just might make it less convenient.

“...semantics ensure that all stakeholders have a common understanding…” You ought to be really careful with such statements. I have seen plenty of ontologies, and having one didn’t mean it was useful or supported understanding, especially not by humans. Imagine I wrote you an ontology, obfuscated all names and IRIs by replacing them with UUIDs, and eliminated all comments and explanations. The formal structure, and hence the implied formal semantics, would remain the same. Good luck emailing this ontology to someone, say as an RDF/Turtle file, and expecting anybody to find that it creates a “common understanding.”
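The obfuscation argument can be made concrete with a small sketch. It assumes a toy ontology held as plain (subject, predicate, object) tuples; the `ex:` names and the `obfuscate` helper are hypothetical illustrations, not DPROD terms. The function swaps every locally defined name for a UUID URN and drops `rdfs:comment` and `rdfs:label` triples. Because the renaming is one-to-one, the remaining triples can be mapped back exactly: the formal structure survives, while everything a human would read is gone.

```python
import uuid

# Toy ontology fragment as (subject, predicate, object) tuples.
# The ex: names are hypothetical illustrations, not DPROD terms.
TRIPLES = [
    ("ex:FireHydrant", "rdf:type", "owl:Class"),
    ("ex:FireHydrant", "rdfs:subClassOf", "ex:Asset"),
    ("ex:FireHydrant", "rdfs:comment", '"A water outlet used by fire services."'),
    ("ex:hasLocation", "rdf:type", "owl:ObjectProperty"),
    ("ex:hasLocation", "rdfs:domain", "ex:FireHydrant"),
]

# Standard vocabulary terms are left untouched by the obfuscation.
STANDARD_PREFIXES = ("rdf:", "rdfs:", "owl:")
HUMAN_READABLE = {"rdfs:comment", "rdfs:label"}


def obfuscate(triples):
    """Rename local terms to UUID URNs and drop human-readable annotations.

    Returns the obfuscated triples and the name -> UUID mapping used.
    """
    mapping = {}

    def rename(term):
        # Keep standard vocabulary and literals; rename everything else
        # consistently, so the same name always gets the same UUID.
        if term.startswith(STANDARD_PREFIXES) or term.startswith('"'):
            return term
        if term not in mapping:
            mapping[term] = "urn:uuid:" + str(uuid.uuid4())
        return mapping[term]

    obfuscated = [
        (rename(s), rename(p), rename(o))
        for s, p, o in triples
        if p not in HUMAN_READABLE  # strip comments and labels
    ]
    return obfuscated, mapping


obfuscated, mapping = obfuscate(TRIPLES)

# Invert the one-to-one mapping: the structure comes back unchanged.
inverse = {v: k for k, v in mapping.items()}
restored = [tuple(inverse.get(t, t) for t in triple) for triple in obfuscated]
structural = [t for t in TRIPLES if t[1] not in HUMAN_READABLE]
assert restored == structural

for triple in obfuscated:
    print(triple)
```

Someone receiving only `obfuscated` gets a graph with exactly the same shape, yet nothing in it says “fire hydrant.”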

“...fundamental idea behind a Data Mesh…” Colloquialism. Intuitive but neither semantic nor appropriate. If you want to explain what a data mesh is supposed to be, put a reference there assuming that something even half normative exists.

“...can be programmatically understood…” will there be LLMs involved? Or did you mean “...can be verified against something and interpreted programmatically…”?

“...ensure that these products can interact…” are products “active” in the sense that they do something? Like “interact?”

“DPROD maps to the Data Mesh notion of a port to the DCAT notion of a DataService, so we can declare a DataProduct by and specify and input and output ports and these ports are Dataservice.” Could not parse.

DCAT defines a data service as “A collection of operations that provides access to one or more datasets or data processing functions.” You might make it very clear that you are changing that definition. Is it really just that: one inputPort and one outputPort? Nothing else?

“A data product is a rational, managed, and governed collection of data, with purpose, value and ownership, meeting consumer needs over a planned life-cycle.”

How do you define and measure “rational?” How do you assign and measure “value?” How is value expressed? In US$? I suppose I could write more here. I’ll leave it as: You don’t define what a data product is. You list a bunch of cool, marketable attributes.

I’ll stop here because the rest seems unfinished. Maybe a parting thought: having read all of this, I still don’t understand at all (you know, the semantic bit) how a specific data product (like listOfFireHydrants) would be described and how I would use it. I get that somehow a port is involved. Now what?
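To make that question concrete, here is one possible shape for such a description, written as a small Python helper that emits Turtle. It is a sketch under stated assumptions: that DPROD attaches ports through a `dprod:outputPort` property (the thread’s own term) whose values are `dcat:DataService` instances, and that the DPROD namespace is `https://ekgf.github.io/dprod/`; both should be checked against the current specification. The `describe_product` helper, the `ex:` IRIs, and all URLs are hypothetical. The `dcat:` and `dcterms:` terms come from W3C DCAT and Dublin Core.

```python
def describe_product(name: str, title: str, endpoint: str, download: str) -> str:
    """Return a Turtle description of a hypothetical data product.

    Assumptions: dprod:DataProduct and dprod:outputPort as named in the
    thread; the DPROD namespace IRI; all ex: IRIs and URLs are made up.
    """
    return f"""@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dprod: <https://ekgf.github.io/dprod/> .
@prefix ex: <https://example.org/> .

# The product itself, with one output port.
ex:{name} a dprod:DataProduct ;
    dcterms:title "{title}" ;
    dprod:outputPort ex:{name}Port .

# The port is a DCAT data service that serves the dataset.
ex:{name}Port a dcat:DataService ;
    dcat:endpointURL <{endpoint}> ;
    dcat:servesDataset ex:{name}Dataset .

# The dataset, with a plain CSV download for "download and move on" use.
ex:{name}Dataset a dcat:Dataset ;
    dcterms:title "{title}" ;
    dcat:distribution ex:{name}Csv .

ex:{name}Csv a dcat:Distribution ;
    dcat:downloadURL <{download}> ;
    dcat:mediaType <https://www.iana.org/assignments/media-types/text/csv> .
"""


print(describe_product(
    "listOfFireHydrants",
    "Fire hydrant locations",
    "https://example.org/api/hydrants",
    "https://example.org/hydrants.csv",
))
```

In this reading, a consumer who only wants the file follows the distribution’s download URL, and the port only matters to consumers who go through the service endpoint. Whether a product may have more than one input and one output port is exactly the open question raised above.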

Footnote: You mention ODRL but don’t seem to use or reference it anywhere else.

tonyseale commented 11 months ago

Thanks for this, Matthias.

What excellent feedback - harsh but fair! This is only a first draft, and I guess we still have a lot of work to do in tightening up the text. Your comments will really help.

Perhaps you can review it again when we have the next version. Also, if you would be willing to submit any revised text yourself, that would be greatly appreciated.

Cheers,

Tony


matthiasautrata commented 11 months ago

I admit it, and apologize: it was perhaps harsher than it needed to be. I'll be happy to do penance and make some suggestions 😊 I think my feedback falls into two broad categories: editorial/structural and content. Would you like both?

Kind regards,

Matthias

PS: Sometimes not getting one's Black Friday wish is actually a blessing in disguise: I think I found an alternative and avoided about $900 worth of frivolous spending 😅

jgeluk commented 10 months ago

It would be best to convert these long lists of suggestions into individual GitHub issues that are actionable issue by issue. And then refer to the issue numbers in each individual commit message so that it's clear which changes have been made for any given issue. @tonyseale @rivettp @nvar

jgeluk commented 2 months ago

@matthiasautrata @tonyseale @nvar @FroehlichMarcel are all points addressed?