Open rat10 opened 3 weeks ago
Our proposal doesn't prescribe how CDT data is stored internally, whereas I think the work by Daga et al. restricts itself to encoding sequence-like data as RDF triples. So the two works operate at different conceptual levels. Internally, a CDT implementation could use structures similar to those evaluated by Daga et al., keep the CDT literals as literals and lazily turn them into structured data as needed, or use more optimized data structures. At this point, our work isn't concerned with how it's implemented (though obviously that choice will affect performance on any given workload).
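As a rough illustration of the "keep the literal and parse lazily" option, here is a minimal Python sketch. The class name `LazyListLiteral` and its attributes are hypothetical, and the lexical form is assumed to be a plain JSON-style array of numbers, which is a simplification of what a real CDT list literal may contain. It is not how any particular implementation works.

```python
import json


class LazyListLiteral:
    """Hypothetical wrapper: stores the lexical form, parses only on first access."""

    def __init__(self, lexical_form: str):
        self.lexical_form = lexical_form
        self._items = None  # nothing parsed yet

    @property
    def is_parsed(self) -> bool:
        return self._items is not None

    @property
    def items(self) -> list:
        if self._items is None:
            # Simplifying assumption: the lexical form is a JSON-style array.
            self._items = json.loads(self.lexical_form)
        return self._items


lit = LazyListLiteral("[1, 2, 3]")
before = lit.is_parsed   # the literal has only been stored so far
second = lit.items[1]    # first access triggers parsing
after = lit.is_parsed
```

Until something asks for the elements, the value costs no more than an ordinary string literal; the parsing cost is paid once, on first access.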
In a paper about this proposal, published at ESWC 2024 [0], work by Daga et al. [1] is mentioned that evaluates the performance of five different approaches to representing lists in RDF (RDF containers, RDF collections, a design pattern, explicit numbering, numbered properties). The comparison is performed on different database systems, and the results show that performance by and large depends not on the software but on the representation. Have you compared the performance of CDT to those other approaches on Jena (or AWS or any other system)? Could you extend the comparison in [1] with CDT?
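For readers less familiar with these encodings, here is a small Python sketch of how the same list looks as an RDF collection, as an RDF container, and as a single list literal. Triples are modelled as plain tuples; the function names, the blank node labels, and the `cdt:List` datatype label with its bracketed lexical form are illustrative assumptions, not taken from either paper's code. The sketch only shows the shape of the data and says nothing about performance.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def as_collection(subject, predicate, items):
    """RDF collection: a chain of rdf:first / rdf:rest nodes ending in rdf:nil."""
    if not items:
        return [(subject, predicate, RDF + "nil")]
    nodes = [f"_:n{i}" for i in range(len(items))]  # hypothetical blank node labels
    triples = [(subject, predicate, nodes[0])]
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", item))
        triples.append((nodes[i], RDF + "rest", rest))
    return triples


def as_container(subject, predicate, items):
    """RDF container: an rdf:Seq with membership properties rdf:_1, rdf:_2, ..."""
    seq = "_:seq"  # hypothetical blank node label
    triples = [(subject, predicate, seq), (seq, RDF + "type", RDF + "Seq")]
    for i, item in enumerate(items, start=1):
        triples.append((seq, f"{RDF}_{i}", item))
    return triples


def as_single_literal(subject, predicate, items):
    """One triple whose object is a (lexical form, datatype) pair (assumed shape)."""
    lexical = "[" + ", ".join(str(x) for x in items) + "]"
    return [(subject, predicate, (lexical, "cdt:List"))]


values = [1, 2, 3]
collection = as_collection("ex:s", "ex:p", values)
container = as_container("ex:s", "ex:p", values)
literal = as_single_literal("ex:s", "ex:p", values)
```

For a three-element list, the collection needs seven triples and the container five, while the literal form is a single triple whose internal structure is left to the implementation.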
[0] O. Hartig et al. Datatypes for Lists and Maps in RDF Literals. ESWC 2024.
[1] E. Daga, A. Meroño-Peñuela, and E. Motta. Sequential Linked Data: The State of Affairs. Semantic Web, 12(6):927–958, 2021.