Closed likelion closed 6 years ago
The rdf_subject/1 is needed to generate subjects. It cannot be delated. We can do
( var(Subject)
-> rdf_subject(Subject)
; true
).
I'm a bit surprised this being so slow. Do you have timings?
Yes, here is a result of profile(rdf_save(stream(Out), [graph(Graph)]))
when serializing a graph with 91 triples. It definitely should not take 1.28s to perform such operation, you can see ~200K triples (from other graphs) are being checked.
One thing that has to be fixed is respecting the sorted
option properly...
@JanWielemaker I think, I'm done with this improvement. I got your point about generating subjects, and, now, if graph
option is specified, the search scope is limited to that graph, which significantly improves performance in case when RDF store is big. Also, the sorted
option is respected in both cases (with and without specified graph
option).
Thanks. It isn't really the whole story. If the dabase is large and the graph to be saved is a large portion
thereof, the old enumeration avoids running out of memory. I've done a bit more refactoring that I pushed to a branch called tmp
for you to review. See https://github.com/SWI-Prolog/packages-semweb/commit/12efbdd855701bb0e79a0a20949e52583215ad24
I have not yet merged it into master mainly for author reasons. I merged it all into one commit to make it less confusing and this commit thus contains work from you and me. I'm happy to let you be the author as you started doing the hard work. If you are happy too, I'll merge to master. If you do not want to be associated with my edits, I propose I'll change the authorship and acknowledge you in the commit message. Please let me know.
@JanWielemaker I have one comment in the code (I think we should reverse the inequality there). Otherwise, I'm ok to be the author.
Thanks for spotting! Updated and merged with master. I close this.
I don't see the reason why we need to check if
Subject
is subject usingrdf_subject(Subject)
globally, when the next check does the same thing but within given graph boundaries. Serializing even small graphs to XML when the size of DB is big becomes very slow here.