SWI-Prolog / packages-semweb

The SWI-Prolog RDF store
28 stars 14 forks source link

Removed redundant check causing severe performance penalty on big graphs #72

Closed likelion closed 6 years ago

likelion commented 6 years ago

I don't see the reason why we need to check if Subject is subject using rdf_subject(Subject) globally, when the next check does the same thing but within given graph boundaries. Serializing even small graphs to XML when the size of DB is big becomes very slow here.

JanWielemaker commented 6 years ago

The rdf_subject/1 is needed to generate subjects. It cannot be delated. We can do

(   var(Subject)
->  rdf_subject(Subject)
;   true
).

I'm a bit surprised this being so slow. Do you have timings?

likelion commented 6 years ago

Yes, here is a result of profile(rdf_save(stream(Out), [graph(Graph)])) when serializing a graph with 91 triples. It definitely should not take 1.28s to perform such operation, you can see ~200K triples (from other graphs) are being checked.

screen shot 2017-12-06 at 12 57 03 screen shot 2017-12-06 at 12 57 31

likelion commented 6 years ago

One thing that has to be fixed is respecting the sorted option properly...

likelion commented 6 years ago

@JanWielemaker I think, I'm done with this improvement. I got your point about generating subjects, and, now, if graph option is specified, the search scope is limited to that graph, which significantly improves performance in case when RDF store is big. Also, the sorted option is respected in both cases (with and without specified graph option).

JanWielemaker commented 6 years ago

Thanks. It isn't really the whole story. If the dabase is large and the graph to be saved is a large portion thereof, the old enumeration avoids running out of memory. I've done a bit more refactoring that I pushed to a branch called tmp for you to review. See https://github.com/SWI-Prolog/packages-semweb/commit/12efbdd855701bb0e79a0a20949e52583215ad24

I have not yet merged it into master mainly for author reasons. I merged it all into one commit to make it less confusing and this commit thus contains work from you and me. I'm happy to let you be the author as you started doing the hard work. If you are happy too, I'll merge to master. If you do not want to be associated with my edits, I propose I'll change the authorship and acknowledge you in the commit message. Please let me know.

likelion commented 6 years ago

@JanWielemaker I have one comment in the code (I think we should reverse the inequality there). Otherwise, I'm ok to be the author.

JanWielemaker commented 6 years ago

Thanks for spotting! Updated and merged with master. I close this.