dwslab / jRDF2Vec

A high-performance Java Implementation of RDF2Vec
MIT License

Generate all walks of depth x #99

Open moritzblum opened 2 years ago

moritzblum commented 2 years ago

Hi, is there a parameter to sample all walks of a certain depth, e.g., as done in the original RDF2vec paper for DBpedia and Wikidata with depth 2 following direct outgoing relations? I think numberOfWalks is not able to specify this setting.

janothan commented 2 years ago

Hi, this feature is currently not implemented. From earlier experience, generating all walks easily gets very expensive on larger graphs.

To approximate this, you could in theory set numberOfWalks to a very high number and combine it with a DUPLICATE_FREE walkGenerationMode. However, this also scales very badly.

The correct way would be to extend the framework for this capability. jRDF2vec is extensible enough to add walk generation flavors. The process is roughly documented here -- albeit very incompletely. Adding the feature you described is not on the near-term roadmap. However, feel free to fork and extend this repo and to eventually create a pull request.

One last remark: Note that the semantics of depth vary in publications. Sometimes depth refers to one "triple hop" (i.e., S-P-O has depth=1) as in this framework; sometimes it refers to the number of element hops (i.e., S-P-O has depth=2).
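To make the cost and the depth semantics concrete, here is a minimal, standalone sketch of exhaustive walk enumeration. It does not use any jRDF2vec classes: `AllWalksSketch`, `Edge`, and `allWalks` are hypothetical names, the graph is a toy in-memory adjacency map, and Java 16+ is assumed for the `record`. Depth is counted in triple hops, so a walk of depth d contains 2d + 1 elements (2d element hops). With an average out-degree of b, roughly b^d walks are produced per start entity, which is why this gets expensive quickly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AllWalksSketch {

    /** One outgoing edge of a node: predicate and object (hypothetical helper type). */
    record Edge(String predicate, String object) {}

    /**
     * Enumerates every walk that starts at {@code start} and follows exactly
     * {@code depth} outgoing triples ("triple hop" semantics: S-P-O is depth 1).
     * Walks that reach a node without outgoing edges before {@code depth} are dropped.
     */
    static List<List<String>> allWalks(Map<String, List<Edge>> graph, String start, int depth) {
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        current.add(start);
        extend(graph, start, depth, current, result);
        return result;
    }

    /** Depth-first extension of the current walk with backtracking. */
    private static void extend(Map<String, List<Edge>> graph, String node, int remaining,
                               List<String> current, List<List<String>> result) {
        if (remaining == 0) {
            result.add(new ArrayList<>(current)); // copy, because current is mutated later
            return;
        }
        for (Edge e : graph.getOrDefault(node, List.of())) {
            current.add(e.predicate());
            current.add(e.object());
            extend(graph, e.object(), remaining - 1, current, result);
            // Backtrack: remove the object, then the predicate.
            current.remove(current.size() - 1);
            current.remove(current.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Toy graph, invented for illustration only.
        Map<String, List<Edge>> graph = Map.of(
                "A", List.of(new Edge("p", "B"), new Edge("q", "C")),
                "B", List.of(new Edge("r", "D")),
                "C", List.of(new Edge("r", "D"), new Edge("s", "E")));

        // Depth 2 in triple hops: each printed walk has 5 elements.
        for (List<String> walk : allWalks(graph, "A", 2)) {
            System.out.println(String.join(" ", walk));
        }
    }
}
```

If a publication counts depth in element hops instead, the same walks would be requested with twice the depth value, so it is worth checking which convention is meant before comparing settings.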

moritzblum commented 2 years ago

Thanks for the quick reply. I assume the depth in the original RDF2vec refers to "element hop" and, e.g., in your RDF2vec Lite refers to "triple hop"? That has actually always confused me a bit and was not clear to me, thanks for the hint.