Prometheus-Extractor / prometheus

Relationship extractor for facts running on Spark

Performance: Crash when relation.entities is too big #6

Closed: ErikGartner closed this issue 7 years ago

ErikGartner commented 7 years ago

The crash occurs in TrainingDataExtractor, where all relations are collected to the driver and then broadcast:

val broadcastedRelations = relations.sparkContext.broadcast(relations.collect())
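`relations.collect()` pulls every relation, including its full `entities` list, into driver memory before the broadcast. The driver's heap (and `spark.driver.maxResultSize` for the collected result) therefore caps how many relations can be processed. One common alternative, not necessarily the fix adopted in this project, is to keep both sides distributed and join them on the entity key. The plain-Scala sketch below models that shape with ordinary collections so it runs without Spark. `Relation`, `Sentence`, their fields, and `JoinInsteadOfBroadcast` are hypothetical stand-ins, not the project's classes. In Spark the same steps map to `flatMap` on each RDD followed by `join` on the two resulting pair RDDs.

```scala
// Hypothetical, simplified stand-ins for the project's types (assumptions,
// not the actual classes in the repository).
final case class Relation(id: String, entities: Seq[String])
final case class Sentence(text: String, entityIds: Seq[String])

object JoinInsteadOfBroadcast {
  // Key every relation by each of its entities: (entityId, relationId).
  def relationKeys(relations: Seq[Relation]): Seq[(String, String)] =
    relations.flatMap(r => r.entities.map(e => (e, r.id)))

  // Key every sentence by each entity it mentions: (entityId, sentence).
  def sentenceKeys(sentences: Seq[Sentence]): Seq[(String, Sentence)] =
    sentences.flatMap(s => s.entityIds.map(e => (e, s)))

  // Inner join on the entity key. With pair RDDs this would be
  // relationKeysRdd.join(sentenceKeysRdd), which shuffles by key
  // instead of gathering all relations on the driver.
  def joinByEntity(relations: Seq[Relation],
                   sentences: Seq[Sentence]): Seq[(String, (String, Sentence))] = {
    val sentencesByEntity: Map[String, Seq[Sentence]] =
      sentenceKeys(sentences).groupBy(_._1).map { case (k, v) => k -> v.map(_._2) }
    for {
      (entity, relId) <- relationKeys(relations)
      sentence <- sentencesByEntity.getOrElse(entity, Seq.empty)
    } yield (entity, (relId, sentence))
  }

  def main(args: Array[String]): Unit = {
    val relations = Seq(Relation("P1", Seq("Q1", "Q2")), Relation("P2", Seq("Q3")))
    val sentences = Seq(Sentence("a", Seq("Q1")), Sentence("b", Seq("Q3", "Q4")))
    // Prints one (entity, (relation, sentence)) match per line.
    joinByEntity(relations, sentences).foreach(println)
  }
}
```

The trade-off is a shuffle on the entity key in place of a single broadcast: it scales with the cluster instead of with driver memory, but it is usually slower than a broadcast when the relations are small enough to fit comfortably on every executor.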
axeltlarsson commented 7 years ago

Performance scaling issues

axeltlarsson commented 7 years ago

Tips from Marcus:

ErikGartner commented 7 years ago

Extracting TrainingSentences is slow: about 20 minutes for 2 relations, although the runtime should not necessarily scale linearly with the number of relations.