dgarijo / FragFlow

Project designed for detecting common fragments in scientific workflows by reusing several existing graph mining techniques

Memory management with a high number of fragments #89

Open: dgarijo opened this issue 10 years ago

dgarijo commented 10 years ago

Pafi and Parsemis sometimes produce too many fragments, which should be filtered. The filtering can be done while reading the result catalog: if a fragment has the same number of nodes as a fragment that is already stored, do not store it, unless it has more edges, in which case keep it instead of the stored one (i.e., keep the fragment with the most edges). This would make reading the fragments slower, but it would reduce memory usage and help avoid Java heap problems.
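A minimal sketch of the filter-while-reading idea, assuming each fragment can be summarized by an identifier, a node count and an edge count. The `Fragment` record and the `FragmentFilter` class are hypothetical names for illustration, not FragFlow's actual classes. At most one fragment per node count is held in memory, so the heap grows with the number of distinct node counts rather than with the number of mined fragments.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical summary of a mined fragment: only the counts the filter needs. */
record Fragment(String id, int nodeCount, int edgeCount) {}

/**
 * Hypothetical streaming filter applied while reading the result catalog:
 * for each node count, only the fragment with the most edges is kept.
 */
class FragmentFilter {
    // nodeCount -> best fragment seen so far for that node count
    private final Map<Integer, Fragment> bestByNodeCount = new LinkedHashMap<>();

    /** Called once per fragment as it is read; returns true if it was stored. */
    boolean offer(Fragment f) {
        Fragment current = bestByNodeCount.get(f.nodeCount());
        if (current == null || f.edgeCount() > current.edgeCount()) {
            // New node count, or same node count with more edges: keep this one.
            bestByNodeCount.put(f.nodeCount(), f);
            return true;
        }
        // Same node count and no more edges: do not store it.
        return false;
    }

    /** Fragments that survived filtering, in order of first-seen node count. */
    List<Fragment> kept() {
        return new ArrayList<>(bestByNodeCount.values());
    }
}
```

The reader would call `offer` for each fragment as it is parsed and discard the parsed object immediately when `offer` returns `false`, so rejected fragments become garbage right away instead of accumulating in the heap.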

Another possibility is to store the nodes in a database (e.g., a repository) as they are read, then do all the additional filtering there and export only the smaller, filtered dataset.
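A rough sketch of this store-then-filter approach. To keep it self-contained, a temporary tab-separated file stands in for the database or repository, which is a swapped-in technique rather than what the comment proposes. The class name `SpillingFragmentStore`, the file layout and the choice of applying the node/edge rule from the previous comment as "the additional filtering" are all assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical two-phase store: fragments are written to disk as they are read
 * (a temp file stands in for the database), then filtered and exported.
 */
class SpillingFragmentStore implements AutoCloseable {
    private final Path spill;
    private final BufferedWriter writer;

    SpillingFragmentStore() throws IOException {
        spill = Files.createTempFile("fragments", ".tsv");
        writer = Files.newBufferedWriter(spill);
    }

    /** Phase 1: append one fragment (id, nodes, edges) without keeping it in the heap. */
    void store(String id, int nodeCount, int edgeCount) throws IOException {
        writer.write(id + "\t" + nodeCount + "\t" + edgeCount);
        writer.newLine();
    }

    /** Phase 2: filter the stored fragments and export only the smaller dataset. */
    void exportFiltered(Path out) throws IOException {
        writer.flush();
        // nodeCount -> {id, edgeCount} of the fragment with the most edges so far
        Map<Integer, String[]> best = new TreeMap<>();
        try (BufferedReader reader = Files.newBufferedReader(spill)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split("\t");
                int nodes = Integer.parseInt(cols[1]);
                int edges = Integer.parseInt(cols[2]);
                String[] current = best.get(nodes);
                if (current == null || edges > Integer.parseInt(current[1])) {
                    best.put(nodes, new String[] {cols[0], cols[2]});
                }
            }
        }
        // Write the surviving fragments, ordered by node count.
        try (BufferedWriter w = Files.newBufferedWriter(out)) {
            for (Map.Entry<Integer, String[]> e : best.entrySet()) {
                w.write(e.getValue()[0] + "\t" + e.getKey() + "\t" + e.getValue()[1]);
                w.newLine();
            }
        }
    }

    /** Closes the writer and deletes the temporary spill file. */
    @Override
    public void close() throws IOException {
        writer.close();
        Files.deleteIfExists(spill);
    }
}
```

With a real database, phase 2 would become a query over the stored fragments. The trade-off is the same as in the issue: extra I/O while reading in exchange for a heap that never holds the full fragment catalog.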