Pafi and Parsemis sometimes produce too many fragments, which then need to be filtered. The filtering can be done while reading the result catalog: if a fragment has the same number of nodes as a fragment that is already stored, do not store it.
If the number of nodes is the same but the number of edges is higher, keep the one with the most edges. This would make reading fragments slower, but it would use less memory (avoiding Java heap problems).
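The read-time filter described above could look roughly like the following. This is only a minimal sketch: the names `FragmentFilter`, `Fragment`, `offer`, and `get` are hypothetical and are not part of Pafi or Parsemis, and a fragment is reduced here to an id plus its node and edge counts. It assumes that at most one fragment is kept per node count.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Pafi or Parsemis.
class FragmentFilter {

    // Assumed minimal view of a fragment: an id plus its node and edge counts.
    static final class Fragment {
        final String id;
        final int nodeCount;
        final int edgeCount;

        Fragment(String id, int nodeCount, int edgeCount) {
            this.id = id;
            this.nodeCount = nodeCount;
            this.edgeCount = edgeCount;
        }
    }

    // At most one fragment is held per node count, which bounds memory use.
    private final Map<Integer, Fragment> bestByNodeCount = new HashMap<>();

    // Call once for every fragment as it is read from the result catalog.
    void offer(Fragment candidate) {
        bestByNodeCount.merge(
                candidate.nodeCount,
                candidate,
                // Same node count: replace the kept fragment only if the
                // new one has strictly more edges; otherwise drop the new one.
                (kept, incoming) -> incoming.edgeCount > kept.edgeCount ? incoming : kept);
    }

    // Returns the fragment kept for a node count, or null if none was seen.
    Fragment get(int nodeCount) {
        return bestByNodeCount.get(nodeCount);
    }

    // All fragments that survived the filter.
    Collection<Fragment> kept() {
        return bestByNodeCount.values();
    }

    public static void main(String[] args) {
        FragmentFilter filter = new FragmentFilter();
        filter.offer(new Fragment("a", 3, 2));
        filter.offer(new Fragment("b", 3, 3)); // more edges than "a", replaces it
        filter.offer(new Fragment("c", 4, 3)); // new node count, stored
        for (Fragment f : filter.kept()) {
            System.out.println(f.id + " nodes=" + f.nodeCount + " edges=" + f.edgeCount);
        }
    }
}
```

Each fragment costs one map lookup while reading, which is the slowdown mentioned above, but the heap only ever holds one fragment per distinct node count.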
Another possibility is to just store the nodes in a database (e.g., a repository) as they are read. Then do all the additional filtering and export only the smaller dataset.