finos / datahub

DataHub - Synthetic data library
https://datahub.finos.org
Apache License 2.0

Generation running slowly #8

Closed grovesy closed 4 years ago

grovesy commented 4 years ago

Bug Report

Steps to Reproduce:

  1. Run the tests
  2. They take several minutes

Expected Result:

These tests should run in seconds

Actual Result:

They take several minutes

Additional Context:

I suspect the issue is with name generation. Maybe something that used to be cached no longer is, i.e. are we rebuilding the Markov trees each time?

Looking at the metrics output, name generation seems to be taking 300ms each time

grovesy commented 4 years ago

Found the issue: in a tidy-up to remove global state, dataset loading was moved back into the attribute generator constructors, which is far cleaner,

but the dataframe generator created a new instance of an attribute generator for each result generated, and some of the prep work for an attribute generator could take several seconds

A "context" class has been added which keeps hold of attribute generators so they can be reused.
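
For anyone reading later, here is a rough sketch of the idea, not the actual DataHub code. The names `Context`, `NameGenerator` and `generate_rows` are hypothetical and only illustrate the pattern: the expensive constructor runs once, and the context hands back the same instance for every row.

```python
class NameGenerator:
    """Hypothetical attribute generator whose constructor is expensive."""

    instances_created = 0  # counts constructor calls, for illustration only

    def __init__(self):
        # Stand-in for the slow prep work (loading datasets, building
        # Markov trees). Assumption: in the real library this is the
        # part that took a long time.
        NameGenerator.instances_created += 1
        self._counter = 0

    def make(self):
        # Produce a placeholder value; a real generator would sample a name.
        self._counter += 1
        return f"name-{self._counter}"


class Context:
    """Keeps hold of attribute generators so they are built once and reused."""

    def __init__(self):
        self._generators = {}

    def get(self, key, factory):
        # Build the generator on the first request only, then return
        # the cached instance on every later request.
        if key not in self._generators:
            self._generators[key] = factory()
        return self._generators[key]


def generate_rows(context, count):
    # Each row asks the context for the generator instead of constructing
    # a new one, so the expensive setup is not repeated per row.
    return [context.get("name", NameGenerator).make() for _ in range(count)]
```

Because the cache lives on the context object rather than in module-level state, this keeps the benefit of the earlier tidy-up: two separate contexts never share generators, and a single context can be passed around for as long as reuse is wanted.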