Closed: jkobject closed this issue 4 years ago
The slowdown is likely due to Hound provenance records being populated. Since you observed all cores being used, Hound has probably switched to parallel background updates, which block only if you try to exit Python before the records finish being written.
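The behavior described above (background writes that only block at interpreter exit) can be illustrated with a generic Python sketch using a thread pool. This is not dalmatian's actual implementation, just a minimal model of the pattern:

```python
from concurrent.futures import ThreadPoolExecutor
import time

executor = ThreadPoolExecutor(max_workers=4)

def write_record(entity):
    # Stand-in for writing one provenance record to remote storage.
    time.sleep(0.01)
    return entity

# Records are submitted in the background; the calling script
# continues immediately and can do other work meanwhile.
futures = [executor.submit(write_record, i) for i in range(20)]

# Only at shutdown (or an explicit join) do pending writes block.
executor.shutdown(wait=True)
results = [f.result() for f in futures]
print(len(results))  # 20
```

Disabling record population (as with `disable_hound()` below) corresponds to never submitting these background tasks in the first place, so there is nothing to wait on.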
If you do not need Hound records (logs of each individual change to workspace entities), you can call WorkspaceManager.disable_hound(), which skips record population entirely. Assuming Hound is responsible for the slowdown, this should restore the previous performance.
If you do wish to keep the provenance records, there is no way around the slowdown itself. However, the batch uploads are efficient and run in the background, so your script can accomplish other tasks while the records are being written.
dm v0.0.17:
Reuploading samples after a quick change performs a parallel upload across all of my CPUs.
It takes around 30 min to complete, and the dataframe is not that big: 1500×200.
Is there any way to speed this process back up to what it used to be?
Best,