Timing - Githubissues

emrysshevek commented 5 years ago

While adding tests for the timing feature, I noticed that our timing is potentially calculated incorrectly.

Each of our metafeatures has branching dependencies, and many of those dependecies in turn share dependencies. For example, XPreprocessed requires XSample, XSampledColumns, and column_types. XSample in turn is dependent on XSampledColumns, Y, sample_shape, and seed_base and so on. This means that the time it took to compute XSampledColumns will be added to the final compute time of XPreprocessed twice as if it was recomputed each time a dependency required it.

While this is certainly a valid way to measure the time, it doesn't seem to be exactly what we want (which would be to include the compute time for XSampledColumns only once). It also means that the compute time our package returns for a single metafeature is actually longer than it takes to compute that single metafeature.

So my questions are:

What is the expected use for our timing feature?
Is it worth it to put in the effort to fix it by adding some sort of stack/caching mechanism to our current method?
Is it fine the way it is?
Should we just get rid of it?

bjschoenfeld commented 5 years ago

Timing individual metafeatures could theoretically allow us to analyze a trade off between usefulness of a metafeature and the time it takes to compute it. I personally just discard those times and that's also what happens within D3M. Since it is wrong, perhaps we should just ignore them for now. Once it matters to someone, then we can worry about fixing or removing it. For now, we could add a comment in the code and the README indicating this bug. What do you think?

emrysshevek commented 4 years ago

That sounds fine to me, I looked into a quick fix but it seems like it will be a bit more difficult to ensure that each dependency is in there exactly once.

byu-dml / metalearn

Timing #202