izgzhen opened this issue 5 years ago
@Calvin-L @izgzhen I have several questions:

1. What do you think the `stats` structure would look like? Can it be a class with fields for all the data it stores? Can it just be a dictionary created by the `improve_implementation` method?
2. Should the `stats` structure handle threading itself, or should `improve_implementation` handle threading? More specifically, since we are updating it in the `ImproveQueryJob` for-loop, are they sharing the same `stats` structure? Or are we somehow making one for each and then aggregating them into one in `improve_implementation`? Or maybe there is a better way than either of those.
3. How would we go about getting the difference in time between each `improve` loop iteration (the "forever" while-loop) if we want to preserve its responsibility and not have an out-parameter? Not messing with `improve` itself also creates some difficulties in implementing the improvement number limit.

Ultimately the design is up to you, but:
> What do you think the `stats` structure would look like?
It doesn't really matter, as long as it is documented. I would probably do something lightweight, like "a dictionary whose keys are method names and whose values are lists of the timestamps of when improvements were made".
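For illustration, here is a minimal sketch of that shape; the method names and timestamps are invented:

```python
import time

# Hypothetical `stats` structure: method name -> list of timestamps
# (seconds since the epoch) at which an improvement to that method was found.
stats = {
    "get_active_users": [1551200000.1, 1551200012.7],
    "count_orders": [1551200003.4],
}

# Recording a new improvement for some method:
stats.setdefault("get_active_users", []).append(time.time())
```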
> since we are updating it in the `ImproveQueryJob` for-loop, are they sharing the same `stats` structure?
That would be an example of shared mutable state, so I'm opposed to it. It would be much better for each job to collect its own local statistics and to aggregate them at the end. If you use the dictionary I proposed, each job would store a list and the aggregation would just arrange them into a dictionary by job name.
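A sketch of that aggregation step, assuming each job exposes a (hypothetical) `name` and a locally collected `improvement_times` list:

```python
def aggregate_stats(jobs):
    """Combine per-job timestamp lists into one dictionary keyed by job name.

    `job.name` and `job.improvement_times` are hypothetical attribute names;
    the real ImproveQueryJob fields may differ.
    """
    return {job.name: list(job.improvement_times) for job in jobs}
```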
> How would we go about getting the difference in time between each `improve` loop iteration
This can be done by the loop in `ImproveQueryJob`.
How comfortable are you with generators (sometimes called "coroutines") in Python?
```python
def generate():
    i = 0
    while i < 100:
        print("yielding...")
        yield i      # control returns to the caller here
        i += 1

for value in generate():
    print("got {}".format(value))
```
In the example, the generator `generate()` and the loop interleave: instead of printing 100 "yielding..." lines followed by 100 "got X" lines, the program alternates between "yielding..." and "got X".
The `improve` loop is a generator, so the time difference between yields can easily be calculated by the caller. Instead of printing "got X", compute the current time and subtract the previous time from it.
You may want to alter `improve` so that it yields once on every loop iteration, even if an improvement was not found. (In the other thread I wrote up a more detailed description of this idea.)
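For concreteness, a minimal sketch of that caller-side timing, using the toy `generate()` above as a stand-in for `improve`:

```python
import time

previous = time.time()
for value in generate():
    now = time.time()
    elapsed = now - previous   # time spent inside this iteration of the generator
    print("iteration produced {} after {:.3f}s".format(value, elapsed))
    previous = now
```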
> not messing with `improve` itself also creates some difficulties in implementing improvement number limit
Actually, it doesn't! The reason you were messing with `improve` was to make it check the improvement number and stop when it reached a threshold. However, `improve` already comes with a super-general stopping mechanism: the `stop_callback` parameter. All you should need to do is modify the stop callback passed by the `ImproveQueryJob`.
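As a sketch of that idea, assuming `stop_callback` is a zero-argument callable that returns True when improvement should stop (check the real signature in the Cozy source); the `count_improvements` parameter here is hypothetical:

```python
def with_improvement_limit(stop_callback, count_improvements, limit):
    """Wrap an existing stop callback so it also stops after `limit` improvements.

    `stop_callback` is assumed to be a zero-argument callable returning True
    to stop; `count_improvements` is a hypothetical zero-argument function
    reporting how many improvements this job has recorded so far.
    """
    def wrapped():
        return stop_callback() or count_improvements() >= limit
    return wrapped
```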
If you intend the limit to be global rather than per-job, then instead you should modify the top-level loop that collects results to call `stop_jobs(...)` and return when a threshold is reached.
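A rough sketch of that control flow; the `results`, `jobs`, and `stop_jobs` parameters are placeholders for the real top-level loop's state, not Cozy's actual API:

```python
def collect_with_global_limit(results, jobs, limit, stop_jobs):
    """Drain `results` (an iterable of improvement events) and stop all jobs
    once `limit` improvements have been seen across every job.

    All four parameters are hypothetical stand-ins; only the control flow
    (count globally, then stop everything and return) is the point.
    """
    total = 0
    for result in results:
        total += 1
        if total >= limit:
            stop_jobs(jobs)
            return
```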
Discussion started in https://github.com/CozySynthesizer/cozy/pull/93.