Open Makman2 opened 7 years ago
(Participating)
Hey, I would like to participate in establishing this bear.
I had the following idea in mind:
Tell me if there are any mistakes or any ways to improve it. And what does coAST means?
Firstly we would be using a LocalBear as analyzing multiple files at same time should not make any difference.
It does. Actually we have to analyze each function or more precise, each conditional expression. They may use performance information from functions not defined in the same module, maybe even not inside the same package.
and evaluate the time taken by each operand
Looks like you want to run the snippets. That is not always a good idea, there's code that shall not run when the overall program doesn't run. Like file_io_result = delete_file() or retry_delete()
(Actually we would preprocess the operands first else we would have to run program again for every conditional operator)
How do you want to preprocess them?
For finding the best way we most probably won't be needing an algo assuming that in real life we write short conditional operations, else we can always have a nice algo.
I mean we need some kind of functionality for that, don't we? :D
And what does coAST means?
https://github.com/coala/coAST. We plan to implement a generic AST parser for all languages. So it's in principle an all-language (and maybe feature) replacement for pyparse. We have a gsoc project for that, I hope it gets accepted and we can finally start working on coAST, as this idea is around for quite a while^^
I was thinking of taking the operand and evaluating them first and then use their result for the second time in actual conditional place. So, what i wanted to preprocess was to first go through every file and parse out the operands from every conditional operator. This way we could get the time by not disturbing the flow of program. But sometimes we use short circuiting to our advantage, we would need to handle that carefully.
As for the fifth point we can make a nested boolean tree and select the best combination while climbing up the tree.
And yes, I agree using LocalBear would be stupidity. We need some another bear that flows with the program. So for input distribution we could analyze the time by running it only once.
As for the fifth point we can make a nested boolean tree and select the best combination while climbing up the tree.
How do you propose to do that ?
@Makman2 We can easily retrieve the conditional statements and methods used in them using AST. So, if I have code like
if a() and b():
pass
if b() or a():
pass
You are expecting users to input the cost of both functions, and the probability that they will short circuit ? then we can evaluate each arrangement.
Also, there are several other issues of optimization using AST, Like
[x*2 for x in "abc" if False] => {}
[x*2 for x in []] => {}
that's a different case though ;)
You are expecting users to input the cost of both functions
Don't know, ideally we don't require that and calculate it ourselves.
@Makman2 a suggestion maybe. Checking for the conditional stmts, running the functions with these conditional stmts, recording time and piping it out to a file (as sometime the log will be large). Interchanging the positions and piping it out to the same file after a line. Then printing the file like for each function and calculating it's average. It would output the log as below: (this is for a single condition)
For stmt in line <line number>, current:
True and True -> 3 seconds
True and False -> 3 seconds
False and True -> 2 seconds
False and False -> 2 seconds
Average: 2.5 seconds
After interchanging the operands:
True and True -> 3 seconds
True and False -> 3 seconds
False and True -> 1 second
False and False -> 1 second
Average: 2 seconds
Or pipe out only the average:
For stmt in line <line number>, current average: 2.5s
After interchanging operands it becomes: 2.0s
After the file's changed its reverted back. To revert the file we can either make use of git revert and commit the change or make a temp for it and delete the file. Piping the avg seems best! Your guesses?
piping it out to a file
why do you want to pipe it to a file? We can directly use the values returned by timeit
or similar modules.
running the functions with these conditional stmts
That's what I want to avoid, as stated from me:
and evaluate the time taken by each operand
Looks like you want to run the snippets. That is not always a good idea, there's code that shall not run when the overall program doesn't run. Like
file_io_result = delete_file() or retry_delete()
Another note:
Interchanging the positions and piping it out to the same file after a line.
We just have to execute each condition once :) If we know the execution times of each single condition, we can optimize the execution time without running them again (neglecting side effects).
We should depend on tests to execute the condition. Then we would not need to interchange the position.
We should depend on tests to execute the condition.
Not a bad idea. Though then the performance relies then on test conditions, we need to take care of that somehow^^
Then we would not need to interchange the position.
Don't understand that, the whole issue is about doing so :)
Languages like python use short-circuiting, meaning condition evaluations do stop as soon as the result of further conditions is not necessary:
See https://www.python.org/dev/peps/pep-0308/
Some evaluations are more costly than others, so an optimized ordering of expression could yield better overall performance.
So we get a discrete optimization problem to solve. Easiest is to try out every combination, in this case there are only two. Let's try the other one:
Nice, this ordering yields a better overall performance.
So depending on the distribution of the input values (in this case we assumed they are distributed in a way that lets the function return in 50% of the cases False and in the other 50% True), we get different overall performances.
If a user wants to run this analysis, he has just to:
The bear now
This is in fact a transformation from code statistics into a stochastic optimization problem.
This definitely requires coAST, and I know this is quite fancy, still I wanted to write down that idea :+1: