Closed akondrahman closed 3 years ago
equation:
no. of files with at least one occurrence of anti-pattern of category x / total Python files used for testing in repo
Bhaiya, I assumed that in the denominator you meant all the test files (Python + YAML). I have calculated this. Summary: in 75% of the projects the ratio is 1, which indicates that every test file contains at least one anti-pattern. In most cases, that anti-pattern is either not cleaning up the environment or not using remote testing. percentage_of_files_where_antipattern_exist.zip
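For reference, a minimal sketch of the ratio as I understand it. The `findings` mapping (test file path to the list of anti-pattern categories detected in it) and the function name are assumptions for illustration only, not TAMI's actual output format.

```python
def proportion_of_files(findings, category):
    """Share of test files with at least one occurrence of `category`.

    `findings` is a hypothetical mapping: test file path -> list of
    anti-pattern category names detected in that file.
    """
    if not findings:
        return 0.0  # no test files in the repository
    hits = sum(1 for cats in findings.values() if category in cats)
    return hits / len(findings)


# Illustrative data only, not taken from the attached results.
example = {
    "tests/test_a.py": ["No_Environment_Cleanup"],
    "tests/test_b.py": [],
    "molecule/default/converge.yml": ["Local_Only_Test", "No_Environment_Cleanup"],
    "tests/test_c.py": ["Assertion_Roulette"],
}
ratio = proportion_of_files(example, "No_Environment_Cleanup")
print(ratio)  # 2 of the 4 files are affected
```

A file counts once no matter how many occurrences it contains, so this ratio can never exceed 1.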
Anti-pattern count per file type:
count this metric per repository: equation: no. of files of type y that include at least one occurrence of anti-pattern category x / total files of type y used for testing in the repository
Bhaiya, I need a bit of clarification on this. What I am going to do is generate 10 metrics per project (2 file types × 5 anti-patterns), then divide each metric by the corresponding file count.
Is my approach correct?
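A rough sketch of what I mean, following the equation above (files with at least one occurrence, divided by files of that type). The `findings` structure and the suffix-to-type mapping are assumptions for illustration; the category names are the ones used in the results.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed mapping from file suffix to file type.
FILE_TYPES = {".py": "py", ".yml": "yaml", ".yaml": "yaml"}
CATEGORIES = [
    "Assertion_Roulette",
    "External_Dependency",
    "No_Environment_Cleanup",
    "SkipAnsible_Lint",
    "Local_Only_Test",
]


def per_type_proportions(findings):
    """Return one value per (file type, category) pair: 2 x 5 = 10 metrics.

    `findings` is a hypothetical mapping: file path -> list of detected
    anti-pattern categories. A metric is None when the repository has no
    test files of that type.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for path, cats in findings.items():
        ftype = FILE_TYPES.get(PurePosixPath(path).suffix)
        if ftype is None:
            continue  # not a Python or YAML file
        totals[ftype] += 1
        for cat in set(cats):  # count each file once per category
            hits[(ftype, cat)] += 1
    return {
        f"{ftype}_{cat}": (hits[(ftype, cat)] / totals[ftype]) if totals[ftype] else None
        for ftype in ("py", "yaml")
        for cat in CATEGORIES
    }


# Illustrative data only.
findings = {
    "tests/test_a.py": ["No_Environment_Cleanup", "Assertion_Roulette"],
    "tests/test_b.py": ["No_Environment_Cleanup"],
    "molecule/converge.yml": ["Local_Only_Test"],
    "README.md": [],
}
metrics = per_type_proportions(findings)
print(metrics["py_Assertion_Roulette"], metrics["yaml_Local_Only_Test"])
```

If the numerator is the raw occurrence count instead of the number of affected files, the value can go above 1, so it matters which one we report.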
You are right.
Here is the summary Bhaiya. Acronyms
Statistic | py_Assertion_Roulette | py_External_Dependency | py_No_Environment_Cleanup | yaml_SkipAnsible_Lint | yaml_Local_Only_Test | yaml_Assertion_Roulette | yaml_External_Dependency | yaml_No_Environment_Cleanup |
---|---|---|---|---|---|---|---|---|
Average | 1.544095 | 0.737913 | 0.522896 | 0.012376 | 0.043131 | 0.000291 | 0.016523 | 0.693946 |
Min | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Max | 25 | 35.6 | 1 | 1 | 1 | 0.033333 | 1.105263 | 1 |
Median | 0.142857 | 0 | 0.6875 | 0 | 0 | 0 | 0 | 1 |
Estimate Anti-pattern Lifetime
The problem I discovered with TAMI is that, so far, we do not know the code block or line number of the file where the anti-pattern is detected. I was checking this code base. This may be something achievable:
This can take some time. But I am not sure whether this will serve the purpose. Kindly let me know your thoughts @akondrahman Bhaiya.
You are right @Talismanic. Your methodology to detect lifetime will not work. Let us abandon the idea of lifetime for now. About the results ... what does each cell mean? Is it a proportion or an anti-pattern count?
Each cell is the average, min, max, or median, taken across repositories, of the ratio between the anti-pattern count in all files of type y and the number of files of type y. Raw data is also attached.
Thanks for the update @Talismanic. Sounds good. If you are done with the frequency analysis, then start writing the paper. You already have access to the Overleaf document. Once 80% of the writing is done we will submit some bug reports.
Completed
Raw count from tool
TODO
Proportion of files:
count this metric per repository: equation:
no. of files with at least one occurrence of anti-pattern of category x / total Python files used for testing in repo.
Anti-pattern count per file type:
count this metric per repository: equation: no. of files of type y that include at least one occurrence of anti-pattern category x / total files of type y used for testing in the repository.
Once you get these values, generate the min, max, median, and a boxplot for each anti-pattern category x and for each of the three datasets: Openstack, GitHub, and Gitlab.
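A minimal sketch of the summary step, using only Python's standard `statistics` module. The per-repository values below are made up for illustration, and the quartile method ("inclusive") is an assumption; plotting libraries may use a slightly different convention for the box edges.

```python
import statistics


def summarize(values):
    """Min, quartiles, median, and max of per-repository values.

    Needs at least two data points, because statistics.quantiles
    raises an error otherwise.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }


# Hypothetical per-repository proportions for one anti-pattern category x.
per_dataset = {
    "Openstack": [0.0, 0.25, 0.5, 0.75, 1.0],
    "GitHub": [1.0, 1.0, 0.5, 1.0],
    "Gitlab": [0.0, 1.0],
}
summary = {name: summarize(vals) for name, vals in per_dataset.items()}
for name, stats in summary.items():
    print(name, stats)
```

The quartiles and median give the box of a boxplot, and the min and max give the extreme whisker positions if outliers are not drawn separately.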
Estimate Anti-pattern Lifetime
How long will it take? If you need more than 1 week, then we will not attempt it.