Answer to RQ2: Calculate Frequency Using Metrics

akondrahman / IaCTesting

Placeholder for the research study related to IaC testing anti-patterns


Answer to RQ2: Calculate Frequency Using Metrics #16

Closed akondrahman closed 3 years ago

akondrahman commented 4 years ago

Completed

Raw count from tool

TODO

Proportion of files:

Compute this metric per repository, using the equation:

(no. of files with at least one occurrence of anti-pattern category x) / (total Python files used for testing in the repository)

Anti-pattern count per file type:

Compute this metric per repository, using the equation: (no. of files of type y that include at least one occurrence of anti-pattern category x) / (total files of type y used for testing in the repository).

Once you have these values, generate the min, max, median, and a boxplot for each anti-pattern category x, for each of the three datasets: OpenStack, GitHub, and GitLab.

Estimate Anti-pattern Lifetime

How long will it take? If you need more than 1 week, then we will not attempt it.

Talismanic commented 4 years ago

equation:

no. of files with at least one occurrence of anti-pattern of category x / total Python files used for testing in repo

Bhaiya, I assumed that by the denominator you meant all the test files (Python + YAML), and I have calculated it that way. In summary, in 75% of the projects the ratio is 1, which indicates that every test file contains at least one anti-pattern. In most cases, that anti-pattern is either not cleaning up the environment or not using remote testing.

percentage_of_files_where_antipattern_exist.zip
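For reference, a minimal sketch of the ratio as described above, with Python and YAML test files both counted in the denominator. The function name and the `findings` structure (file path mapped to the set of anti-pattern categories detected in that file) are assumptions for illustration, not the actual output format of the tool.

```python
def proportion_of_affected_files(findings, test_files, category):
    """Share of test files with at least one occurrence of `category`.

    findings   -- hypothetical dict: file path -> set of detected categories
    test_files -- all test files in the repository (Python and YAML)
    """
    if not test_files:
        return 0.0  # avoid division by zero for repositories without tests
    affected = sum(1 for path in test_files if category in findings.get(path, set()))
    return affected / len(test_files)


# Illustrative data only, not taken from the attached results.
findings = {
    "tests/test_a.py": {"No_Environment_Cleanup"},
    "tests/test_b.yaml": {"Local_Only_Test", "No_Environment_Cleanup"},
    "tests/test_c.py": set(),
}
test_files = list(findings)
ratio = proportion_of_affected_files(findings, test_files, "No_Environment_Cleanup")
```

A ratio of 1 for a repository means every test file in it has at least one occurrence of that category.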

Talismanic commented 4 years ago

Anti-pattern count per file type:

count this metric per repository: equation: no. of files of type y that include at least one occurrence of anti-pattern category x / total files of type y used for testing in the repository

Bhaiya, I need a bit of clarification on this. My plan is to generate 10 metrics per project (2 file types × 5 anti-patterns) and then divide each metric by the corresponding file count.

Is my approach correct?
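A rough sketch of that plan, following the "files with at least one occurrence" form of the equation quoted above. The suffix mapping, the `findings` structure, and the function name are assumptions for illustration; the category names are the ones used in this study.

```python
from pathlib import PurePosixPath

# Assumption: file type is decided by suffix alone.
SUFFIX_TO_TYPE = {".py": "py", ".yml": "yaml", ".yaml": "yaml"}
CATEGORIES = [
    "Assertion_Roulette",
    "External_Dependency",
    "No_Environment_Cleanup",
    "SkipAnsible_Lint",
    "Local_Only_Test",
]


def metrics_per_file_type(findings, test_files):
    """Return {"<type>_<category>": affected files of that type / files of that type}.

    findings -- hypothetical dict: file path -> set of detected categories
    """
    totals = {}    # file type -> number of test files of that type
    affected = {}  # (file type, category) -> files with >= 1 occurrence
    for path in test_files:
        ftype = SUFFIX_TO_TYPE.get(PurePosixPath(path).suffix.lower())
        if ftype is None:
            continue  # not a Python or YAML test file
        totals[ftype] = totals.get(ftype, 0) + 1
        for category in findings.get(path, ()):
            key = (ftype, category)
            affected[key] = affected.get(key, 0) + 1
    return {
        f"{ftype}_{category}": affected.get((ftype, category), 0) / total
        for ftype, total in totals.items()
        for category in CATEGORIES
    }


# Illustrative data only.
files = ["t/test_a.py", "t/test_b.py", "t/converge.yaml"]
found = {"t/test_a.py": {"Assertion_Roulette"}, "t/converge.yaml": {"No_Environment_Cleanup"}}
metrics = metrics_per_file_type(found, files)
```

With both file types present in a repository, this yields the 10 values per project mentioned above.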

akondrahman commented 4 years ago

You are right.

Talismanic commented 3 years ago

You are right.

Here is the summary Bhaiya. Acronyms

Statistic	py_Assertion_Roulette	py_External_Dependency	py_No_Environment_Cleanup	yaml_SkipAnsible_Lint	yaml_Local_Only_Test	yaml_Assertion_Roulette	yaml_External_Dependency	yaml_No_Environment_Cleanup
Average	1.544095	0.737913	0.522896	0.012376	0.043131	0.000291	0.016523	0.693946
Min	0	0	0	0	0	0	0	0
Max	25	35.6	1	1	1	0.033333	1.105263	1
Median	0.142857	0	0.6875	0	0	0	0	1
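A summary row like the ones above can be produced from the per-repository values with the standard library alone. This is only a sketch: the function name is made up, the quartiles use the "inclusive" method as one possible choice, and the sample values are illustrative rather than taken from the attached data.

```python
import statistics


def summarize(values):
    """Five-number summary plus mean for one metric across repositories.

    Needs at least two values, because statistics.quantiles may reject
    a single data point on older Python versions.
    """
    ordered = sorted(values)
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": statistics.median(ordered),
        "q3": q3,
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
    }


# One value per repository for a single metric (illustrative numbers).
summary = summarize([0, 0.5, 1, 1, 1])
```

The `min`, `q1`, `median`, `q3`, and `max` entries are the values a boxplot would draw for that metric.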

Talismanic commented 3 years ago

Estimate Anti-pattern Lifetime

The problem I discovered with TAMI is that, so far, we do not know the code block or line number of the file where an anti-pattern is detected. I was checking this code base. This may be achievable:

Find the files where an anti-pattern exists.
Find the creation date of each file.
Find all commits in which the file was changed.
Check the temporal trend of the anti-patterns (increasing, decreasing, or unchanged).

This can take some time, and I am not sure whether it will serve the purpose. Kindly let me know your thoughts, @akondrahman Bhaiya.
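To make the steps above concrete, here is a rough sketch. It assumes `git` is on the PATH and the repository is cloned locally; the function names are hypothetical. Counting anti-patterns at each commit is left to the caller, since it depends on how TAMI is invoked.

```python
import subprocess
from datetime import datetime


def parse_git_log(output):
    """Parse lines of '<hash> <ISO-8601 date>' into (hash, datetime), oldest first."""
    history = []
    for line in output.splitlines():
        if not line.strip():
            continue
        commit, date = line.split(" ", 1)
        history.append((commit, datetime.fromisoformat(date.strip())))
    history.sort(key=lambda item: item[1])
    return history


def file_history(repo_dir, path):
    """All commits that touched `path`; the first entry is the creation commit."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--follow", "--format=%H %aI", "--", path],
        capture_output=True, text=True, check=True,
    )
    return parse_git_log(result.stdout)


def classify_trend(counts):
    """Compare the anti-pattern count at the first and last commit of a file."""
    if len(counts) < 2 or counts[-1] == counts[0]:
        return "unchanged"
    return "increasing" if counts[-1] > counts[0] else "decreasing"


# Parsing a sample of the log format requested above (made-up hashes and dates).
sample = "b2 2021-03-01T10:00:00+00:00\na1 2020-01-15T09:30:00+06:00\n"
history = parse_git_log(sample)
```

Without line-level locations this only tracks counts per file, so it cannot tell whether the same anti-pattern instance persisted between commits.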

akondrahman commented 3 years ago

You are right, @Talismanic. Your methodology to detect lifetime will not work. Let us abandon the idea of lifetime for now. About the results ... what does each cell mean? Is it a proportion or an anti-pattern count?

Talismanic commented 3 years ago

About the results ... what does each cell mean? Is it a proportion or an anti-pattern count?

Each cell is the average, min, max, or median, across projects, of the ratio of the anti-pattern count in all files of type Y to the number of files of type Y. The raw data is also attached.

file_type_wise_antpatterns.zip
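As a small illustration of the ratio described above (total anti-pattern occurrences in files of one type divided by the number of files of that type), here is a sketch with a hypothetical function name and made-up counts. Because occurrences are counted rather than files, the value can exceed 1.

```python
def occurrences_per_file(occurrence_counts):
    """Total anti-pattern occurrences across files of one type / number of such files.

    occurrence_counts -- one count per file of that type (hypothetical input shape)
    """
    if not occurrence_counts:
        return 0.0  # no files of this type in the repository
    return sum(occurrence_counts) / len(occurrence_counts)


# Four Python test files with 6, 2, 0 and 2 occurrences of one category.
ratio = occurrences_per_file([6, 2, 0, 2])
```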

akondrahman commented 3 years ago

Thanks for the update @Talismanic. Sounds good. If you are done with the frequency analysis, then start writing the paper. You already have access to the Overleaf document. Once 80% of the writing is done we will submit some bug reports.