Closed Talismanic closed 3 years ago
@akondrahman Bhaiya,
For 1, which count do we need? Are you referring to Tables 7, 8, 9, and 10?
No, only Table 9. The count of anti-patterns for each category.
Before you give me the category, TAMI needs to be adjusted for environment cleanup. Someone can clean up by using a dedicated task or role. So we need to check the tags `task` and `role` and see if the keyword `clean` or `teardown` appears. For example, as done in this blog post: https://janikvonrotz.ch/2018/02/26/working-with-ansible-cleanup-tasks/
Let me know if you have questions @Talismanic
Implementing this with priority.
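A minimal sketch of the keyword check described above, assuming the test script has already been parsed into Python lists and dicts (for example with a YAML loader). The function names and the choice to look at task names, task tags, and role names are assumptions for illustration; this is not TAMI's actual implementation.

```python
# Keywords named in the thread; everything else here is a hypothetical sketch.
CLEANUP_KEYWORDS = ("clean", "teardown")


def mentions_cleanup(text):
    """Return True if the text contains a cleanup keyword (case-insensitive)."""
    lowered = str(text).lower()
    return any(keyword in lowered for keyword in CLEANUP_KEYWORDS)


def as_list(value):
    """Normalize None, a scalar, or a list into a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def has_env_cleanup(plays):
    """Check tasks and roles of each parsed play for a cleanup keyword."""
    for play in as_list(plays):
        for task in as_list(play.get("tasks")):
            labels = [task.get("name", "")] + as_list(task.get("tags"))
            if any(mentions_cleanup(label) for label in labels):
                return True
        for role in as_list(play.get("roles")):
            # A role entry may be a plain string or a dict with a role name.
            if isinstance(role, dict):
                role_name = role.get("role", role.get("name", ""))
            else:
                role_name = role
            if mentions_cleanup(role_name):
                return True
    return False
```

A script for which `has_env_cleanup` returns `False` would then be a candidate for the No Env Clean Up anti-pattern under this assumption.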
@akondrahman Bhaiya, the changes have been accommodated. This change now requires running TAMI on the entire dataset. So I am going to run TAMI in batch runner mode and update the counts. This may take until tomorrow.
Thanks @Talismanic for all the hard work. Send me the CSVs when all results are ready. In the meantime can you send me a ZIP file with all YAML test scripts for the three datasets ?
Bhaiya, do you need those files in a structured layout, or will a plain dump of all files do?
I want it like this:

    ZIP
    |- Openstack
    |------subdir1
    |----------subdir1/subsubdir1
    |- GitHub
    |------subdir1
    |----------subdir1/subsubdir1
    |- GitLab
    |------subdir1
    |----------subdir1/subsubdir1
Please preserve the structure and paths so that I can map them to the CSV results file. I want the raw YAML files to gain further empirical insights, if any.
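One way to produce such an archive is sketched below, assuming each dataset has already been checked out under its own local root directory. The function name, the mapping argument, and the `.yml`/`.yaml` suffix filter are assumptions for illustration.

```python
import zipfile
from pathlib import Path

# Assumption: YAML scripts are identified by file extension only.
YAML_SUFFIXES = {".yml", ".yaml"}


def zip_yaml_scripts(dataset_roots, zip_path):
    """Write every YAML file under each dataset root into one ZIP.

    dataset_roots maps a top-level folder name (e.g. "Openstack") to a
    local directory. Relative paths are preserved inside the archive so
    they can be matched against paths in a results CSV.
    Returns the number of files written.
    """
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for dataset_name, root in dataset_roots.items():
            root = Path(root)
            for path in sorted(root.rglob("*")):
                if path.is_file() and path.suffix.lower() in YAML_SUFFIXES:
                    arcname = Path(dataset_name) / path.relative_to(root)
                    archive.write(path, arcname.as_posix())
                    count += 1
    return count
```

Calling `zip_yaml_scripts({"Openstack": "data/openstack"}, "yaml_scripts.zip")` (hypothetical paths) would keep `subdir1/subsubdir1/...` intact under the `Openstack` folder.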
@Talismanic
I have handled this ... needed some tweak in writing ... so no need to work on the following:
The examples that I asked for before: a. I want a lot of setup and no cleanup.
@Talismanic
So far I have finished writing Background and Related Work, RQ1 ... I need the following data that I asked for to proceed further:
- The raw count from TAMI for the three datasets: Openstack, GitHub, and GitLab.
- Only YAML scripts from the three datasets: Openstack, GitHub, and GitLab.
Can I expect them in the next 12 hours or so?
I am working on these, Bhaiya. As I could not store any repo locally, the whole thing is taking some time. But hopefully it will be finished within the next 12 hours.
Dear @akondrahman Bhai, The picture has changed after I excluded the Python files and added the new logic for No Env Clean Up. In summary, our counts have dropped significantly. This is the cumulative data for GitLab and GitHub. I have not yet finished separating them. Doesn't this number look very small?
anti-pattern name | project count | file count | total count |
---|---|---|---|
Skip Ansible Lint | 6 | 6 | 22 |
Local Only Test | 25 | 35 | 40 |
Assertion Roulette | 2 | 2 | 2 |
External Dependency | 8 | 26 | 45 |
No Env Clean Up | 166 | 2214 | 2214 |
For your comparison, the previous count was:
anti-pattern name | project count | file count | total count |
---|---|---|---|
Skip Ansible Lint | 6 | 6 | 22 |
Local Only Test | 25 | 35 | 40 |
Assertion Roulette | 123 | 4461 | 38629 |
External Dependency | 92 | 1501 | 7763 |
No Env Clean Up | 229 | 9784 | 9784 |
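The three columns in these tables can be derived from per-detection records. The sketch below assumes each detection is a `(project, file_path, anti_pattern)` tuple; that record shape and the function name are hypothetical and are not taken from TAMI's output format.

```python
from collections import defaultdict


def summarize(detections):
    """Return {anti_pattern: (project_count, file_count, total_count)}.

    project_count and file_count are counts of distinct projects and
    distinct (project, file) pairs; total_count is the raw number of
    detections.
    """
    projects = defaultdict(set)
    files = defaultdict(set)
    totals = defaultdict(int)
    for project, file_path, anti_pattern in detections:
        projects[anti_pattern].add(project)
        files[anti_pattern].add((project, file_path))
        totals[anti_pattern] += 1
    return {
        name: (len(projects[name]), len(files[name]), totals[name])
        for name in totals
    }
```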
@akondrahman Bhaiya, Total Count for each category of anti-patterns.
- The raw count from TAMI for the three datasets: Openstack, GitHub, and GitLab.
Antipattern Name | Github | Gitlab | Openstack |
---|---|---|---|
Skip Ansible Lint | 22 | 0 | 18 |
Local Only Test | 37 | 3 | 16 |
Assertion Roulette | 2 | 0 | 3 |
External Dependency | 45 | 0 | 18 |
No Env Clean Up | 2164 | 50 | 96 |
I have one observation here. Many of the Openstack repos are also present in the GitHub repo set.
To handle this, do not include the Openstack repos in the GitHub repo set. So no Openstack data in the GitHub data.
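The exclusion can be sketched as a simple set difference, assuming both datasets identify repositories by a comparable string such as `owner/name`. The function name and the case-insensitive comparison are assumptions.

```python
def exclude_openstack(github_repos, openstack_repos):
    """Return the GitHub repos that do not also appear in the Openstack set.

    Comparison ignores case and surrounding whitespace; the original
    order of github_repos is preserved.
    """
    openstack_keys = {repo.strip().lower() for repo in openstack_repos}
    return [
        repo for repo in github_repos
        if repo.strip().lower() not in openstack_keys
    ]
```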
As discussed in issue #18, you need to redo the analysis for GitLab as you will be collecting more test.yml files.
> Doesn't this number look very small?
Don't worry about the numbers now. Our job as researchers is to report accurate scientific results. We should not do anything to make results look good.
@Talismanic ... when can I get the stuff that I needed? Today is Christmas day and my whole day is open to work on your paper :)
Bhaiya, I could not automate the whole process of removing the other files while keeping the structure the same as the original. So I am cherry-picking the repositories. So far I have finished cleaning 59 repositories out of 166. I am attaching those here. I am working hard to get the rest done as early as possible.
Thanks for the update. If it is easier for you, you can give me all repos without filtering and I can do the filtering myself.
Bhaiya, I have some db setup and some dirty PowerShell scripts to clean up. For you, it will be a little bit troublesome to start from scratch. Please allow me some time. I will finish it inshallah. Also, when I cross the landmark of 100, I will share one more zip with you.
OK. I will wait. Thanks for all the hard work.
> Also, when I cross the landmark of 100, I will share one more zip with you.
Send me by dataset: first Openstack, then GitLab, and then GitHub if possible. I also do not have the full anti-pattern count dataset for GitHub, GitLab, and Openstack.
@akondrahman Bhai, Unfortunately, I started with GitHub and GitLab first. The dataset for GitLab and GitHub is ready. I have uploaded it at the link below.
https://drive.google.com/file/d/1QYKnLVzRV-taTm6k3PgQjKcNnV1t1PZ3/view?usp=sharing
I am working on Openstack.
Also, the full anti-pattern count dataset is attached here. There are two files: one for GitHub + GitLab and another for Openstack. antipatterns.zip
@akondrahman Bhaiya, I have made a mistake. While running TAMI on the Openstack data, I was on a branch where Python code was not excluded. So the data is erroneous. I have rectified this and here is the updated Openstack anti-pattern data. I have also updated the counts in the above comment.
Thanks @Talismanic !
Two issues:
akond.rahman.buet@gmail.com
?
- For the GitLab output, how do I separate GitHub and GitLab output? Using repo_type=2?
GitHub Part 1 Github-1.zip
Github Part 2 Github-2.zip
Github Part 3 Github-3.zip
Gitlab Gitlab.zip
Thanks a whole bunch ... I think I can start writing RQ2 of the paper.
Open Stack repos: open-stack-new-repos.zip
@akondrahman Bhai,
Now I still have 2 action points:
I will start working on these tomorrow.
@akondrahman Bhai, I need some help for the below queries:
Before attempting 2 you need to address issue #19 ... this will change the anti-pattern counts. Will you be done with 1 in the next 2-3 hours, @Talismanic?
@Talismanic
I will have a good amount of time until Dec 31 to work on your paper. If you can send me the data that I requested in the next 24 hours, that would allow me to finish off the writing for RQ2, RQ3, and Discussion. After Jan 01 I will be busy with other papers and university activities.
Sorry bhaiya, I was not available last night. The work can be done within 2-3 hours.
Point 2: WIP (mining ongoing)
Metric | Github | Gitlab | Openstack |
---|---|---|---|
commit count | 700 k | 8.2 k | 258 k |
test-related commit count | 276 k | 6 k | 43.6 k |
total Ansible scripts | 66.4 k | 2 k | 11.2 k |
total test scripts | 5.2 k | 52 | 511 |
avg duration of all repos in month | 43 | 12 | 75 |
@akondrahman Bhaiya, For point 2, I am facing a dilemma in how to count the test_related_commits.
Which option should I follow?
I think it is better to ignore the test-related commit count. Just calculate the `Ansible-related` commit count.
OK. For that I think approach 1 (counting yml) is sufficient. Scripts are running to extract that, Bhaiya. Estimated time of completion is around 8 hours. :(
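One possible reading of "counting yml" is to treat a commit as Ansible-related when it touches at least one `.yml` or `.yaml` file. The sketch below assumes commits are available as `(sha, changed_paths)` pairs; both that shape and the interpretation itself are assumptions, since the thread does not spell out the approach.

```python
# Assumption: a YAML file is recognized by its extension alone.
YAML_SUFFIXES = (".yml", ".yaml")


def count_yaml_commits(commits):
    """Count commits that modify at least one YAML file.

    commits is an iterable of (sha, changed_paths) pairs.
    """
    return sum(
        1
        for _sha, paths in commits
        if any(path.lower().endswith(YAML_SUFFIXES) for path in paths)
    )
```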
OK ... I will wait. In the meantime, can you update the cleanup algorithm in TAMI? Give me the new CSVs for the three datasets when ready.
Once the results are ready, let me know @Talismanic
@akondrahman Bhai, Updated raw count from TAMI for the three datasets: Openstack, GitHub, and GitLab.
Antipattern Name | Github | Gitlab | Openstack |
---|---|---|---|
Skip Ansible Lint | 3 | 0 | 19 |
Local Only Test | 19 | 3 | 18 |
Assertion Roulette | 1 | 0 | 1 |
External Dependency | 25 | 0 | 20 |
No Env Clean Up | 42 | 5 | 9 |
Attaching the raw count file. repo_type=3 means openstack.
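Splitting the raw count file by dataset could look like the sketch below. Only the `repo_type` column and the meaning of the value 3 (Openstack) come from this thread; the other column name in the usage example and the function name are hypothetical.

```python
import csv
import io
from collections import defaultdict


def split_by_repo_type(csv_text):
    """Group CSV rows by the value of their repo_type column.

    Returns a dict mapping each repo_type value (as a string) to the
    list of row dicts carrying that value.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["repo_type"].strip()].append(row)
    return dict(groups)
```

For example, `split_by_repo_type(text)["3"]` would hold the Openstack rows, assuming `text` is the content of the raw count CSV.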
Calculating more data Bhaiya.
@akondrahman bhai, Rest of the metrics:
Metric | Github | Gitlab | Openstack |
---|---|---|---|
Total Repos | 324 | 91 | 54 |
Total Projects | 347 | 92 | 49 |
commit count | 700696 | 8219 | 258523 |
ansible-related commit count | 276104 | 6090 | 43649 |
total Ansible scripts | 66400 | 2065 | 11233 |
total test scripts | 5198 | 52 | 511 |
avg duration of all repos in month | 43 | 12 | 75 |
@akondrahman Bhai, As I have excluded all the Openstack repos from the GitHub set, the Table 7 data will change. The updates are:
Data for Table 7:
Type | Openstack | Github | Gitlab |
---|---|---|---|
Initial Count | 1253 | 3405k | NA |
Criteria-1 (Ansible Script) | 96 | 6633 | 8194 |
Criteria-2 (Not a fork) | 96 | 4147 | 7512 |
Criteria-3 (Contributor Count3) | 94 | 856 | 546 |
Criteria-4 (Commits/Month >=2) | 90 | 770 | 332 |
Criteria-5 (Lifetime>1month) | 90 | 675 | 279 |
Criteria-6 (10% iac script) | 54 | 325 | 91 |
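The filtering funnel in this table can be expressed as a single predicate. The sketch below is hypothetical: the field names are invented, and the comparison operators for Criteria-3 (read here as at least 3 contributors) and Criteria-6 (read here as at least 10% IaC scripts) are assumptions, because the table does not state them unambiguously.

```python
from dataclasses import dataclass


@dataclass
class RepoStats:
    # All field names are hypothetical.
    name: str
    has_ansible_scripts: bool
    is_fork: bool
    contributors: int
    commits_per_month: float
    lifetime_months: float
    iac_script_ratio: float


def passes_criteria(repo, min_contributors=3, min_iac_ratio=0.10):
    """Apply Criteria-1 through Criteria-6 to one repository."""
    return (
        repo.has_ansible_scripts                     # Criteria-1
        and not repo.is_fork                         # Criteria-2
        and repo.contributors >= min_contributors    # Criteria-3 (assumed >=)
        and repo.commits_per_month >= 2              # Criteria-4
        and repo.lifetime_months > 1                 # Criteria-5
        and repo.iac_script_ratio >= min_iac_ratio   # Criteria-6 (assumed >=)
    )
```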
@akondrahman Bhai, The small External Dependency count surprised me a bit, as I saw many external dependencies when I was manually sorting the yml files for you. I reviewed the code and found that I had made a mistake in detecting URLs in the test scripts. After fixing that, I am seeing a sharp increase in this anti-pattern. The revised counts are:
Antipattern Name | Github | Gitlab | Openstack |
---|---|---|---|
Skip Ansible Lint | 3 | 0 | 19 |
Local Only Test | 19 | 3 | 18 |
Assertion Roulette | 1 | 0 | 1 |
External Dependency | 765 | 8 | 125 |
No Env Clean Up | 42 | 5 | 9 |
Attaching the raw count: iac_anti_patterns.zip
I sincerely apologize for this kind of mistake. I am also going to review the other methods to check whether any logic-level mistake is still present.
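For reference, URL detection in a test script can be sketched with a regular expression as below. This is an illustrative pattern under the assumption that an external dependency shows up as an `http`, `https`, or `ftp` URL in the script text; it is not the fix that was applied in TAMI.

```python
import re

# Assumption: a URL starts with a known scheme and runs until whitespace,
# a quote, or an angle bracket.
URL_PATTERN = re.compile(r"(?:https?|ftp)://[^\s'\"<>]+")


def find_external_urls(script_text):
    """Return every URL-like substring found in the script text."""
    return URL_PATTERN.findall(script_text)
```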
@akondrahman Bhaiya, I am done with the code review and logic checking. I also implemented a check for codebases where explicit roles are not used in the scripts. After these changes, I found that the Local Only Test and Assertion Roulette counts have increased significantly.
This is expected as the coverage of TAMI increased after handling the role-less scripts.
Antipattern Name | Github | Gitlab | Openstack |
---|---|---|---|
Skip Ansible Lint | 3 | 0 | 19 |
Local Only Test | 245 | 30 | 19 |
Assertion Roulette | 527 | 3 | 1 |
External Dependency | 757 | 8 | 133 |
No Env Clean Up | 42 | 5 | 9 |
Updated raw count.
I think I am done with the counting and data.
Thanks for the hard work. I will plug in the results.
@Talismanic
I need the Openstack and GitHub YAML ZIP again. It seems you have added more repos. I am expecting a ZIP file of 495 scripts for Openstack and 4942 scripts for GitHub, preserving the whole directory structure. Without this I can't plug in the smell density values and count per play values. Here is the structure:
    ZIP
    |- Openstack
    |------subdir1
    |----------subdir1/subsubdir1
Just completing the RQ2 is taking 1 week! Hope this loop will close soon.
@akondrahman bhai, Rest of the metrics:
Metric | Github | Gitlab | Openstack |
---|---|---|---|
Total Repos | 324 | 91 | 54 |
Total Projects | 347 | 92 | 49 |
commit count | 700 k | 8.2 k | 258 k |
ansible-related commit count | 276 k | 6 k | 43.6 k |
total Ansible scripts | 66.4 k | 2 k | 11.2 k |
total test scripts | 5.2 k | 52 | 511 |
avg duration of all repos in month | 43 | 12 | 75 |
@Talismanic I need full and accurate numbers here: 8.2 k and 6 k will not work. Please update the table with full values, not abbreviations.
Done Bhaiya.
Thanks @Talismanic. I will wait on the YAML files ... I need the YAML files to calculate the anti-pattern density metric and the count per play metric. When will the YAML files be ready? All you need to do is dump all YAML scripts while maintaining the directory structure, is that right?