Closed valeriocos closed 4 years ago
https://github.com/chaoss/grimoirelab/issues/285#issuecomment-596135277
@snack0verflow, sorry, it seems there is a problem with the meaning of the enriched field time_to_merge_request_response. The description at https://github.com/chaoss/grimoirelab-elk/blob/master/schema/github_pull_requests.csv says that it is the "Time to merge a Pull Request in days"; however, looking at the code (as you pointed out), the meaning is basically the time to first attention on the pull request (the difference between the creation date and the first comment by someone who didn't submit the PR). If the explanation makes sense, could you please submit a PR to fix the description in the schema?
The time to merge you refer to should be code_merge_duration (https://github.com/chaoss/grimoirelab-elk/blob/13982d4024ffcaf2d0e42a26194ec4725619fed8/grimoire_elk/enriched/github.py#L507) or time_to_close_days (https://github.com/chaoss/grimoirelab-elk/blob/13982d4024ffcaf2d0e42a26194ec4725619fed8/grimoire_elk/enriched/github.py#L439).
Please have a look at the schema about the github pull request data (https://github.com/chaoss/grimoirelab-elk/blob/master/schema/github_pull_requests.csv). There, you will find a description for each enriched field. If you find some imprecisions, please report them and we can evaluate them together. Thanks!
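To make the distinction above concrete, here is a minimal sketch of the two notions: the time to first attention (first comment by someone other than the PR author) versus the time to merge. It is not the code used in grimoire_elk; the function names and the sample data are assumptions made only for illustration.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400


def days_between(start, end):
    """Return the difference between two datetimes in days (2 decimals)."""
    return round((end - start).total_seconds() / SECONDS_PER_DAY, 2)


def time_to_first_attention(created_at, author, comments):
    """Days from PR creation to the first comment NOT written by the author.

    `comments` is a list of (login, datetime) tuples; this shape is an
    assumption for the sketch, not the GitHub API format.
    """
    others = [when for login, when in comments if login != author]
    if not others:
        return None  # nobody else has commented yet
    return days_between(created_at, min(others))


# Hypothetical pull request: opened by alice, merged four days later
created = datetime(2020, 3, 1, 12, 0)
merged = datetime(2020, 3, 5, 12, 0)
comments = [
    ("alice", datetime(2020, 3, 1, 13, 0)),  # author's own comment is ignored
    ("bob", datetime(2020, 3, 2, 12, 0)),    # first attention from someone else
]

first_attention_days = time_to_first_attention(created, "alice", comments)
merge_days = days_between(created, merged)
print("first attention (days):", first_attention_days)
print("time to merge (days):", merge_days)
```

The two values differ for the same pull request, which is why a description saying "time to merge" is misleading for a field that measures first attention.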
https://github.com/chaoss/grimoirelab/issues/285#issuecomment-596135277
@snack0verflow thank you for pointing this out!
The tests in Travis (e.g., https://github.com/chaoss/grimoirelab-elk/blob/master/.travis.yml#L31) are executed only on https://github.com/chaoss/grimoirelab-elk/tree/master/grimoire_elk (e.g., https://coveralls.io/builds/29180665). On the other hand, the documentation at https://github.com/chaoss/grimoirelab-elk#running-tests should do the same, but neither the param --source=grimoire_elk nor --include=grimoire_elk limits the info in the .coverage file. I tried with --source=*grimoire_elk* and --include=*grimoire_elk* and it doesn't work either (feel free to investigate more). Nevertheless, running report -m --include=*grimoire_elk* produces only the info for the grimoire_elk package.
It would be great if you could submit a PR to update the doc at https://github.com/chaoss/grimoirelab-elk#running-tests. The changes in the PR could be the ones below; however, feel free to propose other ones.
python3 -m coverage run run_tests.py --source=grimoire_elk in the readme can be converted to python3 -m coverage run run_tests.py (the image should be updated accordingly).
python3 -m coverage report -m could stay as it is. However, a note could be added to mention that the param --include=*grimoire_elk* allows reporting on the test coverage for the code inside the grimoire_elk package.
WDYT?
Thanks
@valeriocos Sorry for the late reply, but what do you mean by "(the image should be updated accordingly)"?
Thanks. EDIT: Oh, you mean the screenshot.
No worries,
EDIT: Oh you mean the screenshot
Sorry for not being precise, yes :)
Respected Mentors @valeriocos, @Polaris000, @inishchith, @sduenas, @zhquan, I am currently doing Microtask 2.
Microtask 2: Create a Python script to execute Perceval via its Python interface using the Git and GitHub backends. Feel free to select any target repository.
I have executed Perceval via its Python interface using the Git backend, but I am facing an issue when using the GitHub backend. I have selected my own repo as the target repo.
Issue: When I call the fetch method (category='issue') to calculate the total number of issues in a repo, it does not give the correct number of issues. I think it is giving me the sum of issues and pull requests instead of only issues. I say so because when I added one more pull request to my repo and ran the code again, the number of issues increased by one, which should not happen. The fetch method with category='pull_request' is working fine.
This is my code segment:
from datetime import datetime
from perceval.backends.core.github import GitHub
import config  # my own local module that stores the API token

# Call fetch to get information from the GitHub repo and count the issues
REPOSITORY_NAME = "DSA_LAB"
github_backend = GitHub(owner="kshitij3199", api_token=[config.info["API_Token"]], repository=REPOSITORY_NAME)
from_date = datetime(2020, 1, 1)
to_date = datetime(2020, 3, 10)
# Note: category='issue' returns every item the issues API reports for the repo
range_issues = github_backend.fetch(category='issue', from_date=from_date, to_date=to_date)
range_issues_list = list(range_issues)
n_issues = len(range_issues_list)
print("NUMBER OF ISSUES: ", n_issues)
The total number of issues it shows is 4, but there is only 1 issue and 3 pull requests. When I add one more pull request, the total number of issues increases by one: it shows 5 issues, but there is only 1 issue and 4 pull requests.
https://github.com/chaoss/grimoirelab/issues/285#issuecomment-596825820
Hi @kshitij3199
I am thinking that it is giving me sum of issues and pull requests,instead of only issues
You are absolutely correct. In GitHub, every pull request is considered an issue. You can read more about it here: https://developer.github.com/v3/pulls/#labels-assignees-and-milestones.
I hope this helps you. :slightly_smiling_face:
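Since every pull request also shows up as an issue, one way to count only "real" issues is to check for the 'pull_request' key in each item's data, the same check used in the tutorial snippet later in this thread. The sketch below runs on hand-made sample items (an assumption, so it works without a token); with Perceval you would pass the result of fetch(category='issue') instead.

```python
def split_issues_and_pulls(items):
    """Split Perceval-style GitHub items into plain issues and pull requests.

    An item is treated as a pull request when its 'data' dictionary
    contains the 'pull_request' key.
    """
    issues, pulls = [], []
    for item in items:
        if 'pull_request' in item['data']:
            pulls.append(item)
        else:
            issues.append(item)
    return issues, pulls


# Hypothetical items mimicking the situation above: 1 issue and 3 pull requests
sample_items = [
    {'data': {'number': 1}},
    {'data': {'number': 2, 'pull_request': {}}},
    {'data': {'number': 3, 'pull_request': {}}},
    {'data': {'number': 4, 'pull_request': {}}},
]

issues, pulls = split_issues_and_pulls(sample_items)
print("NUMBER OF ISSUES:", len(issues))
print("NUMBER OF PULL REQUESTS:", len(pulls))
```

With the sample data, the total number of fetched items is 4, but only one of them is an issue that is not a pull request.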
Thank you @vchrombie for the clarification
Thank you very much @vchrombie for your answer. Now I can start with Microtask 3 :slightly_smiling_face:
Respected Mentors,
In the GitHub backend for Perceval, the parameter api_token accepts a list of tokens, as mentioned in chaoss/grimoirelab-perceval#546, but I think this is not mentioned in the Perceval docs. Should I send a PR to add this information to the Perceval docs?
Hi @kshitij3199, thank you for raising this question. The Perceval docs are automatically updated, and there is already an issue open about it (https://github.com/chaoss/grimoirelab-perceval/issues/625). If you want, you could send a PR to improve the doc at https://chaoss.github.io/grimoirelab-tutorial/perceval/github.html#retrieving-from-a-python-script, WDYT?
Hi @valeriocos, in the doc at https://chaoss.github.io/grimoirelab-tutorial/perceval/github.html#retrieving-from-a-python-script the details regarding api_token are already mentioned:
Include the token in a list, api_token=["XXXXXX", "XXXXXX", ...], as it is possible to pass a list of tokens to get over rate limits. To run this script, just run (of course, substituting "XXXXX" for your token):
If I find other things to improve in the docs, I will definitely send a PR. Thank you for your time.
Sorry for not being precise, I was referring to improve the snippet of code there:
#! /usr/bin/env python3
import argparse
from perceval.backends.core.github import GitHub

# Parse command line arguments
parser = argparse.ArgumentParser(
    description = "Simple parser for GitHub issues and pull requests"
    )
parser.add_argument("-t", "--token",
                    '--nargs', nargs='+',
                    help = "GitHub token")  # <------- "GitHub tokens"
parser.add_argument("-r", "--repo",
                    help = "GitHub repository, as 'owner/repo'")
args = parser.parse_args()

# Owner and repository names
(owner, repo) = args.repo.split('/')

# create a Git object, pointing to repo_url, using repo_dir for cloning <----- # create a GitHub object, passing the owner and repository, plus a list of tokens. Note that not passing a list will throw an error
repo = GitHub(owner=owner, repository=repo, api_token=args.token)
# fetch all issues/pull requests as an iterator, and iterate it printing
# their number, and whether they are issues or pull requests
for item in repo.fetch():
    if 'pull_request' in item['data']:
        kind = 'Pull request'
    else:
        kind = 'Issue'
    print(item['data']['number'], ':', kind)
Thank you @valeriocos for your clarification, I will improve the snippet of the code there and will send you a PR.
Hello @valeriocos,
I have set up a dev environment to work on GrimoireLab and executed micro-mordred, after which I got the following screen in Kibana.
But for some fields, like jenkins, git, github_issues, etc., no data is available (I have tried changing the time range). So I just want to ask whether I made some mistake in setting up GrimoireLab, or whether this is fine.
https://github.com/chaoss/grimoirelab/issues/285#issuecomment-597309079
Hi @kshitij3199
I want to ask which backend you ran. If you are running only the git backend (--backend git), then you cannot retrieve other things. Also, make sure of the configurations needed for that (I mean projects.json and setup.cfg). You can find them here.
I hope I have answered your question. :slightly_smiling_face:
Thank you @vchrombie for such a quick response. I will look into my setup.cfg and projects.json files for any discrepancy and reply back to you.
Hi @kshitij3199, I've been having problems setting up docker-compose for micro-mordred. You seem to have got it working, could you send your docker-config.yml file here? I can't figure out what the problem with my system is.
Hi @kshitij3199,
Based on the data sources declared in setup.cfg, the Mordred panels task automatically imports the corresponding dashboards and adds them to the top menu (ref: https://github.com/chaoss/grimoirelab-sirmordred/blob/master/sirmordred/task_panels.py#L239, https://github.com/chaoss/grimoirelab-sirmordred/blob/master/sirmordred/task_panels.py#L495). Thus, some dashboards may be empty if you execute the raw/enrich phases with micro-mordred (on some data sources) together with the --panels phase.
Hope this helps
Hi @imnitishng, you can find my docker-compose.yml file here. I hope it helps you :slightly_smiling_face:.
Thank you @valeriocos @vchrombie for your help. I was using only the git backend (--backend git), because of which I was unable to retrieve other data.
But today, when I execute micro-mordred, I am facing the following issues:
Something went wrong when adding an alias on http://localhost:9200/git_chaoss. Alias not set. 400 Client Error: Bad Request for url: https://admin:admin@localhost:9200/_aliases
2020-03-11 18:48:47,751 [git] Problem executing study enrich_areas_of_code:git, RequestError(400, 'search_phase_execution_exception', 'No mapping found for [metadata__timestamp] in order to sort on')
2020-03-11 18:48:47,751 RequestError(400, 'search_phase_execution_exception', 'No mapping found for [metadata__timestamp] in order to sort on')
Process finished with exit code 255
can you copy here the output of these commands? thanks
curl -XGET https://admin:admin@localhost/_aliases?pretty -k
curl -XGET https://admin:admin@localhost/_cat/indices -k
@valeriocos, this is the output I am getting for both commands:
curl: (7) Failed to connect to localhost port 443: Connection refused
can you try with -k ? (I have just edited the comment above)
For the command
curl -XGET https://admin:admin@localhost:9200/_aliases?pretty -k
output is
{
  "github_issues_chaoss" : { "aliases" : { } },
  "cocom_chaoss" : { "aliases" : { "cocom-raw" : { } } },
  "git_chaoss" : { "aliases" : { } },
  "git-aoc_chaoss_enriched" : { "aliases" : { } },
  "git_chaoss_enriched" : { "aliases" : { "demographics" : { } } },
  "searchguard" : { "aliases" : { } },
  "github_issues_chaoss_enriched" : { "aliases" : { } },
  ".kibana" : { "aliases" : { } }
}
for command
curl -XGET https://admin:admin@localhost:9200/_cat/indices -k
output is
yellow open github_issues_chaoss          wE_5RzYlSxWreyR4KKtkCw 5 1   0 0   1.1kb   1.1kb
yellow open cocom_chaoss                  cRQdra5eQFiEO6BxX7am7w 5 1  57 1 146.7kb 146.7kb
yellow open git_chaoss                    2RwIDj3xQHKNGYPJT7M5NQ 5 1   0 0   1.2kb   1.2kb
yellow open git-aoc_chaoss_enriched       5K-P9oJSQSmr2ehfTo1yoA 5 1   0 0   1.2kb   1.2kb
yellow open git_chaoss_enriched           9JwfibY7RYiOxwP3bYtNsA 5 1   0 0   1.2kb   1.2kb
green  open searchguard                   t2SwaGUmQtKiA9V2PiA-TA 1 0   0 5  33.6kb  33.6kb
yellow open github_issues_chaoss_enriched V1jXyE9rT6-i2M9smmK2fw 5 1   0 0   1.1kb   1.1kb
yellow open .kibana                       XgDSRZXnT_mExyBlwVLt4Q 1 1 319 0 347.2kb 347.2kb
Something went wrong when adding an alias on http://localhost:9200/git_chaoss. Alias not set. 400 Client Error: Bad Request for url: https://admin:admin@localhost:9200/_aliases
Did you modify this file https://github.com/chaoss/grimoirelab-sirmordred/blob/master/aliases.json by chance?
2020-03-11 18:48:47,751 [git] Problem executing study enrich_areas_of_code:git, RequestError(400, 'search_phase_execution_exception', 'No mapping found for [metadata__timestamp] in order to sort on')
2020-03-11 18:48:47,751 RequestError(400, 'search_phase_execution_exception', 'No mapping found for [metadata__timestamp] in order to sort on')
Process finished with exit code 255
This error is due to the fact that the index git-aoc_chaoss_enriched is empty (see the 0 0 document counts and the sizes in the _cat/indices output). Please check the comment https://github.com/chaoss/grimoirelab/issues/285#issuecomment-590061923 and consider submitting a PR to sirmordred to improve the troubleshooting section (https://github.com/chaoss/grimoirelab-sirmordred#troubleshooting).
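For anyone who wants to spot empty indices quickly, here is a small sketch that parses the plain-text output of _cat/indices and lists the indices with no documents. It assumes the default column order without a header row (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size); the helper name is made up for this example.

```python
def empty_indices(cat_indices_output):
    """Return the names of indices whose docs.count column is 0.

    Expects the default _cat/indices layout without a header row:
    health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
    """
    empty = []
    for line in cat_indices_output.splitlines():
        cols = line.split()
        if len(cols) < 10:
            continue  # skip blank or unexpected lines
        index_name, docs_count = cols[2], cols[6]
        if docs_count == "0":
            empty.append(index_name)
    return empty


# Three rows copied from the output posted above
sample_output = """\
yellow open cocom_chaoss cRQdra5eQFiEO6BxX7am7w 5 1 57 1 146.7kb 146.7kb
yellow open git-aoc_chaoss_enriched 5K-P9oJSQSmr2ehfTo1yoA 5 1 0 0 1.2kb 1.2kb
yellow open .kibana XgDSRZXnT_mExyBlwVLt4Q 1 1 319 0 347.2kb 347.2kb
"""

empty = empty_indices(sample_output)
print("Empty indices:", empty)
```

An enriched index listed here has nothing for a study such as enrich_areas_of_code to sort or aggregate on.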
Thank you very much @valeriocos, it finally worked :smile: and I was able to see graphs, data, etc. in Kibana. I will also send a PR to improve the troubleshooting section.
But I am still getting this error, I have deleted and pasted new aliases.json file
Something went wrong when adding an alias on http://localhost:9200/git_chaoss. Alias not set. 400 Client Error: Bad Request for url: https://admin:admin@localhost:9200/_aliases
Also there is one more issue, I forgot to mention
Error enriching ocean from git (https://github.com/chaoss/grimoirelab-perceval): unhashable type: 'dict'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/grimoire_elk/elk.py", line 597, in enrich_backend
    elastic_enrich = get_elastic(url_enrich, enrich_index, clean, enrich_backend, es_enrich_aliases)
  File "/usr/local/lib/python3.6/dist-packages/grimoire_elk/utils.py", line 260, in get_elastic
    analyzers=analyzers, aliases=es_aliases)
  File "/usr/local/lib/python3.6/dist-packages/grimoire_elk/elastic.py", line 110, in __init__
    self.add_alias(alias)
  File "/usr/local/lib/python3.6/dist-packages/grimoire_elk/elastic.py", line 239, in add_alias
    if aliases and alias in aliases:
TypeError: unhashable type: 'dict'
Also there is one more issue, I forgot to mention
I'll try to replicate the problem on my machine
Can you share your setup.cfg, projects.json (just the part about git and general will be good) and the version of elasticsearch you are using? Thanks
Hi @valeriocos ,
I have used these versions:
elasticsearch==6.3.1
elasticsearch-dsl==6.3.1
file-read-backwards==2.0.0
setup.cfg file
[general]
short_name = Grimoire
update = false
min_update_delay = 10
debug = true
logs_dir = logs
bulk_size = 100
scroll_size = 100
menu_file = ../menu.yaml
aliases_file = ../aliases.json

[git]
raw_index = git_chaoss
enriched_index = git_chaoss_enriched
latest-items = false
category = commit
studies = [enrich_demography:git, enrich_areas_of_code:git, enrich_onion:git]
projects.json file
{
    "grimoire": {
        "git": [
            "https://github.com/chaoss/grimoirelab-perceval"
        ],
        "cocom": [
            "https://github.com/chaoss/grimoirelab-toolkit"
        ],
        "colic": [
            "https://github.com/chaoss/grimoirelab-toolkit"
        ],
        "*github": [
            "https://github.com/chaoss/grimoirelab-perceval"
        ],
        "*github:pull": [
            "https://github.com/chaoss/grimoirelab-perceval"
        ],
        "github:repo": [
            "https://github.com/chaoss/grimoirelab-perceval"
        ],
        "jenkins": [
            "https://build.opnfv.org/ci"
        ],
        "gitlab:issue": [
            "https://gitlab.com/gitlab-org/gitlab-ce"
        ],
        "gitlab:merge": [
            "https://gitlab.com/gitlab-org/gitlab-ce"
        ]
    }
}
ok, thanks! the docker compose is this one: https://github.com/chaoss/grimoirelab-sirmordred#source-code-and-docker?
@valeriocos I have changed elasticsearch User and password for kibiter
ELASTICSEARCH_USER=admin
ELASTICSEARCH_PASSWORD=admin
please find complete file here docker-compose.yml
Hi @valeriocos, I have sent PRs based on what you said. Please have a look at https://github.com/chaoss/grimoirelab-sirmordred/pull/418 and https://github.com/chaoss/grimoirelab-sirmordred/pull/419
Please check the comment #285 (comment) and consider to submit a PR to sirmordred to improve the troubleshooting section (https://github.com/chaoss/grimoirelab-sirmordred#troubleshooting)
Thanks @kshitij3199 for the PRs
https://github.com/chaoss/grimoirelab/issues/285#issuecomment-597698114
I'm not able to replicate your issue, I believe you are using an old version of ELK and not the one in master. From the log you posted at https://github.com/chaoss/grimoirelab/issues/285#issuecomment-597661939, I see that there are some inconsistencies between the calls in your code and the ones in the master branch. For instance:
File "/usr/local/lib/python3.6/dist-packages/grimoire_elk/elastic.py", line 110, in __init__: self.add_alias(alias). In master, self.add_alias(alias) is at line 89: https://github.com/chaoss/grimoirelab-elk/blob/master/grimoire_elk/elastic.py#L89
File "/usr/local/lib/python3.6/dist-packages/grimoire_elk/elastic.py", line 239, in add_alias: if aliases and alias in aliases: TypeError: unhashable type: 'dict'. In master, the equivalent line is at line 269: https://github.com/chaoss/grimoirelab-elk/blob/master/grimoire_elk/elastic.py#L269
@valeriocos , I am using grimoire-elk version 0.63.0
It's a bit old :) The latest one is 0.70.0: https://github.com/chaoss/grimoirelab-elk/commit/6f449a69d8730d88712bb6332227808a6bdadd4a. Can you check whether the error is gone with this version? Thanks
@valeriocos, I have tried downloading the latest version with the PyCharm package installer and from the terminal, but neither shows any package above 0.63.0. Is there any other way to download it?
pip3 install grimoire-elk==0.70.0
Could not find a version that satisfies the requirement grimoire-elk==0.70.0 (from versions: 0.20rc1, 0.22, 0.22.1, 0.26.5, 0.30.4, 0.30.7, 0.30.8, 0.30.9, 0.30.11, 0.30.13, 0.30.18, 0.30.22, 0.30.23, 0.30.24, 0.30.27, 0.30.30, 0.30.33, 0.30.37, 0.30.39, 0.30.48, 0.30.51, 0.30.53, 0.31.0, 0.31.4, 0.32.0, 0.36.0, 0.47.0, 0.55.0, 0.58.0, 0.62.0, 0.63.0)
No matching distribution found for grimoire-elk==0.70.0
The latest version isn't available on pip. Please follow the instructions at https://github.com/chaoss/grimoirelab-sirmordred#setting-up-a-pycharm-dev-environment, so you can use the code in the master branch, thanks
Hi @kshitij3199
@valeriocos , I have tried downloading the latest version with pycharm package installer and with terminal but they are not showing any package above, 0.63.0. Any other way to download it.
You can use the Project Structure to add the repository.
Thank you @valeriocos @vchrombie for your help. Now the env is correctly set up, as the code is running without any error or warning :smile:
Hi, I am Soniya Nayak and I am an Outreachy applicant 2020. This project looks very interesting and I'm looking forward to contributing here!
Hi @Soniyanayak51, welcome on board! Please have a look at the microtasks and don't hesitate to write if you need help.
Hi @valeriocos, I am currently working on the test_git.py file to increase the coverage of the git.py file. In https://github.com/chaoss/grimoirelab-elk/blob/master/tests/test_git.py, some tests like test_refresh_identities and test_refresh_project are not written completely. Is there any specific reason for that?
Thank you @kshitij3199 for working on this, there is no specific reason. Please note that you can complete the tests by querying the enriched index and check that the data is correctly stored (you can mimic the code at https://github.com/chaoss/grimoirelab-elk/pull/801/files#diff-c12d8b17feda020355ff7084da770c2bR105)
Hi @valeriocos, in the test_git.py file, we don't have tests that check the areas_of_code and git_branches methods. So can we add tests for these methods to increase the coverage of the git.py file?
Hi @kshitij3199 , good idea! thanks! Maybe you can start with areas of code, the test should be similar to https://github.com/chaoss/grimoirelab-elk/blob/master/tests/test_git.py#L236
Hi @valeriocos, I am writing a test for areas_of_code, and I am facing an issue.
Traceback (most recent call last):
  File "test_git.py", line 291, in test_enrich_areas_of_code
    study(ocean_backend, enrich_backend)
  File "../grimoire_elk/enriched/git.py", line 543, in enrich_areas_of_code
    for source in self.json_projects.values():
AttributeError: 'NoneType' object has no attribute 'values'
the issue is that value of self.json_projects is None
This is the part of code
def test_enrich_areas_of_code(self):
    """ Test that areas of code works correctly"""
    study, ocean_backend, enrich_backend = self._test_study('enrich_areas_of_code')
    with self.assertLogs(logger, level='INFO') as cm:
        if study.__name__ == "enrich_areas_of_code":
            study(ocean_backend, enrich_backend)
Hi, I am Haiming Lin, a student at Tongji University. I am very interested in working on this idea.
I have a question after going through the code of sirmordred.py. As shown below, it calls the execute_batch_tasks function with the same params twice. Is there any specific reason for that?
if not self.conf['general']['update']:
    sleep_for = self.conf['sortinghat']['sleep_for'] if self.conf.get('sortinghat', None) else 1
    self.execute_batch_tasks(all_tasks_cls,
                             sleep_for,
                             self.conf['general']['min_update_delay'])
    self.execute_batch_tasks(all_tasks_cls,
                             sleep_for,
                             self.conf['general']['min_update_delay'])
    break
Thanks!
https://github.com/chaoss/grimoirelab/issues/285#issuecomment-600974355
Thank you @heming6666 for your interest. That part of the code isn't really used, since the attribute update in setup.cfg is generally set to True. Please open an issue on sirmordred, and we can move the discussion there, thanks!
https://github.com/chaoss/grimoirelab/issues/285#issuecomment-600789030
@kshitij3199 can you share your code by opening a pull request on ELK (I'll try to reproduce the error)? Thanks
Hi @valeriocos, I have sent a pull request: chaoss/grimoirelab-elk#811. Please check.
GrimoireLab produces analytics from data extracted from more than 30 tools used for contributing to Open Source development, such as version control systems, issue trackers and forums. A common execution of GrimoireLab consists of collecting data from a given repository, processing and enriching the data obtained, and finally visualizing it on dynamic Web dashboards. At the core of this process there is a component called ELK, which is in charge of integrating the data finally shown on the dashboards.
The evolution of GrimoireLab now requires reshaping some of the functionalities provided by ELK to improve its maintainability. This project idea is about refactoring and redesigning the core of ELK using popular libraries for data management and processing such as elasticsearch-py and pandas.
The aims of the project are as follows:
The aims will require working with Python, ELK and the ElasticSearch database.
Microtasks
For becoming familiar with GrimoireLab, you can start by reading some documentation. You can find useful information at:
Once you're familiar with Grimoirelab, you can have a look at the following microtasks.
Microtask 0: Download PyCharm and get familiar with it (for instance, you can follow this tutorial).
Microtask 1: Set up Perceval to be executed from PyCharm.
Microtask 2: Create a Python script to execute Perceval via its Python interface using the Git and GitHub backends. Feel free to select any target repository.
Microtask 3: Based on the JSON documents produced by Perceval and its source code, try to answer the following questions (one per attribute) about timestamp, updated_on, origin, category, uuid, search_fields and data of each JSON document produced by Perceval.
Microtask 4: Set up a dev environment to work on GrimoireLab. Have a look at https://github.com/chaoss/grimoirelab-sirmordred#setting-up-a-pycharm-dev-environment.
Microtask 5: Execute micro-mordred to collect and enrich data from any Git repository.
Microtask 6: Execute micro-mordred to obtain data from the study enrich_areas_of_code for any Git repository.
Microtask 7: Execute micro-mordred to collect and enrich data from any GitHub repository, making sure that no archives are created by Perceval.
Microtask 8: On your machine, run the tests for ELK within PyCharm. If you succeed, you can try to run the coverage package on the ELK tests and report the details of each file.
Microtask 9: Submit at least a PR to one of the GrimoireLab repositories to fix an issue, improve the documentation, etc.
Microtask 10: Submit a PR to ELK to increase the test coverage of one or more files located at https://github.com/chaoss/grimoirelab-elk/tree/master/grimoire_elk/enriched
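As a starting point for Microtask 3, the sketch below pulls the attributes named there out of a Perceval-style item so they can be inspected side by side. The sample item is hand-made and all of its values are placeholders (assumptions for illustration, not real Perceval output); with Perceval you would pass an item yielded by a backend's fetch() instead.

```python
# Attributes named in Microtask 3 (all except 'data', which is handled apart)
METADATA_FIELDS = ["timestamp", "updated_on", "origin", "category", "uuid", "search_fields"]


def summarize_item(item):
    """Collect the Microtask 3 attributes of a Perceval-style item in one dict.

    Missing attributes are reported as None; for 'data' only the sorted
    list of its keys is kept, to keep the summary short.
    """
    summary = {field: item.get(field) for field in METADATA_FIELDS}
    summary["data_keys"] = sorted(item.get("data", {}).keys())
    return summary


# Hypothetical item: every value below is a placeholder
sample_item = {
    "timestamp": 1583930927.0,
    "updated_on": 1583840000.0,
    "origin": "https://github.com/chaoss/grimoirelab-perceval",
    "category": "commit",
    "uuid": "placeholder-uuid",
    "search_fields": {"item_id": "placeholder-id"},
    "data": {"message": "placeholder commit message", "Author": "placeholder author"},
}

summary = summarize_item(sample_item)
for name, value in summary.items():
    print(name, "=", value)
```

Comparing the summary of several real items against the Perceval source code is one way to work out what each attribute means.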