valeriocos closed this issue 4 years ago
Thank you @valeriocos for your reply. I have moved the discussion here.
Hi @valeriocos, can you tell which library is required to be implemented as stated here?
Replacing low-level libraries (e.g., requests) with popular ones used to interact with ElasticSearch.
Is it the elasticsearch-py library, or something like the httpx library?
The idea is to use elasticsearch-py, however if you find other good candidates don't hesitate to add them to your proposal and/or share them here. Thanks!
But I've seen elasticsearch-py and elasticsearch-dsl already being used in the project.
Replacing low-level libraries (e.g., requests) with popular ones used to interact with ElasticSearch.
And since it mentions requests, maybe we need a faster asynchronous HTTP library like aiohttp? Is that not the objective?
But I've seen elasticsearch-py and elasticsearch-dsl already being used in the project.
That's true, but there are still some parts of the code that rely on requests. One of the goals of this idea is to reduce the logic that interacts with ElasticSearch. Good candidates are elasticsearch-py and elasticsearch-dsl because they already provide some high-level methods (that remove boilerplate code)
And since it mentions requests, maybe we need a faster asynchronous HTTP library like aiohttp? Is that not the objective?
Any library (aiohttp or other ones) can be a good candidate if it performs better than requests (in the specific case of GrimoireELK) and/or allows writing clean code to interact with ElasticSearch.
Hi, I also wanted to know about the ODFE implementation. Everywhere throughout GrimoireLab I've seen ElasticSearch version 6.1 being used.
Enabling the correct working of ELK for different versions of ElasticSearch (>=6.1) and Open Distro for ElasticSearch (>=0.9.0).
The project demands working ODFE support, and I wanted to know what the current progress with ElasticSearch and ODFE is. This PR adds support for using ODFE 1.2.0 without Kibiter panels. So do we need to fix these panels and other issues for the correct working of ElasticSearch 7.2.0 and ODFE 1.2.0, or do we start over with a new approach to support ODFE 0.10.0 with ElasticSearch 6.8.1?
One of the goals of this idea is to reduce the logic that interacts with ElasticSearch. Good candidates are elasticsearch-py and elasticsearch-dsl because they already provide some high-level methods (that remove boilerplate code)
Okay, I get it. Thank You.
As far as I have read, aiohttp should perform better, but I'm afraid the code won't be as clean as it is now, because the library is faster at the cost of more lines of code.
Hi, I also wanted to know about the ODFE implementation. Everywhere throughout GrimoireLab I've seen ElasticSearch version 6.1 being used.
Hi @imnitishng ! ODFE is supported by ELK (https://github.com/chaoss/grimoirelab-elk/blob/master/.travis.yml#L123), but not by panels.
The project demands working ODFE support, and I wanted to know what the current progress with ElasticSearch and ODFE is. This PR adds support for using ODFE 1.2.0 without Kibiter panels, so what exactly does this demand? Do I need to fix these panels and other issues for the correct working of ElasticSearch 7.2.0 and ODFE 1.2.0, or do we start over with a new approach to support ODFE 0.10.0 with ElasticSearch 6.8.1?
This is something that should be discussed/evaluated at the beginning of the internship. ATM, there are 2 possible approaches to complete the integration with ODFE.
The first one is to move all the panels management to a different component. This means that ELK and Mordred should be refactored to remove all the code dealing with aliases and panels upload. Under this context, ELK would become a fast processing library on top of ElasticSearch DBs. The second approach (which is the one you pointed out) is to modify Kidash to make sure that the panels can be uploaded to ODFE (this implies also fixing other issues that may pop up when migrating the panels).
Okay, I get it. Thank You.
You're welcome!
As far as I have read, aiohttp should perform better, but I'm afraid the code won't be as clean as it is now, because the library is faster at the cost of more lines of code.
I see, how much faster does it perform wrt requests?
This is something that should be discussed/evaluated at the beginning of the internship.
Okay thank you very much.
aiohttp allows sending requests in series but without waiting for the first reply to come back before sending the next one, unlike requests, along with many other decoding optimizations.
The results below were obtained by sending requests to httpbin.org:
requests with session: 11.22s
aiohttp: 1.19s
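The concurrency pattern behind those numbers can be sketched without network access. The snippet below uses asyncio.sleep as a stand-in for a real HTTP round trip (so it runs anywhere), but the structure is the same one aiohttp enables: requests-style code awaits each reply before sending the next call, while asyncio.gather keeps all the calls in flight at once.

```python
import asyncio
import time

LATENCY = 0.1   # simulated per-request latency in seconds
N_REQUESTS = 10

async def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP GET; sleeps instead of doing real I/O."""
    await asyncio.sleep(LATENCY)
    return f"response from {url}"

async def fetch_sequential(urls):
    # requests-style: wait for each reply before sending the next request
    return [await fake_fetch(u) for u in urls]

async def fetch_concurrent(urls):
    # aiohttp-style: all requests are scheduled at the same time
    return await asyncio.gather(*(fake_fetch(u) for u in urls))

def timed(coro):
    start = time.monotonic()
    result = asyncio.run(coro)
    return result, time.monotonic() - start

urls = [f"https://httpbin.org/get?i={i}" for i in range(N_REQUESTS)]
_, t_seq = timed(fetch_sequential(urls))
_, t_conc = timed(fetch_concurrent(urls))
print(f"sequential: {t_seq:.2f}s, concurrent: {t_conc:.2f}s")
```

With 10 simulated requests of 0.1s each, the sequential run takes roughly 1s while the concurrent one takes roughly 0.1s, which mirrors the ratio reported above.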
Hi @valeriocos, for the GSoC proposal, can you please tell us what things we have to mention apart from the microtasks?
Do we have to discuss the libraries that we will use to interact with ElasticSearch, and other things like how we will improve the processing of Perceval data?
Hi @kshitij3199 !
Do we have to discuss the libraries that we will use to interact with ElasticSearch, and other things like how we will improve the processing of Perceval data?
Yes, the proposal should include the libraries/technologies you would like to use and a plan (with a timeline of actions/tasks) to achieve the goals of the project. For instance:
Let me know if this answers your question, thanks!
Thank you @valeriocos, I will soon upload my GSoC proposal (so that we can discuss and update it if required)
You're welcome @kshitij3199
Hi @valeriocos. I'd like to know a bit more about these objectives. Can you explain them in more detail? I don't quite understand them yet.
Hi @imnitishng ! Yes, sure
Reorganizing part of the ELK logic into coherent packages.
ELK does many things in the same module. Let's take as an example the gitlab enricher (https://github.com/chaoss/grimoirelab-elk/blob/master/grimoire_elk/enriched/gitlab.py). As you can see, we have methods to:
get_identities
get_item_sh
__add_milestone_info
get_grimoire_fields
enrich_onion
A possible idea is to evaluate how the logic above can be reorganized into different modules to ease the understanding and the evolution of ELK.
[*] a study is new information derived from existing indexes and (i) added to an existing index or (ii) stored in a new index.
Improving the processing of Perceval data.
ELK creates enriched data by processing each Perceval document via the method get_rich_item (present in each enricher). At the same time, in some cases ELK relies on cereslib to create study data. Cereslib manipulates the data using pandas, a popular data processing library.
A possible idea is to evaluate (i) if/how the approach implemented in cereslib can be extended to the creation of the enriched data and (ii) the use of pandas (or other similar libraries) to create enriched data.
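To make that idea concrete, here is a minimal sketch of pandas-based enrichment of Perceval-like git items. The sample items are fabricated for illustration (real Perceval documents carry the commit under the "data" key, with fields such as "Author" and "AuthorDate"), and the derived columns are just examples of the kind of fields an enricher adds.

```python
import pandas as pd

# Fabricated Perceval-like git items (illustrative only)
raw_items = [
    {"data": {"Author": "Alice <alice@example.com>",
              "AuthorDate": "2020-03-01 10:00:00 +0000",
              "files": [{"file": "a.py"}, {"file": "b.py"}]}},
    {"data": {"Author": "Bob <bob@example.com>",
              "AuthorDate": "2020-03-02 11:30:00 +0000",
              "files": [{"file": "a.py"}]}},
]

# Flatten each raw document into a row of the enriched table
rows = []
for item in raw_items:
    commit = item["data"]
    rows.append({
        "author": commit["Author"].split("<")[0].strip(),
        "date": pd.to_datetime(commit["AuthorDate"], utc=True),
        "touched_files": len(commit["files"]),
    })

enriched = pd.DataFrame(rows)
# Derived field, analogous to what an enricher adds to each document
enriched["weekday"] = enriched["date"].dt.day_name()
print(enriched[["author", "touched_files", "weekday"]])
```

The point is that once the items live in a DataFrame, derived fields become vectorized one-liners instead of per-document loops, which is the approach cereslib already follows for studies.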
Let me know if this solves your doubts, thanks!
Thank you so much @valeriocos, that helped.
Hi @valeriocos For the following aim, what I think we can do is
Replacing low-level libraries (e.g., requests) with popular ones used to interact with ElasticSearch.
Replace the requests library with elasticsearch-dsl in grimoirelab-elk (elasticsearch-dsl and elasticsearch-py are both Python API clients for ElasticSearch, but I think it is more convenient to write queries with elasticsearch-dsl than with elasticsearch-py)
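To illustrate the convenience point: with plain requests one POSTs a hand-built JSON body, while elasticsearch-dsl expresses the same query as chained Python calls. The index name "git" and the field names below are illustrative, not taken from the ELK codebase; the dsl form is shown in comments so the snippet stays dependency-free.

```python
import json

# Raw query body one would send with requests, e.g.:
#   requests.post(f"{es_url}/git/_search", json=query)
# It counts commits per author over the last year (hypothetical fields).
query = {
    "size": 0,
    "query": {
        "range": {"grimoire_creation_date": {"gte": "now-1y"}}
    },
    "aggs": {
        "by_author": {"terms": {"field": "author_name", "size": 10}}
    },
}
print(json.dumps(query, indent=2))

# The elasticsearch-dsl equivalent reads as chained calls (sketch only):
#   s = Search(using=client, index="git") \
#       .filter("range", grimoire_creation_date={"gte": "now-1y"})
#   s.aggs.bucket("by_author", "terms", field="author_name", size=10)
```

The dsl version removes the nested-dict boilerplate and lets the query be built up incrementally, which is the main argument for it in this refactoring.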
Reorganizing part of the ELK logic into coherent packages.
The identity-related methods and study-related methods present in the Enrich class should be moved to different modules, because they need some methods that are not needed by the Enrich class, and they also increase the class's line count. So it would be better if we had different modules for them.
Some questions
Enabling the correct working of ELK for different version of ElasticSearch (>=6.1) and Open Distro for ElasticSearch (>= 0.9.0).
According to the Open Distro docs, it provides features like elasticsearch, kibana, security, alerting, SQL, etc. So are we using Open Distro as a plugin?
According to the Open Distro docs, it provides features like elasticsearch, kibana, security, alerting, SQL, etc. So are we using Open Distro as a plugin?
OpenDistro builds on ElasticSearch and adds some additional features.
The initial goal is to make sure that ELK and possibly GrimoireLab can work with OpenDistro (in particular with its elasticsearch and kibana). Alerting and other features available in OpenDistro (but not in ElasticSearch) can be evaluated during the internship.
Hi @valeriocos, I am getting a bit confused by the Open Distro part.
Basically, the working of GrimoireLab is: 1) obtain data from a data source (like git or GitHub) using Perceval; 2) GrimoireELK stores this data as raw indexes and then processes it to make enriched indexes (with the help of SortingHat, etc.); 3) these enriched indexes are passed to Kibiter for visualization.
So do we want Open Distro to work with GrimoireELK in order to produce enriched indexes?
Basically, the working of GrimoireLab is ...
Yes
So do we want Open Distro to work with GrimoireELK in order to produce enriched indexes?
Yes
@kshitij3199 we can talk on IRC tomorrow about the doubts you have about ODFE, WDYT?
Yes, sure. @valeriocos, can you please tell me the time?
One more thing, @valeriocos
The initial goal is to make sure that ELK and possibly GrimoireLab can work with OpenDistro (in particular with its elasticsearch and kibana)
Why are we saying that OpenDistro should work with Kibana? I think Open Distro should work with GrimoireELK, and the task is to produce enriched indexes which can later be fed into Kibana.
I mean, is there no connection between Open Distro and Kibana?
https://github.com/chaoss/grimoirelab/issues/285#issuecomment-603336536
I'm available tomorrow from 10h30 until 17h30 (Madrid, Spain). Please pick the timeframe that best suits you.
Why are we saying that OpenDistro should work with Kibana? I think Open Distro should work with GrimoireELK, and the task is to produce enriched indexes which can later be fed into Kibana.
I agree with you that GrimoireELK should produce the enriched indexes. In the current implementation, these indexes are consumed by some dashboards, which are automatically uploaded by GrimoireLab to Kibiter (a downstream of Kibana). It is important to make sure that these dashboards are uploaded even when using the Kibana version of ODFE.
Some info is available at https://github.com/chaoss/grimoirelab/issues/285#issuecomment-602255449
Let me know if this answers your doubts, thanks
Thank you @valeriocos, it answers my doubts.
you're welcome @kshitij3199 ! Tomorrow I'll be on IRC, in case you want to discuss something.
Thank you @valeriocos. If something comes up that needs to be discussed, I will message you on IRC.
Hi @valeriocos, (I tried searching for you on IRC but couldn't find you, maybe wrong timing)
Improving the processing of Perceval data.
For this aim, are we expected to rewrite/modify studies using cereslib, just like the areas of code and onion studies, or is something different expected?
Sorry @kshitij3199 , I didn't see this notification!
For this aim, are we expected to rewrite/modify studies using cereslib, just like the areas of code and onion studies, or is something different expected?
The first option is the preferred one.
Hi @valeriocos ,
for example, in studies like enrich_demography, enrich_forecast_activity, enrich_feelings, etc., we have to:
1) Make sure that all requests library calls are replaced by elasticsearch-py or elasticsearch-dsl
Improving the processing of Perceval data.
2) (This is the one that is confusing me.) So in this part, for the studies mentioned above, we have to get data from ElasticSearch and store it in a dataframe, enrich the data using pandas with the same logic as before (using functions from cereslib where possible), and store the enriched data back to ElasticSearch. Is this the approach we should follow? If I am wrong, can you please describe what the approach/procedure should be? Thank you
Hi @kshitij3199, thank you for your time in understanding and exploring the problems of this idea!
1. Make sure that all requests library calls are replaced by elasticsearch-py or elasticsearch-dsl
Yes. If you notice, the code implementing the studies consists of a common part that reads and processes the items. This part could be generalized (moved maybe to cereslib or a new ELK module) and rewritten with elasticsearch-py or elasticsearch-dsl.
2. (This is the one that is confusing me)
Yes, the approach you described is the one that should be followed.
Thanks!
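That "common part to read the items" could be generalized along these lines: a generic paginated iterator whose page-fetching function is pluggable. In ELK it would wrap elasticsearch-py's scroll machinery (e.g. the helpers.scan helper); here a fake in-memory fetcher stands in so the logic runs without a cluster. The function and variable names are hypothetical, not taken from the ELK codebase.

```python
from typing import Callable, Dict, Iterator, List

def iter_items(fetch_page: Callable[[int, int], List[Dict]],
               page_size: int = 2) -> Iterator[Dict]:
    """Yield every hit, requesting pages until an empty one comes back.

    `fetch_page(offset, size)` is pluggable: in production it would wrap
    elasticsearch-py's scroll API; in tests it can page through a list.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        offset += len(page)

# Fake backend standing in for an index of enriched documents
_FAKE_INDEX = [{"uuid": str(i), "value": i} for i in range(5)]

def fake_fetch_page(offset: int, size: int) -> List[Dict]:
    return _FAKE_INDEX[offset:offset + size]

items = list(iter_items(fake_fetch_page))
print(len(items))  # all 5 documents, fetched in pages of 2
```

Each study would then only supply its own enrichment step on top of this shared reader, instead of reimplementing the pagination loop with requests.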
Thank you @valeriocos,
I was thinking that
The identity-related methods and study-related methods present in the Enrich class should be moved to different modules
so when we create a new module for the study-related methods, we can take care of the following:
This part could be generalized (moved maybe to cereslib or a new ELK module) and rewritten with elasticsearch-py or dsl.
So does it mean that all 3 tasks have to be done at the same time (when we are creating the new module for the study-related methods)?
1) Replacing low-level libraries (e.g., requests) with popular ones used to interact with ElasticSearch. 2) Reorganizing part of the ELK logic into coherent packages. 3) Improving the processing of Perceval data.
So does it mean that all 3 tasks have to be done at the same time (when we are creating the new module for the study-related methods)?
They can be split into sub-tasks, and the integration into ELK can be done incrementally. Does that answer your question, @kshitij3199?
Thank you @valeriocos, it answers my question.
I agree with you that GrimoireELK should produce the enriched indexes. In the current implementation, these indexes are consumed by some dashboards, which are automatically uploaded by GrimoireLab to Kibiter (a downstream of Kibana). It is important to make sure that these dashboards are uploaded even when using the Kibana version of ODFE.
Can you please tell what the dashboards are here ("these indexes are consumed by some dashboards")? Are they visualizations like different graphs and charts? And how does GrimoireLab upload them to Kibiter?
So when we are using ODFE, is it necessary to use its Kibana version? Can't we use Kibiter?
If we use Kibiter, can you tell what things we will have to fix?
You're welcome!
Can you please tell what the dashboards are here ("these indexes are consumed by some dashboards")?
The dashboards are available at https://github.com/chaoss/grimoirelab-sigils
Are they visualizations like different graphs and charts?
Yes, for instance pie charts, bar charts, tables, and so on.
And how does GrimoireLab upload them to Kibiter?
This is done by the task_panels (https://github.com/chaoss/grimoirelab-sirmordred/blob/master/sirmordred/task_panels.py). Under the hood, it calls Kidash, which is in charge of taking the dashboards and index patterns (see the sigils repo) and saving them in the .kibana index.
So when we are using ODFE, is it necessary to use its Kibana version? Can't we use Kibiter?
ATM we don't have a Kibiter version for ODFE. However, consider that Kibiter is a downstream of Kibana and adds some additional plugins to it. Thus, there is basically no difference between Kibiter and Kibana in terms of common functionalities (how to create a dashboard, a visualization, and so on).
If we use Kibiter, can you tell what things we will have to fix?
The same thing will have to be fixed if we use Kibana >= 7.x (not sure if this also applies to Kibana >= 6.8), namely the way Kidash uploads the dashboards/index patterns to the .kibana index. However, please note that this fix should be evaluated as commented at https://github.com/chaoss/grimoirelab/issues/285#issuecomment-602255449
Thank you @valeriocos for the detailed answer
you're welcome @kshitij3199 !
Hi @valeriocos, can we connect on IRC for some time? Please tell me when you're free.
Hi @imnitishng, sure! Is 15h00 Madrid time (around 1h30 from now) OK for you?
Yea sure.
Hi @valeriocos , This is my first draft of GSoC proposal.
Please see it and let me know what things need to be corrected. Thank you
@valeriocos I have shared the draft proposal with the organization, please have a look.
Hi @imnitishng , can you share the link to the draft?
Hi @valeriocos, can you please tell me what things need to be changed in my first draft of the GSoC proposal?
Hi @valeriocos , This is my first draft of GSoC proposal.
Please see it and let me know what things need to be corrected. Thank you
Oh okay sure @valeriocos https://docs.google.com/document/d/1_9WaTWfe_qKmKcdbusWpbkJ4Wk7xIxmXNReedKqSvZg/edit?usp=drivesdk
Hi @valeriocos, can you please tell me when you will be on IRC? I want to discuss a few things.
Hi @kshitij3199, in 20 minutes (12h30 Madrid time), can that be OK?
Yes, it's fine.
@kshitij3199 @imnitishng
I tried opening your draft proposal links in order to review them, but I guess access is restricted. Please let me know once you have granted public access.
/cc @valeriocos
I'm really sorry, that might have been a mistake. I've made it public again. Please have a look! Thank you @inishchith!
@imnitishng No worries.
I'll have a look at it
Should I submit a proposal via the GSoC portal?
Yes.
Proposals must be submitted in two places for CHAOSS: The GSoC portal and our interest page.
GrimoireLab allows producing analytics with data extracted from more than 30 tools used for contributing to open source development, such as version control systems, issue trackers and forums. A common execution of GrimoireLab consists in collecting data from a given repository, processing and enriching the data obtained, and finally visualizing it on dynamic web dashboards. At the core of this process there is a component called ELK, which is in charge of integrating the data finally shown on the dashboards.
The evolution of GrimoireLab now requires reshaping some of the functionalities provided by ELK to improve its maintainability. This project idea is about refactoring and redesigning the core of ELK using popular libraries for data management and processing, such as elasticsearch-py and pandas.
The aims of the project are as follows:
The aims will require working with Python, ELK and the ElasticSearch database.
Microtasks
For becoming familiar with GrimoireLab, you can start by reading some documentation. You can find useful information at:
Once you're familiar with GrimoireLab, you can have a look at the following microtasks.
Microtask 0: Download PyCharm and get familiar with it (for instance, you can follow this tutorial).
Microtask 1: Set up Perceval to be executed from PyCharm.
Microtask 2: Create a Python script to execute Perceval via its Python interface using the Git and GitHub backends. Feel free to select any target repository.
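A starting point for Microtask 2 could look like the sketch below. The Git backend call follows Perceval's documented Python interface; the repository URL and clone path are just examples, and the summary helper is a hypothetical addition for illustration. Since fetching needs Perceval installed plus network access, that part is kept under the main guard.

```python
def count_by_author(items):
    """Pure helper (hypothetical): tally commits per author from
    Perceval git items, which carry the commit under the "data" key."""
    counts = {}
    for item in items:
        author = item["data"]["Author"]
        counts[author] = counts.get(author, 0) + 1
    return counts

if __name__ == "__main__":
    # Requires `pip install perceval` and network access, so it only
    # runs when the script is executed directly.
    from perceval.backends.core.git import Git

    # Example target repository and local clone path
    repo = Git(uri="https://github.com/chaoss/grimoirelab-perceval",
               gitpath="/tmp/perceval-git")
    items = list(repo.fetch())
    print(count_by_author(items))
```

The GitHub backend follows the same pattern (perceval.backends.core.github), with an API token instead of a local clone path.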
Microtask 3: Based on the JSON documents produced by Perceval and its source code, try to answer the following questions: what are the timestamp, updated_on, origin, category, uuid, search_fields and data of each JSON document produced by Perceval?
Microtask 4: Set up a dev environment to work on GrimoireLab. Have a look at https://github.com/chaoss/grimoirelab-sirmordred#setting-up-a-pycharm-dev-environment.
Microtask 5: Execute micro-mordred to collect and enrich data from any Git repository.
Microtask 6: Execute micro-mordred to obtain data from the study enrich_areas_of_code for any Git repository.
Microtask 7: Execute micro-mordred to collect and enrich data from any GitHub repository, making sure that no archives are created by Perceval.
Microtask 8: On your machine, run the tests for ELK within PyCharm. If you succeed, you can try running the coverage package on the ELK tests and report the details for each file.
Microtask 9: Submit at least a PR to one of the GrimoireLab repositories to fix an issue, improve the documentation, etc.
Microtask 10: Submit a PR to ELK to increase the test coverage of one or more files located at https://github.com/chaoss/grimoirelab-elk/tree/master/grimoire_elk/enriched