chaoss / grimoirelab-elk


Update schemas to the latest format #1010

Open vchrombie opened 2 years ago

vchrombie commented 2 years ago

ELK keeps a description of each enriched index used to build the Kibiter dashboards. These descriptions are stored as CSV files in the schema folder. Over time, the descriptions have evolved, and the current format is a list of attributes, each with its name, its type, whether the field can be aggregated, and a description (e.g., schema/git.csv). However, some schemas are still not aligned with the latest format. For instance, this is the case for:

The goal of this issue is to update the schemas to the latest format. To do so, given a data source (e.g., meetup, stackoverflow), micro-mordred[*] should be executed to collect and enrich the data. Then, the enriched documents should be inspected using the Dev Tools or the Discover view of Kibiter. For each attribute found in the enriched index, the corresponding schema should contain the name of the attribute, its type, whether the field can be aggregated, and a description.

You can also use this script to automate the process and create the schema file from the index: generate-es-index-schema.py
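To illustrate what such automation involves, here is a minimal sketch that turns the "properties" section of an index mapping into schema rows. It is not the code of generate-es-index-schema.py. The function names, the sample mapping and the CSV header (name, type, aggregatable, description) are assumptions made for this example. The aggregatable value is only a heuristic (text fields count as non-aggregatable unless fielddata is enabled), so it should be checked against what Kibiter reports for the index.

```python
import csv
import io

# Assumed column order; compare with an up-to-date schema such as schema/git.csv.
FIELDNAMES = ["name", "type", "aggregatable", "description"]


def schema_rows_from_mapping(properties, prefix=""):
    """Flatten the 'properties' dict of an index mapping into schema rows."""
    rows = []
    for name, spec in sorted(properties.items()):
        full_name = prefix + name
        if "properties" in spec:
            # Object field: recurse and use dotted names for the sub-fields.
            rows.extend(schema_rows_from_mapping(spec["properties"], full_name + "."))
            continue
        field_type = spec.get("type", "object")
        # Heuristic: text fields are not aggregatable unless fielddata is on.
        aggregatable = field_type != "text" or bool(spec.get("fielddata"))
        rows.append({
            "name": full_name,
            "type": field_type,
            "aggregatable": str(aggregatable).lower(),
            "description": "",  # to be written by hand
        })
    return rows


def rows_to_csv(rows):
    """Serialize schema rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative mapping fragment, not taken from a real enriched index.
example_mapping = {
    "author_name": {"type": "keyword"},
    "grimoire_creation_date": {"type": "date"},
    "message_analyzed": {"type": "text"},
    "data": {"properties": {"lines_added": {"type": "long"}}},
}

if __name__ == "__main__":
    print(rows_to_csv(schema_rows_from_mapping(example_mapping)))
```

The description column is left empty on purpose, since descriptions still have to be written or copied by hand.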

Note that some fields, such as grimoire_creation_date, project, project_1 and origin, are shared across all enriched indexes, so their descriptions can be taken from existing schemas.
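Reusing descriptions for shared fields could look like the sketch below. It assumes that schema files carry a header line with name and description columns. The helper names and the sample data are hypothetical, and the description strings are placeholders, not text from the repository.

```python
import csv
import io


def load_descriptions(schema_csv_text):
    """Map field name to description for every described field in a schema CSV."""
    reader = csv.DictReader(io.StringIO(schema_csv_text))
    return {row["name"]: row["description"] for row in reader if row.get("description")}


def fill_shared_descriptions(rows, known):
    """Return copies of rows, filling empty descriptions from known ones."""
    filled = []
    for row in rows:
        updated = dict(row)  # copy, so the input rows are not modified
        if not updated.get("description"):
            updated["description"] = known.get(updated["name"], "")
        filled.append(updated)
    return filled


# Placeholder content standing in for an existing, up-to-date schema file.
EXISTING_SCHEMA = (
    "name,type,aggregatable,description\n"
    "origin,keyword,true,placeholder description for origin\n"
    "project,keyword,true,placeholder description for project\n"
)

# Rows for a new schema, still without descriptions (hypothetical fields).
NEW_ROWS = [
    {"name": "origin", "type": "keyword", "aggregatable": "true", "description": ""},
    {"name": "some_new_field", "type": "long", "aggregatable": "true", "description": ""},
]

if __name__ == "__main__":
    known = load_descriptions(EXISTING_SCHEMA)
    for row in fill_shared_descriptions(NEW_ROWS, known):
        print(row)
```

Fields that have no match in the existing schema keep an empty description, which makes it easy to see what still needs to be documented.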

[*] Details to execute micro-mordred for a given data source are available at: supported-data-sources.

Related issues

prokan468 commented 2 years ago

I have worked with CSV files and python. Please do assign this issue to me and I shall provide you with the results.

vchrombie commented 2 years ago

> I have worked with CSV files and python. Please do assign this issue to me and I shall provide you with the results.

Hi @prokan468, thanks for showing interest. We cannot assign this issue since it is a long one. Feel free to choose a backend, follow the steps, update the schema and open a PR.

Please let me know if you need any help.