chaoss / grimoirelab-sirmordred

Orchestrate the execution of GrimoireLab tools to produce a dashboard
GNU General Public License v3.0
38 stars 121 forks source link

'collection for git: starting' repeatedly appear #462

Closed Julianbaozi closed 4 years ago

Julianbaozi commented 4 years ago

Screenshot from 2020-04-25 00-06-12

It never finishes. The log: all.log

valeriocos commented 4 years ago

Hi @Julianbaozi !

By default GrimoireLab is executed in a never-ending loop. The param that disables the loop is update, in the general section: https://github.com/chaoss/grimoirelab-sirmordred#general. Please set it to false.
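As a minimal sketch of that change, the snippet below flips the option with Python's standard configparser. Only the section name general and the option name update come from the explanation above; the sample file content and the helper name disable_update_loop are made up for illustration, and a real setup.cfg holds many more options.

```python
import configparser
import io


def disable_update_loop(cfg_text):
    """Return the config text with [general] update set to false."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section("general"):
        parser.add_section("general")
    parser.set("general", "update", "false")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical, heavily shortened setup.cfg content.
SAMPLE = """[general]
short_name = demo
update = true
"""

new_text = disable_update_loop(SAMPLE)
print(new_text)
```

Note that configparser drops comments when it rewrites a file, so editing the line by hand is usually the simpler option.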

If you are interested in executing GrimoireLab on a single data source only, you may want to have a look at micro-mordred: https://github.com/chaoss/grimoirelab-sirmordred#micro-mordred-. It is a simplified version of mordred (it doesn't include the scheduler).

Julianbaozi commented 4 years ago

Got you. what if I want to run on many repositories? How can I run them one by one automatically?

valeriocos commented 4 years ago

You can add the list of repos to the projects.json, within the section git.

Example of the setup.cfg and projects.json are provided at:

Please note that studies can be set to [] or removed.
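If the list of repos is long, the projects.json can be generated instead of written by hand. This is only a sketch: it assumes the usual layout of project name, then data source, then list of repo URLs (the same shape as the github:repo example further down in this thread). The project name "Chaoss", the two URLs and the helper name build_projects are placeholders.

```python
import json


def build_projects(project_name, repo_urls):
    """Build the projects.json structure: project -> data source -> repos."""
    return {project_name: {"git": list(repo_urls)}}


# Placeholder repositories; replace with the ones to analyze.
repos = [
    "https://github.com/chaoss/grimoirelab-perceval",
    "https://github.com/chaoss/grimoirelab-sirmordred",
]

projects = build_projects("Chaoss", repos)
projects_json = json.dumps(projects, indent=4)
print(projects_json)
```

All repos listed under git are then processed within the same execution, so there is no need to launch them one by one.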

latest-items is set to true by default. It allows to fetch only the new commits that appeared between two executions of the platform. However, if the repo has already been downloaded once and we are using a fresh index, latest-items should be set to false to collect all commits. More details about this possible problem are at https://github.com/chaoss/grimoirelab-sirmordred/blob/master/Getting-Started.md#empty-index-
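A toy model may make the difference easier to see. It is not GrimoireLab code: the function commits_to_collect and the commit names are invented, and the set seen stands in for whatever the local clone already contains.

```python
def commits_to_collect(upstream, already_seen, latest_items=True):
    """Return the commits one execution would process (toy model only)."""
    if latest_items:
        return [c for c in upstream if c not in already_seen]
    return list(upstream)


seen = set()

# First execution: nothing has been collected yet, so everything is new.
first = commits_to_collect(["c1", "c2"], seen)
seen.update(first)

# Second execution: one new commit appeared upstream in the meantime.
second = commits_to_collect(["c1", "c2", "c3"], seen)
seen.update(second)

# Same state with latest-items disabled: all commits are processed again.
full = commits_to_collect(["c1", "c2", "c3"], seen, latest_items=False)

print(first, second, full)
```

In this model the empty-index problem is the case where seen is already full but the index is fresh: with latest-items enabled nothing would be returned, so the index would stay empty.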

Hope this helps

Julianbaozi commented 4 years ago

It allows to fetch only the new commits that appeared between two executions of the platform.

What do you mean by two executions?

Julianbaozi commented 4 years ago

if I fetch the data once and I fetch it again, then are the items going to add up or just cover the old data?

valeriocos commented 4 years ago

Sorry @Julianbaozi, I should have been more precise.

What do you mean by two executions?

An execution of GrimoireLab is composed of two main steps: collection and enrichment.

If we focus on Git, during the collection phase GrimoireLab clones the Git repos locally (if they don't already exist). Then it accesses the Git log, extracts the commit data and stores it in the ElasticSearch database.

The enrichment phase takes the commits data, adds additional information (e.g., pair programming), and stores the new data again in the ElasticSearch database. This new data is what is shown on the dashboard.

To make the collection of Git data more efficient, the param latest-items allows fetching only the new commits that appeared between the previous execution and the current one. This is done by synchronizing the local clone with the upstream one.

  • Execution with latest-items set to true: the first execution at time t1 collects and enriches all commits, the second execution at time t2 collects and enriches only the new commits after t1, the third execution at time t3 collects and enriches only the new commits after t2, etc.
  • Execution with latest-items set to false: the first execution at time t1 collects and enriches all commits, the second execution at time t2 collects and enriches all commits, the third execution at time t3 collects and enriches all commits, etc.

if I fetch the data once and I fetch it again, then are the items going to add up or just cover the old data?

The items are going to add up.

Don't hesitate to write if something isn't clear, thanks!

Julianbaozi commented 4 years ago

Thank you! That was very helpful

Julianbaozi commented 4 years ago

How to add --from-date in project.json?

valeriocos commented 4 years ago

You're welcome @Julianbaozi !

The --from-date cannot be passed via the projects.json. It can be declared in the setup.cfg and it is applied to all repositories.

Please note that:

  • the platform automatically derives the from-date value from the date of the last item fetched per repository.
  • if the from-date is set, all executions will fetch the data after this date.

Julianbaozi commented 4 years ago

Are you talking about kibiter-time-from? I don’t see any other date related parameters in setup.cfg.

And if I’m not visualizing the data. I just need the enriched index. Will it still Work?

valeriocos commented 4 years ago

Are you talking about kibiter-time-from? I don’t see any other date related parameters in setup.cfg.

Most backend sections support the from-date param. It can be set in the following way:

[git]
raw_index = git_raw
enriched_index = git_enriched
...
from_date = 2020-01-01

And if I’m not visualizing the data. I just need the enriched index. Will it still Work?

Yes, because you will collect only the data after a given date.

If this helps, in case you are building a dataset from a few GrimoireLab executions, you could collect the data without using the from-date, and then clean the Elasticsearch enriched indexes with some ad-hoc queries.

Documentation for this kind of query is at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html.

The example below deletes the documents of the repo https://gitlab.com/Bitergia/devops with values 623, 624, 625 for the attribute id_in_repo.

POST gitlab-issues_enriched/_delete_by_query?refresh
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "origin": "https://gitlab.com/Bitergia/devops"
        }
      },
      "filter": {
        "terms": {
          "id_in_repo": [
            623, 624, 625
          ]
        }
      }
    }
  }
}
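
For the date-based cleanup mentioned above, the same request shape can be built with a range filter instead of a list of ids. This is a sketch under assumptions: the helper name delete_before_query is invented, and the date field grimoire_creation_date is only a guess at a suitable field in the enriched index, so check the index mapping before using it. The snippet just builds and prints the body; it would be sent to <index>/_delete_by_query like the example above.

```python
import json


def delete_before_query(origin, before, date_field="grimoire_creation_date"):
    """Build a delete-by-query body for one repo's documents older than `before`."""
    return {
        "query": {
            "bool": {
                # Restrict the deletion to a single repository.
                "must": {"term": {"origin": origin}},
                # date_field is an assumed field name, see the note above.
                "filter": {"range": {date_field: {"lt": before}}},
            }
        }
    }


body = delete_before_query("https://gitlab.com/Bitergia/devops", "2020-01-01")
print(json.dumps(body, indent=2))
```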

Julianbaozi commented 4 years ago

How can I get the repo data, such as watches, forks and stars, by date?

Julianbaozi commented 4 years ago

Thank you! It works fine now. But when I check the log, it seems that it cannot set aliases. Will it affect the result?

2020-04-25 04:47:39,944 - urllib3.connectionpool - DEBUG - http://localhost:9200 "GET /github-repo_raw/_alias HTTP/1.1" 200 59
2020-04-25 04:47:39,946 - urllib3.connectionpool - DEBUG - http://localhost:9200 "POST /git_raw/_search HTTP/1.1" 200 160
2020-04-25 04:47:39,947 - grimoire_elk.raw.elastic - INFO - [Git] Incremental from: 2020-04-25 11:39:55+00:00 for https://github.com/openssl/openssl.git
2020-04-25 04:47:39,947 - urllib3.connectionpool - DEBUG - http://localhost:9200 "POST /_aliases HTTP/1.1" 400 204
2020-04-25 04:47:39,947 - grimoire_elk.elastic - WARNING - Something went wrong when adding an alias on http://localhost:9200/github-repo_raw. Alias not set. 400 Client Error: Bad Request for url: http://localhost:9200/_aliases

Julianbaozi commented 4 years ago

And I got 6 hits in github-repo_raw, but they look the same.

valeriocos commented 4 years ago

How can I get the repo data, such as watches, forks and stars, by date?

setup.cfg

[github:repo]
raw_index = github-repo_raw
enriched_index = github-repo_enriched
api-token = xxxx
category = repository
sleep-for-rate = true
# no-archive is suggested
no-archive = true

projects.json

{
    "Chaoss": {
        "github:repo": [
            "https://github.com/chaoss/grimoirelab-perceval",
            "https://github.com/chaoss/grimoirelab-sirmordred"
        ]
    }
}

info at: https://github.com/chaoss/grimoirelab-sirmordred#repo

In case you want to see the github repos data in the dashboard, you need to set the flag github-repos to true in the panels section: https://github.com/chaoss/grimoirelab-sirmordred#panels

Thank you! It works fine now. But when I check the log, it seems that it cannot set aliases. Will it affect the result?

You're welcome! No it won't, the aliases are used to visualize the data in the dashboard. Btw, that error shouldn't be there, which version of mordred are you using?

And I got 6 hits in github-repo_raw, but they look the same.

The GitHub API doesn't provide a way to fetch historical data about watchers and forks. For each execution GrimoireLab fetches the data from the repository endpoint (https://developer.github.com/v3/repos/#get-a-repository) and collects the number of stars and watchers at that precise moment.

In your case, the 6 hits look the same because there were no new stars/watchers during the 6 executions.
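Since every execution stores one snapshot, a rough time series can still be built going forward by keeping only the snapshots where a counter changed. The sketch below assumes each raw item exposes the GitHub API counters stargazers_count, forks_count and subscribers_count plus a fetched_on timestamp; the exact location of these fields in the raw index is an assumption, and the sample values and the helper name changed_snapshots are made up.

```python
COUNTERS = ("stargazers_count", "forks_count", "subscribers_count")


def changed_snapshots(snapshots, fields=COUNTERS):
    """Keep a snapshot only when a counter differs from the previously kept one."""
    kept = []
    for snap in sorted(snapshots, key=lambda s: s["fetched_on"]):
        counters = tuple(snap.get(f) for f in fields)
        if not kept or counters != tuple(kept[-1].get(f) for f in fields):
            kept.append(snap)
    return kept


# Made-up snapshots from three executions.
hits = [
    {"fetched_on": 1, "stargazers_count": 10, "forks_count": 3, "subscribers_count": 2},
    {"fetched_on": 2, "stargazers_count": 10, "forks_count": 3, "subscribers_count": 2},
    {"fetched_on": 3, "stargazers_count": 11, "forks_count": 3, "subscribers_count": 2},
]

series = changed_snapshots(hits)
print([(s["fetched_on"], s["stargazers_count"]) for s in series])
```

With identical hits, as in your case, only the first snapshot would be kept.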

EDIT: If you want to collect the historical data of stars and watchers for some repos, you could get the data from https://www.gharchive.org/. Some events come with the repo info at that precise moment. For instance, if you focus on the push events of a target repo (https://developer.github.com/v3/activity/events/types/#pushevent), you should be able to collect what you need.

It would be worth evaluating how this approach could be implemented as a new backend for Perceval.

Julianbaozi commented 4 years ago

What are github:event and githubql? Are they related to forks and watchers?

valeriocos commented 4 years ago

Github:event is related to fetching events for issues (e.g., close, reference, labeling events). It isn't related to forks and watchers.

Julianbaozi commented 4 years ago

I updated grimoirelab, and sirmordred doesn't work. No matter what parameters I pass, as long as the command starts with grimoirelab, it hangs and nothing happens. No log, no output, nothing. Then I have to interrupt it, and I get this:

KeyboardInterrupt
  File "/usr/lib/python3.6/subprocess.py", line 855, in communicate
    self.wait()
  File "/usr/lib/python3.6/subprocess.py", line 1477, in wait
    (pid, sts) = self._try_wait(0)
  File "/usr/lib/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.6/subprocess.py", line 855, in communicate
    self.wait()
  File "/usr/lib/python3.6/subprocess.py", line 1424, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
  File "/usr/lib/python3.6/subprocess.py", line 1424, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt
KeyboardInterrupt
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt
  File "/usr/lib/python3.6/subprocess.py", line 1477, in wait
    (pid, sts) = self._try_wait(0)
  File "/usr/lib/python3.6/subprocess.py", line 1424, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt
  File "/usr/lib/python3.6/subprocess.py", line 1424, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt
KeyboardInterrupt

This is only part of a very long output, which looks very repetitive.

And p2y.py is working fine.

Julianbaozi commented 4 years ago

Ok it's because in the folder there is another sirmordred.py. Solved.
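For anyone hitting the same symptom, a quick way to see which file Python would actually import is to ask importlib for the module's origin. This is a generic sketch, not part of GrimoireLab; the helper name module_origin is made up, and json is used in the demo only because it exists in every installation.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Replace "json" with "sirmordred" to check for a shadowing local file.
print(module_origin("json"))
```

If the printed path points to the current folder instead of site-packages, a local file is shadowing the installed package.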

valeriocos commented 4 years ago

I updated grimoirelab, and sirmordred doesn't work.

Are you using the docker-compose solution (https://github.com/chaoss/grimoirelab#using-docker-compose) or executing mordred from source code?

The logs above don't have pointers to the grimoirelab codebase, can you share the following information?

Thanks!

valeriocos commented 4 years ago

Ok it's because in the folder there is another sirmordred.py. Solved.

Ok, sorry I didn't see your last message

Julianbaozi commented 4 years ago

Seems like there is an encoding problem.

/home/yjl/venvs/gl/lib/python3.6/site-packages/pymysql/cursors.py:170: Warning: (1366, "Incorrect string value: '\xE9d\xE9ric...' for column 'name' at row 1")
  result = self._query(query)
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.6/logging/__init__.py", line 996, in emit
    stream.write(msg)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce9' in position 146: surrogates not allowed
Call stack:
  File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/yjl/venvs/gl/lib/python3.6/site-packages/sirmordred/task_manager.py", line 97, in run
    task.execute()
  File "/home/yjl/venvs/gl/lib/python3.6/site-packages/sirmordred/task_enrich.py", line 404, in execute
    self.enrich_items()
  File "/home/yjl/venvs/gl/lib/python3.6/site-packages/sirmordred/task_enrich.py", line 203, in enrich_items
    repo_labels=repo_labels)
  File "/home/yjl/venvs/gl/lib/python3.6/site-packages/grimoire_elk/elk.py", line 657, in enrich_backend
    total_ids = load_identities(ocean_backend, enrich_backend)
  File "/home/yjl/venvs/gl/lib/python3.6/site-packages/grimoire_elk/elk.py", line 423, in load_identities
    enrich_backend.get_connector_name())
  File "/home/yjl/venvs/gl/lib/python3.6/site-packages/grimoire_elk/elk.py", line 440, in load_bulk_identities
    SortingHat.add_identities(sh_db, new_identities, connector_name)
  File "/home/yjl/venvs/gl/lib/python3.6/site-packages/grimoire_elk/enriched/sortinghat_gelk.py", line 113, in add_identities
    cls.add_identity(db, identity, backend)
  File "/home/yjl/venvs/gl/lib/python3.6/site-packages/grimoire_elk/enriched/sortinghat_gelk.py", line 72, in add_identity
    uuid, identity['username'], identity['name'], identity['email'])
Message: 'New sortinghat identity %s %s,%s,%s '
Arguments: ('a021dbfecf00912916d9ff57cbfb6a232a15c528', None, 'Fr\udce9d\udce9ric Giudicelli', 'groups@newpki.org')

valeriocos commented 4 years ago

It's a warning that is thrown at https://github.com/chaoss/grimoirelab-elk/blob/master/grimoire_elk/enriched/sortinghat_gelk.py#L72. The identity 'Fr\udce9d\udce9ric Giudicelli' isn't inserted due to a unicode error UnicodeEncodeError.
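For readers unfamiliar with the '\udce9' notation, here is a minimal Python sketch of how such a string can arise. It assumes (this is a guess, not something confirmed in the thread) that the author name was stored as Latin-1 bytes and later decoded as UTF-8 with the surrogateescape error handler. It also shows why strict UTF-8 encoding, as used by the logging stream in the traceback, rejects the result.

```python
# Assumption: the original bytes were Latin-1 ("Frédéric"), not valid UTF-8.
raw = "Frédéric Giudicelli".encode("latin-1")

# Decoding invalid UTF-8 with surrogateescape maps each undecodable byte 0xE9
# to the lone surrogate U+DCE9 instead of raising an error.
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name))

# Strict UTF-8 encoding refuses lone surrogates ("surrogates not allowed").
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Round-trip back to the original bytes, then decode with the assumed encoding.
recovered = name.encode("utf-8", errors="surrogateescape").decode("latin-1")
print(recovered)
```

If the assumption about Latin-1 holds, a cleanup step like the last line could normalize such names before they are stored; whether that is appropriate for SortingHat is a separate question.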

If you pass me the repo where this error occurs, I can try to replicate it locally. I guess a possible fix could be to change the encoding used in the MySQL db:

$ mysql -u root -p --host 127.0.0.1
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.5.5-10.0.38-MariaDB-1~xenial mariadb.org binary distribution

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     | <---- If this is utf8, I guess the problem is gone
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

Julianbaozi commented 4 years ago

The repo is openssl/openssl

Julianbaozi commented 4 years ago

How do I query the number of lines? Is it added minus removed? And how do I query the number of files? Thank you!

Julianbaozi commented 4 years ago

Hi,

For onion analysis, how do I set the project name? [image: Screenshot from 2020-04-27 17-03-32.png] It's Global.

valeriocos commented 4 years ago

https://github.com/chaoss/grimoirelab-sirmordred/issues/462#issuecomment-619857643

I modified the docker-compose to set utf8 and utf8mb4 in MySQL, but the error is still there:

mariadb:
  restart: on-failure:5
  image: mariadb:10.0
  expose:
    - "3306"
  ports:
    - "3306:3306"
  environment:
    - MYSQL_ROOT_PASSWORD=
    - MYSQL_ALLOW_EMPTY_PASSWORD=yes
    - MYSQL_DATABASE=test_sh
  command: --wait_timeout=2592000 --interactive_timeout=2592000 --max_connections=300 --character-set-server=utf8 --collation-server=utf8_unicode_ci --character-set-client-handshake=false
  log_driver: "json-file"
  log_opt:
      max-size: "100m"
      max-file: "3"

Feel free to investigate this further, and please consider adding a Q&A to the How to section https://github.com/chaoss/grimoirelab-sirmordred/blob/master/Getting-Started.md#how-to-

https://github.com/chaoss/grimoirelab-sirmordred/issues/462#issuecomment-619959251

You can query the areas of code (aka aoc) index using dedicated Python libraries like elasticsearch-dsl or low-level libraries like requests. The schema of the index is at https://github.com/chaoss/grimoirelab-elk/blob/master/schema/areas_of_code.csv. Details about the ElasticSearch query language are at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html.
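As a rough illustration of such a query, the sketch below builds an aggregation body and sends it with the standard library (urllib is used here instead of requests only to avoid an extra dependency). The URL and index name are taken from the setup shared in this thread. The field names added_lines, removed_lines and file_path_list are assumptions that should be checked against the areas_of_code.csv schema, and "net lines = added minus removed" is the asker's suggestion, not something confirmed here.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"   # ElasticSearch URL from the shared setup
AOC_INDEX = "git_aoc_enriched"     # out_index of enrich_areas_of_code:git


def build_aoc_summary_query(project,
                            added_field="added_lines",
                            removed_field="removed_lines",
                            file_field="file_path_list"):
    """Build a search body that sums line changes and counts distinct files.

    The three default field names are assumptions; verify them in the schema.
    """
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {"term": {"project": project}},
        "aggs": {
            "added": {"sum": {"field": added_field}},
            "removed": {"sum": {"field": removed_field}},
            # cardinality is an approximate distinct count in ElasticSearch
            "files": {"cardinality": {"field": file_field}},
        },
    }


def run_search(index, body, es_url=ES_URL):
    """POST the body to <es_url>/<index>/_search and return the parsed JSON."""
    request = urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


query = build_aoc_summary_query("openssl/openssl")
print(json.dumps(query, indent=2))
# With a running ElasticSearch instance:
# aggs = run_search(AOC_INDEX, query)["aggregations"]
# net_lines = aggs["added"]["value"] - aggs["removed"]["value"]
```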

https://github.com/chaoss/grimoirelab-sirmordred/issues/462#issuecomment-620299276

The screenshot isn't visible in your comment.

The project name is set in the projects.json here: https://github.com/chaoss/grimoirelab-sirmordred/blob/master/utils/projects.json#L2.
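To make the structure concrete, here is a small sketch that reads a projects.json document and lists the project names, which are its top-level keys. The project name "my-project" and the repository URL are placeholders, not values from this thread.

```python
import json

# Illustrative projects.json content; the name and URL are placeholders.
PROJECTS_JSON = """
{
    "my-project": {
        "git": ["https://github.com/example/repo.git"]
    }
}
"""


def project_names(projects_text):
    """Return the top-level keys, which serve as project names."""
    return sorted(json.loads(projects_text))


def project_for_repo(projects_text, data_source, repo_url):
    """Return the project a repository URL is listed under, or None."""
    for name, sources in json.loads(projects_text).items():
        if repo_url in sources.get(data_source, []):
            return name
    return None


print(project_names(PROJECTS_JSON))
print(project_for_repo(PROJECTS_JSON, "git", "https://github.com/example/repo.git"))
```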

If this resolves your question, could you please add a Q&A in the How to section https://github.com/chaoss/grimoirelab-sirmordred/blob/master/Getting-Started.md#how-to- ?

Julianbaozi commented 4 years ago

Sure, I'd love to. Screenshot from 2020-04-28 02-19-47

In http://localhost:9200/git-onion_enriched/_search, I tried to find anything that could represent the project, but the field 'project' is 'Global'.

valeriocos commented 4 years ago

global_ is set here: https://github.com/chaoss/grimoirelab-elk/blob/ec6694fa850f81a0f1f36440c246429bfdcc35d1/grimoire_elk/enriched/study_ceres_onion.py#L343

Can you share your projects.json so I can replicate this behavior locally?

Julianbaozi commented 4 years ago

{"openssl/openssl": {"git": ["https://github.com/openssl/openssl.git"], "github:repo": ["https://github.com/openssl/openssl"], "github:issue": ["https://github.com/openssl/openssl"]}}

Thank you

valeriocos commented 4 years ago

Sorry for the late reply @Julianbaozi!

I have run mordred with your projects.json and I can see that in the onion index there are different hits with project equal to openssl/openssl.

      {
        "_index": "git-onion_openssl_enriched_181221",
        "_type": "item",
        "_id": "_global__openssl/openssl_1999-07-01t00:00:00.000z_388f79a4ef7812e0ec085f5bc9887b822cd41be1",
        "_score": 1,
        "_source": {
          "timeframe": "1999-07-01T00:00:00.000Z",
          "author_uuid": "388f79a4ef7812e0ec085f5bc9887b822cd41be1",
          "author_name": "Ralf S. Engelschall",
          "contributions": 14,
          "metadata__timestamp": "2020-04-29T06:03:40.423Z",
          "project": "openssl/openssl", <---
          "author_org_name": "_Global_",
          "cum_net_sum": 226,
          "percent_cum_net_sum": 97.41379310344827,
          "onion_role": "casual",
          "quarter": "1999Q3",
          "metadata__enriched_on": "2020-04-29T06:25:59.595210",
          "data_source": "git",
          "grimoire_creation_date": "1999-07-01T00:00:00.000Z"
        }
      },

This information is shown in the dashboard Community-Structure-by-Project (see an example at https://chaoss.biterg.io/goto/a3024d4fc930785277594bca75f4b017).

If you want to retrieve from the index only the hits with project = openssl/openssl, try this query in the dev tools of Kibiter or with curl:

GET <name-of-your-onion-index>/_search
{
  "query": {
    "terms": {
      "project": [
        "openssl/openssl"
      ]
    }
  }
}

Let me know how it goes!
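For scripted use, the same terms query can be sent from Python. This is only a sketch: the index name git_onion_enriched comes from the setup shared in this thread and may differ in other deployments, and POST is used because the _search endpoint accepts a request body with either GET or POST.

```python
import json
import urllib.request


def build_project_search(es_url, index, project):
    """Build a POST request for <es_url>/<index>/_search filtered by project."""
    body = {"query": {"terms": {"project": [project]}}}
    return urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_project_search(
    "http://localhost:9200", "git_onion_enriched", "openssl/openssl")
print(request.get_method(), request.full_url)
# With a running ElasticSearch instance:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```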

Julianbaozi commented 4 years ago

What does 'size' in github-repo_raw mean?

On Tue, Apr 28, 2020 at 11:51 PM Junliang Yu yujl@ucsd.edu wrote:

Then where do the other projects come from if I'm only testing openssl?

Thank you!

Julianbaozi commented 4 years ago

Why is it analyzed twice? Screenshot from 2020-04-29 01-16-22. This is my setup:

[general]
short_name = grimoirelab
update = false
debug = true
logs_dir = logs
aliases_file = sirmordred-settings/aliases.json

[projects]
projects_file = sirmordred-settings/projects.json

[es_collection]
url = http://localhost:9200

[es_enrichment]
url = http://localhost:9200
autorefresh = true

[sortinghat]
host = localhost
user = root
password =
database = shdb
load_orgs = false
orgs_file = sirmordred-settings/organizations.json
autoprofile = [github, pipermail, git]
matching = [email]
sleep_for = 0
unaffiliated_group = Unknown
affiliate = true
strict_mapping = false
reset_on_load = true
identities_format = sortinghat

[phases]
collection = true
identities = true
enrichment = true
panels = false

[git]
raw_index = git_raw
enriched_index = git_enriched
latest-items = false
studies = [enrich_onion:git, enrich_areas_of_code:git]

[enrich_onion:git]
in_index = git_enriched
out_index = git_onion_enriched
contribs_field = hash

[enrich_areas_of_code:git]
in_index = git_raw
out_index = git_aoc_enriched

[github:repo]
raw_index = github_repo_raw
enriched_index = github_repo_enriched
api-token = 6d8e4953a6b1ada8d07579914f06cb8844725926
category = repository
sleep-for-rate = true
no-archive = true

[github:issue]
raw_index = github_issue_raw
enriched_index = github_issue_enriched
api-token = 6d8e4953a6b1ada8d07579914f06cb8844725926
sleep-for-rate = true
no-archive = true
category = issue
from-date = 2020-04-26

Julianbaozi commented 4 years ago

I'm trying to analyze a huge collection of repos. Will using Arthur speed it up? I also want to query the indexes across all the repos, and ideally I could query a repo as soon as it has been fetched, without waiting for all the others. That's why I've been using mordred to process the repos one by one. So, Arthur or no Arthur, mordred or micro-mordred: what do you think is the best combination?

Thanks a lot!

valeriocos commented 4 years ago

What does 'size' in github-repo_raw mean?

The size is returned by the GitHub API: https://developer.github.com/v3/repos/#get-a-repository

The size is expressed in kilobytes based on the disk usage of the GitHub server-side bare repository. However, in order to avoid wasting too much space with repositories with a large network, GitHub relies on Git Alternates. In this configuration, calculating the disk usage against the bare repository doesn't account for the shared object store and thus returns an "incomplete" value through the API call.

ref: https://stackoverflow.com/questions/8646517/how-can-i-see-the-size-of-a-github-repository-before-cloning-it

valeriocos commented 4 years ago

https://github.com/chaoss/grimoirelab-sirmordred/issues/462#issuecomment-621058134

Why is it analyzed twice?

Which version of mordred are you using? I tried with the master branch and the setup.cfg you shared, and I got only one hit:

Collection for git: starting...
Collection for github:issue: starting...
Collection for github:repo: starting...
Loading blacklist...
0/0 blacklist entries loaded
Loading unique identities...
4/4 unique identities loaded
Collection for github:repo: finished after 00:00:01 hours
Collection for github:issue: finished after 00:01:47 hours
Collection for git: finished after 00:02:29 hours
Process finished with exit code 0

valeriocos commented 4 years ago

https://github.com/chaoss/grimoirelab-sirmordred/issues/462#issuecomment-621073137

I'm trying to analyze a huge collection of repos. Will using Arthur speed it up? I also want to query the indexes across all the repos, and ideally I could query a repo as soon as it has been fetched, without waiting for all the others. That's why I've been using mordred to process the repos one by one. So, Arthur or no Arthur, mordred or micro-mordred: what do you think is the best combination?

Sorry for the late reply @Julianbaozi, I missed your last comments.

Mordred seems like a good solution, since you can use it as is. In your mining process, the bottleneck is the time needed to collect issues from GitHub (since API requests are rate limited). Thus, instead of passing a single API token (api-token = ...), you could pass a list of them (api-token = [.., .., ..]). Please note that the tokens should be generated from different GitHub accounts, otherwise they will share the same rate limit.
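As an illustration of the list form, the sketch below renders a [github:issue] section with several tokens using Python's configparser. The token values are placeholders, and the index names are copied from the setup shared earlier in this thread; only the bracketed api-token list is the point of the example.

```python
import configparser
import io


def github_issue_section(tokens, raw_index, enriched_index):
    """Render a [github:issue] section whose api-token is a bracketed list."""
    config = configparser.ConfigParser()
    config["github:issue"] = {
        "raw_index": raw_index,
        "enriched_index": enriched_index,
        # One token per GitHub account, written as [tok1, tok2, ...]
        "api-token": "[" + ", ".join(tokens) + "]",
        "sleep-for-rate": "true",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


section = github_issue_section(
    ["TOKEN_ACCOUNT_A", "TOKEN_ACCOUNT_B"],  # placeholders, not real tokens
    "github_issue_raw",
    "github_issue_enriched",
)
print(section)
```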

In case you want to scale your process, you could write a script to launch different instances of micro-mordred. Each one will have its own indexes for git, github, etc. For each data source (git, github), all the corresponding indexes could share the same alias, so you will be able to perform queries over all the data fetched from the same data source.
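A launcher script along these lines might look like the sketch below. It is only an outline under several assumptions: the script name micro.py and the flags --raw, --cfg and --backends follow the micro-mordred section of the README as far as can be told here, so please confirm them with the script's --help output. The configuration file names are hypothetical.

```python
import subprocess
import sys


def micro_mordred_command(cfg_path, backends, micro_script="micro.py"):
    """Build the command line for one collection-only micro-mordred run.

    micro_script and the flag names are assumptions to verify locally.
    """
    return [sys.executable, micro_script, "--raw",
            "--cfg", cfg_path, "--backends", *backends]


def launch_all(cfg_paths, backends):
    """Start one process per configuration file and wait for all of them."""
    processes = [subprocess.Popen(micro_mordred_command(cfg, backends))
                 for cfg in cfg_paths]
    return [process.wait() for process in processes]


# Hypothetical per-repository configuration files.
configs = ["setup-openssl.cfg", "setup-other.cfg"]
for cfg in configs:
    print(" ".join(micro_mordred_command(cfg, ["git"])))
# launch_all(configs, ["git"])  # uncomment to actually start the processes
```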

Julianbaozi commented 4 years ago

Thank you for your reply. The version of mordred is 0.2.25.

So are you suggesting that I do not use Arthur? Just pass more tokens in the setup file?

And by writing a script to launch different instances, do you mean writing a for loop over every project? So the projects are still processed one by one? This is what I'm doing now with mordred. What is the difference between using mordred and micro-mordred if I'm using a for loop? The problem is that it doesn't give good results for studies like aoc. For the first project it is fine, but the second repository gets zero results. To solve this I have to delete all the indexes every time I finish one project.

Thank you again for your helpful answers.

Julianbaozi commented 4 years ago

And for sortinghat, is the result already merged? When I query the database, it is clear that many people with the same name have different email addresses and uuids. And I saw the identities merge parameter in micro-mordred.

valeriocos commented 4 years ago

You're welcome @Julianbaozi !

The version of mordred is 0.2.25.

Consider using version 0.2.39 or greater. The docker-compose at https://github.com/chaoss/grimoirelab/blob/master/docker-compose/docker-compose.yml should be easy to use.

So are you suggesting that I do not use Arthur? Just pass more tokens in the setup file?

Yes! Support for Arthur has been removed (https://github.com/chaoss/grimoirelab-sirmordred/pull/409). Ongoing work aims to integrate Arthur in a different way.

What is the difference here between using mordred and micro-mordred if i’m using for loop?

Micro-mordred allows you to execute Mordred tasks individually. If you put micro-mordred tasks in a loop, you basically get something similar to Mordred.

The problem is that it cannot give good results on studies like aoc. For the first project it is fine but the second repository will get zero results. To solve this I have to delete all the index every time i finish one project.

You can set up N mordred/micro-mordred instances just to perform the collection phase. Each instance will have its own setup.cfg and projects.json to target a single repo or a list of them. The data will be stored in separate indexes (with the same alias).

Then, you need another mordred/micro-mordred instance to perform the enrichment phase. In the setup.cfg of this instance, you should use the alias defined above to access the raw indexes. The execution of this instance should start after all collection instances are done.

Sortinghat DB will be the same across all instances.
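To illustrate the shared alias, here is a sketch that builds the request body for the ElasticSearch _aliases endpoint. The alias and index names are hypothetical; in practice they would match the raw_index values used by each collection instance.

```python
import json


def alias_actions(alias, indexes):
    """Build an _aliases body that points one alias at several indexes."""
    return {
        "actions": [
            {"add": {"index": index, "alias": alias}} for index in indexes
        ]
    }


# Hypothetical names: one raw index per collection instance, one shared alias.
body = alias_actions("git_raw_all", ["git_raw_openssl", "git_raw_other"])
print(json.dumps(body, indent=2))
# The body would be sent with: POST http://localhost:9200/_aliases
```

The enrichment instance could then read from git_raw_all instead of an individual raw index.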

Can you tell me how many repositories you are querying?

And for sortinghat, is the result already merged? When i’m querying the database, it was clear that many people with the same name have different email addresses and uuids. And i saw the identities merge parameter in micro-mordred.

The merge is done during the identities phase (which should be enabled in the enrichment instance). The merging algorithm can be set with the param matching in sortinghat. Details about the different matching algorithms are here: https://gitlab.com/Bitergia/c/FAQ/-/tree/master/how-to-identities#how-many-types-of-matching-do-exist
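The following conceptual sketch shows why email-based matching can leave several identities with the same name unmerged. It is not SortingHat's implementation: the grouping rule (exact match on a lowercased email) and the sample records are assumptions made purely for illustration.

```python
from collections import defaultdict


def group_by_email(identities):
    """Group identity uuids that share the same (lowercased) email address.

    Identities without an email are kept in their own single-item group.
    """
    groups = defaultdict(list)
    for identity in identities:
        email = (identity.get("email") or "").strip().lower()
        key = email if email else ("no-email", identity["uuid"])
        groups[key].append(identity["uuid"])
    return list(groups.values())


# Made-up sample: three identities with the same name, two distinct emails.
sample = [
    {"uuid": "a1", "name": "Jane Doe", "email": "jane@example.org"},
    {"uuid": "b2", "name": "Jane Doe", "email": "Jane@Example.org"},
    {"uuid": "c3", "name": "Jane Doe", "email": "jdoe@example.com"},
]
merged = group_by_email(sample)
print(merged)
```

Under this rule, the first two records are merged while the third stays separate despite the identical name, which is consistent with seeing repeated names when matching = [email] is configured.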

Julianbaozi commented 4 years ago

I'm doing experiments on just 2 projects. I ran git_onion, but it seems to give me git and github people together. When I query only github people, there are repetitions, such as 5 different people with the same name in one project. Maybe they are different people in its 10-year history.

On Sat, May 2, 2020 at 6:53 AM valerio notifications@github.com wrote:

You're welcome @Julianbaozi https://urldefense.com/v3/__https://github.com/Julianbaozi__;!!Mih3wA!RJOHZfYi-Yh9rWZFP5QF3kH-QHmVNdjSzFm5X_jaFDuq9IHXQcdldK-WJs_GeA$ !

The version of mordred is 0.2.25.

Consider to use the version 0.2.39 or greater. The docker-compose at https://github.com/chaoss/grimoirelab/blob/master/docker-compose/docker-compose.yml https://urldefense.com/v3/__https://github.com/chaoss/grimoirelab/blob/master/docker-compose/docker-compose.yml__;!!Mih3wA!RJOHZfYi-Yh9rWZFP5QF3kH-QHmVNdjSzFm5X_jaFDuq9IHXQcdldK-usoAg-Q$ should be easy to use.

So are you suggesting that I do not use Arthur? Just pass more tokens in the setup file?

Yes! The support for Arthur has been removed (#409 https://urldefense.com/v3/__https://github.com/chaoss/grimoirelab-sirmordred/pull/409__;!!Mih3wA!RJOHZfYi-Yh9rWZFP5QF3kH-QHmVNdjSzFm5X_jaFDuq9IHXQcdldK8czXd3jQ$). An ongoing work aims at integrating Arthur in a different way.

What is the difference here between using mordred and micro-mordred if i’m using for loop?

Micro-mordred allows to execute Mordred tasks individually. If you put micro-mordred tasks in a loop, you get basically something similar to Mordred.

The problem is that it cannot give good results on studies like aoc. For the first project it is fine but the second repository will get zero results. To solve this I have to delete all the index every time i finish one project.

You can set up N mordred/micro-mordred instances just to perform the collection phase. Each instance will have its own setup.cfg and projects.json to target a single repo or a list of them. The data will be stored in separated indexes (with the same alias).

Then, you need another mordred/micro-mordred instance to perform the enrichment phase. In the setup.cfg of this instance, you should use the alias defined above to access the raw indexes. The execution of this instance should start after all collection instances are done.

Sortinghat DB will be the same across all instances.

Can you tell how many repositories are you querying?

And for sortinghat, is the result already merged? When i’m querying the database, it was clear that many people with the same name have different email addresses and uuids. And i saw the identities merge parameter in micro-mordred.

The merge is done during the identities phase (which should be enabled in the enrichment instance). The merging algorithm can be set with the param matching in sortinghat. Details about the different matching algorithms are here: https://gitlab.com/Bitergia/c/FAQ/-/tree/master/how-to-identities#how-many-types-of-matching-do-exist https://urldefense.com/v3/__https://gitlab.com/Bitergia/c/FAQ/-/tree/master/how-to-identities*how-many-types-of-matching-do-exist__;Iw!!Mih3wA!RJOHZfYi-Yh9rWZFP5QF3kH-QHmVNdjSzFm5X_jaFDuq9IHXQcdldK-4t8_cuw$

Julianbaozi commented 4 years ago

Is 0.2.39 only available in Docker?

On Sat, May 2, 2020 at 7:57 AM Junliang Yu yujl@ucsd.edu wrote:

I'm doing experiments on just 2 projects. I did git_onion, but it seems it gives me git and GitHub people together. When I query only GitHub people, there are repetitions, such as 5 different people with the same name in one project. Maybe they are different people in its 10-year history.

On Sat, May 2, 2020 at 6:53 AM valerio notifications@github.com wrote:

You're welcome @Julianbaozi!

The version of mordred is 0.2.25.

Consider using version 0.2.39 or greater. The docker-compose at https://github.com/chaoss/grimoirelab/blob/master/docker-compose/docker-compose.yml should be easy to use.

So are you suggesting that I do not use Arthur? Just pass more tokens in the setup file?

valeriocos commented 4 years ago

That's strange because with your configuration onion is calculated only for git data. Can you share a pointer to/example of your data, so I can have a deeper look at the problem? Thanks

is 0.2.39 only available in docker?

You can use the source code or the docker-compose shared above. 0.2.39 isn't available on PyPI (https://pypi.org/project/sirmordred/). Does that answer your question?

junliangyu96 commented 4 years ago

Screenshot from 2020-05-03 06-13-21

Hi, I tried the source code method. I installed Elasticsearch and MariaDB on my computer.

valeriocos commented 4 years ago

Hi @junliangyu96 , please check if the content at https://github.com/chaoss/grimoirelab-sirmordred/blob/master/Getting-Started.md#ssl-error- solves your problem

Julianbaozi commented 4 years ago

Thank you! It solves the problem.

When passing more than one token, do they all work at the same time, or will only one of them be working as long as it's not rate-limited? I'm thinking about making it distributed.

Thanks!

junliangyu96 commented 4 years ago

Screenshot from 2020-05-03 12-35-13

I can't run micro: the import fails.

Also, the version is 0.2.31 if I get all the packages from git at once.

junliangyu96 commented 4 years ago

In fact, I can run 'debug' but not 'run'.

valeriocos commented 4 years ago

https://github.com/chaoss/grimoirelab-sirmordred/issues/462#issuecomment-623151419

You're welcome @junliangyu96

When passing more than one token, do they all work at the same time, or will only one of them be working as long as it's not rate-limited? I'm thinking about making it distributed.

There will be just one token working. When the current token is close to hitting the maximum number of requests allowed, the next token in the list will be used.
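To make the behaviour described above concrete, here is a small Python illustration of that rotation rule. It is only a sketch of the logic as stated in this comment, not the actual Perceval or sirmordred implementation; the function name, the `min_remaining` threshold, and the wrap-around behaviour are all hypothetical.

```python
def pick_token(tokens, remaining, current=0, min_remaining=10):
    """Return the index of the token to use next.

    tokens: list of token strings.
    remaining: requests left for each token, in the same order.
    The current token stays in use until it is close to its limit;
    then the next usable token in list order takes over.
    """
    if remaining[current] > min_remaining:
        return current
    # Try the following tokens in list order, wrapping around.
    for step in range(1, len(tokens)):
        candidate = (current + step) % len(tokens)
        if remaining[candidate] > min_remaining:
            return candidate
    # Every token is near its limit; the caller has to wait for a reset.
    return current


if __name__ == "__main__":
    print(pick_token(["tok_a", "tok_b"], [5, 500]))
```

Because only one token is active at a time, extra tokens extend how long a single collector can run but do not parallelise the collection.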

valeriocos commented 4 years ago

https://github.com/chaoss/grimoirelab-sirmordred/issues/462#issuecomment-623168664

It seems you have a dependency problem: the urllib3 version should be urllib3==1.24.3 (ref https://github.com/chaoss/grimoirelab-elk/blob/master/requirements.txt#L4)
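One quick way to check the installed version against that pin is a few lines of Python using the standard library's `importlib.metadata` (Python 3.8+). The helper names are hypothetical, and the pinned value is simply the one quoted above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """True only when the installed version equals the pinned one exactly."""
    return installed is not None and installed == pinned


if __name__ == "__main__":
    found = installed_version("urllib3")
    print("urllib3:", found, "| matches 1.24.3:", matches_pin(found, "1.24.3"))
```

If the versions differ, reinstalling with the exact pin from the requirements file inside the same virtual environment is the usual fix.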