chaoss / grimoirelab-elk

[Gitter] Misclassifying Pull Requests and Issues #1028

Open k----n opened 2 years ago

k----n commented 2 years ago

Here is a Gitter-enriched item for a message that links to a pull request:

{
        "_index" : "gitter_enriched_raw",
        "_type" : "items",
        "_id" : "a9a4a861b3011c2bafdef977c72b205419449b4d",
        "_score" : 5.3138795,
        "_source" : {
          "metadata__updated_on" : "2016-04-30T00:34:26.399000+00:00",
          "metadata__timestamp" : "2022-01-30T03:29:44.263053+00:00",
          "offset" : null,
          "origin" : "https://gitter.im/shuup/shuup",
          "tag" : "https://gitter.im/shuup/shuup",
          "uuid" : "a9a4a861b3011c2bafdef977c72b205419449b4d",
          "unread" : 0,
          "text_analyzed" : "Looks like those issues should be fixed with this bugfix: https://github.com/shoopio/shoop/pull/441",
          "readBy" : 15,
          "issues" : [
            {
              "repo" : "shoopio/shoop",
              "number" : "441"
            }
          ],
          "id" : "5723fd92e10a59c061074eed",
          "url_hostname" : [ ],
          "tz" : 0,
          "fromUser_id" : "83fe61561124b9a496e97c89c2f48d3ff8319eac",
          "fromUser_uuid" : "83fe61561124b9a496e97c89c2f48d3ff8319eac",
          "fromUser_name" : "Shawn Her Many Horses",
          "fromUser_user_name" : "",
          "fromUser_domain" : null,
          "fromUser_gender" : "Unknown",
          "fromUser_gender_acc" : 0,
          "fromUser_org_name" : "Unknown",
          "fromUser_bot" : false,
          "fromUser_multi_org_names" : [
            "Unknown"
          ],
          "author_id" : "83fe61561124b9a496e97c89c2f48d3ff8319eac",
          "author_uuid" : "83fe61561124b9a496e97c89c2f48d3ff8319eac",
          "author_name" : "Shawn Her Many Horses",
          "author_user_name" : "",
          "author_domain" : null,
          "author_gender" : "Unknown",
          "author_gender_acc" : 0,
          "author_org_name" : "Unknown",
          "author_bot" : false,
          "author_multi_org_names" : [
            "Unknown"
          ],
          "project" : "shuup/shuup",
          "project_1" : "shuup/shuup",
          "grimoire_creation_date" : "2016-04-30T00:34:26.399000+00:00",
          "is_gitter_message" : 1,
          "repository_labels" : [ ],
          "metadata__filter_raw" : null,
          "metadata__gelk_version" : "0.99.0",
          "metadata__gelk_backend_name" : "GitterEnrich",
          "metadata__enriched_on" : "2022-01-30T04:54:27.841150+00:00"
        }
      }

There should be an is_pull key according to: https://github.com/chaoss/grimoirelab-elk/blob/b626512cd2768287bd1e52e98c7773073be21fc1/grimoire_elk/enriched/gitter.py#L166

Maybe the regex isn't working? https://github.com/chaoss/grimoirelab-elk/blob/b626512cd2768287bd1e52e98c7773073be21fc1/grimoire_elk/enriched/gitter.py#L64


Here's a case with an issue:

{
        "_index" : "gitter_enriched_raw",
        "_type" : "items",
        "_id" : "963132bc57c2bf58a906aca2dc1f91fdeb65f76a",
        "_score" : 5.2737937,
        "_source" : {
          "metadata__updated_on" : "2017-01-27T08:45:38.335000+00:00",
          "metadata__timestamp" : "2022-01-30T03:29:41.738953+00:00",
          "offset" : null,
          "origin" : "https://gitter.im/shuup/shuup",
          "tag" : "https://gitter.im/shuup/shuup",
          "uuid" : "963132bc57c2bf58a906aca2dc1f91fdeb65f76a",
          "unread" : 0,
          "text_analyzed" : "https://github.com/shuup/shuup/issues/361 -> i tried this",
          "readBy" : 17,
          "issues" : [
            {
              "repo" : "shuup/shuup",
              "number" : "361"
            }
          ],
          "id" : "588b08b25309d6b3587415c3",
          "url_hostname" : [ ],
          "tz" : 8,
          "fromUser_id" : "dfd2be02ddb641b7634d4bf5b9aabf6527b6ecf4",
          "fromUser_uuid" : "dfd2be02ddb641b7634d4bf5b9aabf6527b6ecf4",
          "fromUser_name" : "aoy12",
          "fromUser_user_name" : "",
          "fromUser_domain" : null,
          "fromUser_gender" : "Unknown",
          "fromUser_gender_acc" : 0,
          "fromUser_org_name" : "Unknown",
          "fromUser_bot" : false,
          "fromUser_multi_org_names" : [
            "Unknown"
          ],
          "author_id" : "dfd2be02ddb641b7634d4bf5b9aabf6527b6ecf4",
          "author_uuid" : "dfd2be02ddb641b7634d4bf5b9aabf6527b6ecf4",
          "author_name" : "aoy12",
          "author_user_name" : "",
          "author_domain" : null,
          "author_gender" : "Unknown",
          "author_gender_acc" : 0,
          "author_org_name" : "Unknown",
          "author_bot" : false,
          "author_multi_org_names" : [
            "Unknown"
          ],
          "project" : "shuup/shuup",
          "project_1" : "shuup/shuup",
          "grimoire_creation_date" : "2017-01-27T08:45:38.335000+00:00",
          "is_gitter_message" : 1,
          "repository_labels" : [ ],
          "metadata__filter_raw" : null,
          "metadata__gelk_version" : "0.99.0",
          "metadata__gelk_backend_name" : "GitterEnrich",
          "metadata__enriched_on" : "2022-01-30T04:54:15.833892+00:00"
        }
      }

There should be an is_issue key according to: https://github.com/chaoss/grimoirelab-elk/blob/b626512cd2768287bd1e52e98c7773073be21fc1/grimoire_elk/enriched/gitter.py#L163

Is this the same regex problem again?
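
To make the expected behaviour concrete, here is a minimal sketch of the classification I would expect for the two messages above. The pattern and the classify_links helper are my own illustration, not the code in gitter.py, so treat them as hypothetical:

```python
import re

# Hypothetical pattern (NOT the one used in gitter.py): matches GitHub
# issue and pull request URLs and captures repo, kind and number.
GITHUB_REF = re.compile(
    r"https?://github\.com/"
    r"(?P<repo>[\w.-]+/[\w.-]+)/"
    r"(?P<kind>issues|pull)/"
    r"(?P<number>\d+)"
)


def classify_links(text):
    """Return a list of (flag, repo, number) tuples for each GitHub link in text."""
    results = []
    for match in GITHUB_REF.finditer(text):
        # "pull" in the URL path means a pull request, "issues" means an issue.
        flag = "is_pull" if match.group("kind") == "pull" else "is_issue"
        results.append((flag, match.group("repo"), match.group("number")))
    return results


if __name__ == "__main__":
    print(classify_links(
        "Looks like those issues should be fixed with this bugfix: "
        "https://github.com/shoopio/shoop/pull/441"
    ))
    print(classify_links("https://github.com/shuup/shuup/issues/361 -> i tried this"))
```

With the text_analyzed values from the two items, this yields an is_pull entry for shoopio/shoop 441 and an is_issue entry for shuup/shuup 361, which is what I would expect to see reflected in the enriched items.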

k----n commented 2 years ago

Sometimes the pull request or issue is referenced inside a span tag in the message html, e.g.:

"data" : {
            "id" : "5723fd92e10a59c061074eed",
            "text" : "Looks like those issues should be fixed with this bugfix: https://github.com/shoopio/shoop/pull/441",
            "html" : """Looks like those issues should be fixed with this bugfix: <span data-link-type="issue" data-issue="441" data-issue-repo="shoopio/shoop" class="issue">shoopio/shoop#441</span>""",
            "sent" : "2016-04-30T00:34:26.399Z",
            "unread" : false,
            "readBy" : 15,
            "urls" : [ ],
            "mentions" : [ ],
            "issues" : [
              {
                "repo" : "shoopio/shoop",
                "number" : "441"
              }
            ],

Even when it's a pull request, it's linked as an "issue" in the span tag.
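
For reference, the span can be read without a regex. This is a rough sketch using Python's standard html.parser; IssueSpanParser and extract_issue_spans are hypothetical names of mine, not part of grimoirelab-elk:

```python
from html.parser import HTMLParser


class IssueSpanParser(HTMLParser):
    """Collect repo/number pairs from Gitter spans marked data-link-type="issue"."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attributes = dict(attrs)
        # Gitter uses the same "issue" link type for issues and pull requests.
        if attributes.get("data-link-type") == "issue":
            self.refs.append({
                "repo": attributes.get("data-issue-repo"),
                "number": attributes.get("data-issue"),
            })


def extract_issue_spans(html):
    """Return the list of references found in a Gitter message's html field."""
    parser = IssueSpanParser()
    parser.feed(html)
    parser.close()
    return parser.refs


if __name__ == "__main__":
    html = (
        'Looks like those issues should be fixed with this bugfix: '
        '<span data-link-type="issue" data-issue="441" '
        'data-issue-repo="shoopio/shoop" class="issue">shoopio/shoop#441</span>'
    )
    print(extract_issue_spans(html))
```

For the message above this gives repo shoopio/shoop and number 441, the same as the issues array, and nothing in it says whether 441 is a pull request.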

So GitHub will need to be queried to classify each reference as either a pull request or an issue.
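
One possible way to do that lookup, as a sketch only: the GitHub REST API serves both issues and pull requests from the issues endpoint, and the returned object carries a pull_request key when the number belongs to a pull request. The function names, the optional token parameter, and the idea of calling this from the enricher are my assumptions, not existing grimoirelab-elk code:

```python
import json
import urllib.request

# GitHub REST API endpoint that resolves both issues and pull requests.
API_URL = "https://api.github.com/repos/{repo}/issues/{number}"


def classify_payload(payload):
    """Classify an already fetched issue object from the GitHub API."""
    # Pull requests are returned as issues with an extra "pull_request" key.
    return "pull_request" if "pull_request" in payload else "issue"


def classify_reference(repo, number, token=None):
    """Fetch repo#number from GitHub and classify it (hypothetical helper)."""
    request = urllib.request.Request(
        API_URL.format(repo=repo, number=number),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        # Assumption: a personal access token is available to raise rate limits.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return classify_payload(json.load(response))


if __name__ == "__main__":
    # Offline demonstration on trimmed, made-up payloads.
    print(classify_payload({"number": 441, "pull_request": {"url": "..."}}))
    print(classify_payload({"number": 361}))
```

Unauthenticated requests are rate limited, so a real implementation would probably want to cache results per repo and number rather than query once per message.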