apache / lucene-jira-archive

Jira archive for Apache Lucene
https://lucene.apache.org/

Add orphaned Jira usernames #103

Closed mocobeta closed 2 years ago

mocobeta commented 2 years ago

#96

mappings-data/ohphan_jira_ids.txt lists the "orphaned" Jira usernames: obsolete usernames (i.e. unknown Jira users) that appear as mentions ([~username]) in issue descriptions or comments. I also added several mappings to the "verified" account mapping file; I didn't find any "new" accounts, but the added mappings will work as aliases.

These orphaned usernames were detected by the script below. (I didn't commit this scratchy code.)

from operator import itemgetter
from pathlib import Path
import json
import re
import itertools
from collections import defaultdict

from common import JIRA_DUMP_DIRNAME, MAPPINGS_DATA_DIRNAME, JIRA_USERS_FILENAME, read_jira_users_map
from jira_util import REGEX_MENION_TILDE, extract_description, extract_comments

dump_dir = Path(__file__).resolve().parent.parent.joinpath(JIRA_DUMP_DIRNAME)
mappings_dir = Path(__file__).resolve().parent.parent.joinpath(MAPPINGS_DATA_DIRNAME)
jira_users_file = mappings_dir.joinpath(JIRA_USERS_FILENAME)
jira_users = read_jira_users_map(jira_users_file) if jira_users_file.exists() else {}

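def extract_tilde_mentions(text):
    # REGEX_MENION_TILDE is expected to contain several groups, so findall()
    # yields one tuple per match; flatten the tuples and drop the empty groups.
    mentions = re.findall(REGEX_MENION_TILDE, text)
    mentions = set(filter(lambda x: x != '', itertools.chain.from_iterable(mentions)))
    # Strip the leading "[~" and the trailing "]" to get the bare username.
    mentions = [x[2:-1] for x in mentions]
    return mentions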

# Count, for each unknown username, the number of issues that mention it.
orphan_ids = defaultdict(int)
for dump_file in dump_dir.glob("LUCENE-*.json"):
    mentions = set()  # deduplicates mentions within a single issue
    with open(dump_file) as fp:
        o = json.load(fp)
        description = extract_description(o)
        mentions.update(extract_tilde_mentions(description))
        comments = extract_comments(o)
        for (_, _, comment, _, _, _) in comments:
            mentions.update(extract_tilde_mentions(comment))
    for m in mentions:
        if m not in jira_users:
            orphan_ids[m] += 1

# Print tab-separated "username, count" pairs, most frequently mentioned first.
orphan_ids = sorted(orphan_ids.items(), key=itemgetter(1), reverse=True)
for username, count in orphan_ids:
    print(f'{username}\t{count}')
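
The script above depends on the repository's `common` and `jira_util` modules and on the Jira dump files, so it cannot be run in isolation. As a rough, self-contained sketch of the same idea (collect `[~username]` mentions per issue, then count the usernames missing from the known-user map), something like the following might help. The regex `TILDE_MENTION`, the helper names, and the sample data are all assumptions made for illustration; the real `REGEX_MENION_TILDE` is not shown in this thread and may well differ, and `Counter` is swapped in for the `defaultdict` plus `sorted` combination.

```python
import re
from collections import Counter

# Hypothetical stand-in for REGEX_MENION_TILDE (the real pattern lives in jira_util).
# The single capture group means findall() returns bare usernames directly.
TILDE_MENTION = re.compile(r"\[~([^\]\s]+)\]")


def find_tilde_mentions(text):
    """Return the set of usernames mentioned as [~username] in text."""
    return set(TILDE_MENTION.findall(text or ""))


def count_orphans(issues, known_users):
    """Count, per unknown username, how many issues mention it at least once.

    issues is an iterable of (description, comments) pairs, a simplified
    stand-in for the parsed Jira dump files.
    """
    counts = Counter()
    for description, comments in issues:
        mentions = find_tilde_mentions(description)
        for comment in comments:
            mentions |= find_tilde_mentions(comment)
        counts.update(m for m in mentions if m not in known_users)
    # most_common() sorts by count, highest first.
    return counts.most_common()


# Made-up sample data, for illustration only.
sample_issues = [
    ("Reported by [~alice].", ["Thanks [~bob]!", "[~bob], please review."]),
    ("Follow-up for [~bob].", ["cc [~carol]"]),
]
sample_known_users = {"alice": "Alice Example"}

for username, count in count_orphans(sample_issues, sample_known_users):
    print(f"{username}\t{count}")
```

As in the original script, a username is counted once per issue no matter how many times it is mentioned there, so the counts read as "number of issues", not "number of mentions".
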

mikemccand commented 2 years ago

> I didn't commit this scratchy code

Oh no! You should commit scratchy code! Progress, not perfection. It's an awesome start, and future people struggling with a Jira -> GitHub migration might want to handle such orphan'd cases too.

mocobeta commented 2 years ago

> I didn't commit this scratchy code
>
> Oh no! You should commit scratchy code! Progress, not perfection. It's an awesome start, and future people struggling with a Jira -> GitHub migration might want to handle such orphan'd cases too.

OK, I'll smooth out a bit of its scratchiness and commit it! That will make it easier for me to iterate on these orphan'd usernames.