fani-lab / Adila

Fairness-Aware Team Formation

uspt gender labels and experiments #87

Open Hamedloghmani opened 7 months ago

Hamedloghmani commented 7 months ago

Hi @edwinpaul121 and @gabrielrueda, please log the process for extracting gender labels for the uspt dataset on this issue page, and let me know if you have any questions. Thank you.

gabrielrueda commented 7 months ago

Hi @Hamedloghmani,

@edwinpaul121 and I started working on the gender mappings for uspt, and we were able to generate the gender.csv file. Here is the code:

import pickle as pkl

import pandas as pd

# Run from the OpeNTF repo: loading teams.pkl needs access to the
# patent.py and inventor.py files in the cmn folder.
base = "data/preprocessed/uspt/patent.tsv.filtered.mt75.ts3"

with open(f"{base}/teams.pkl", "rb") as f, open(f"{base}/indexes.pkl", "rb") as f_2:
    teams_pkl = pkl.load(f)
    indexes_pkl = pkl.load(f_2)

# c2i is keyed by "<id>_<name>" and returns the member's index
c2i = indexes_pkl['c2i']

# Record each member's gender once, keyed by index
mappings = {}
for patent in teams_pkl:
    for member in patent.members:
        ind = c2i[member.id + "_" + member.name]
        if ind not in mappings:
            mappings[ind] = member.gender

df = pd.DataFrame.from_dict(mappings, orient="index", columns=["gender"])
df.to_csv(f"{base}/gender.csv")

However, we have a few concerns about our code:

  1. We have to run our code in the OpeNTF repo since it needs access to the patent.py and inventor.py files in the cmn folder.
  2. Some of the gender results were null. Should I assume these to be True, or just leave them as False?
  3. Also, a large number of the results are True (male). I will check the results to confirm whether this is expected.
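
For concerns 2 and 3, a quick count of the True, False, and null values can show how skewed the labels are. This is only a minimal sketch: the function name `gender_distribution` and the toy `sample` dictionary are hypothetical, and it assumes the `mappings` values are True, False, or None.

```python
from collections import Counter


def gender_distribution(mappings):
    """Count how many members are labelled True, False, or None (missing)."""
    return Counter(mappings.values())


# Hypothetical toy data standing in for the real mappings dictionary
sample = {0: True, 1: True, 2: False, 3: None}
counts = gender_distribution(sample)
print(counts)
```

Passing the real `mappings` dictionary instead of `sample` would give the same breakdown for the uspt labels.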

Hamedloghmani commented 7 months ago

Thank you so much, @gabrielrueda and @edwinpaul121. 1) We will discuss this on Friday. 2) Please leave them empty; I'll handle it in my own code. 3) Thanks a lot, please let me know.
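
To illustrate what "leave them empty" can look like in the file, here is a minimal sketch using only the standard `csv` module, which writes None as an empty cell. The helper name `write_gender_csv` and the toy data are hypothetical, and the column layout (an unnamed index column followed by `gender`) is an assumption about the output format.

```python
import csv
import io


def write_gender_csv(mappings, handle):
    """Write index/gender rows; a None gender becomes an empty cell."""
    writer = csv.writer(handle)
    writer.writerow(["", "gender"])
    for ind in sorted(mappings):
        writer.writerow([ind, mappings[ind]])


# Hypothetical toy data: member 1 has no gender label
buffer = io.StringIO()
write_gender_csv({0: True, 1: None, 2: False}, buffer)
rows = buffer.getvalue().splitlines()
print(rows)
```

The missing label stays blank rather than being forced to True or False, so the downstream code can decide how to treat it.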

gabrielrueda commented 7 months ago

Hi @Hamedloghmani, I just wanted to let you know that I checked some of the gender values against those in the inventor.tsv file in the USPT dataset and can confirm that they are valid. I'll also upload the resulting gender.csv file to the Adila teams channel -> USPT Labelling Files.
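
A check like this could be sketched as a comparison between the extracted labels and a reference mapping. This is not the exact procedure used here: the function name `find_mismatches` and both toy dictionaries are hypothetical, and it assumes the reference labels have already been read from inventor.tsv into a dictionary with the same keys.

```python
def find_mismatches(extracted, reference):
    """Return the keys present in both mappings whose gender values differ."""
    shared = extracted.keys() & reference.keys()
    return sorted(key for key in shared if extracted[key] != reference[key])


# Hypothetical toy data: "b" disagrees, "c" and "d" appear on one side only
extracted = {"a": True, "b": False, "c": None}
reference = {"a": True, "b": True, "d": False}
mismatches = find_mismatches(extracted, reference)
print(mismatches)
```

An empty result on the sampled keys would be consistent with the labels being valid.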

Hamedloghmani commented 7 months ago

Hi @gabrielrueda. Thanks a lot for the update and confirmation.