Open Hamedloghmani opened 7 months ago
Hi @Hamedloghmani,
@edwinpaul121 and I started working on the gender mappings for uspt, and we were able to generate the gender.csv file. Here is the code:
import pickle as pkl

import pandas as pd

mappings = {}
with open("data/preprocessed/uspt/patent.tsv.filtered.mt75.ts3/teams.pkl", "rb") as f:
    with open("data/preprocessed/uspt/patent.tsv.filtered.mt75.ts3/indexes.pkl", "rb") as f_2:
        teams_pkl = pkl.load(f)
        indexes_pkl = pkl.load(f_2)
# print(teams_pkl[0])
c2i = indexes_pkl['c2i']
# Keep the first gender seen for each member index.
for patent in teams_pkl:
    for member in patent.members:
        ind = c2i[member.id + "_" + member.name]
        if ind not in mappings:
            mappings[ind] = member.gender
# One row per member index, with a single "gender" column.
df = pd.DataFrame.from_dict(mappings, orient="index", columns=["gender"])
df.to_csv("data/preprocessed/uspt/patent.tsv.filtered.mt75.ts3/gender.csv")
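For anyone who wants to try the mapping logic without the pickled files, here is a minimal self-contained sketch of the same first-occurrence-wins loop. The `Member` and `Team` classes, the `build_gender_mapping` helper, and the sample data are all hypothetical stand-ins, not the project's real objects; only the `c2i` key format (`id + "_" + name`) is taken from the snippet above.

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass
class Member:
    # Hypothetical stand-in for the member objects stored in teams.pkl.
    id: str
    name: str
    gender: Optional[str]


@dataclass
class Team:
    # Hypothetical stand-in for one patent/team entry.
    members: list


def build_gender_mapping(teams, c2i):
    """Map each member index to the gender of its first occurrence."""
    mappings = {}
    for team in teams:
        for member in team.members:
            ind = c2i[member.id + "_" + member.name]
            # setdefault only writes when the index has not been seen yet.
            mappings.setdefault(ind, member.gender)
    return mappings


# Made-up sample data: "alice" appears in two teams, "chris" has no gender.
teams = [
    Team([Member("a1", "alice", "F"), Member("b2", "bob", "M")]),
    Team([Member("a1", "alice", "F"), Member("c3", "chris", None)]),
]
c2i = {"a1_alice": 0, "b2_bob": 1, "c3_chris": 2}

mappings = build_gender_mapping(teams, c2i)
df = pd.DataFrame.from_dict(mappings, orient="index", columns=["gender"])
csv_text = df.to_csv()  # no path given, so the CSV is returned as a string
```

With pandas defaults, a missing gender (`None`) is written as an empty cell in the CSV, and the unnamed first column holds the member index.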
However, we have a few concerns about our code:
Thank you so much, @gabrielrueda and @edwinpaul121. 1) We will discuss issue 1 on Friday. 2) Please leave them empty, I'll handle it in my own code. 3) Thanks a lot, please let me know.
Hi @Hamedloghmani, I just wanted to let you know that I checked some of the gender values against those in the inventor.tsv file in the USPT dataset and can confirm that the gender values were valid. Also, I'll upload the resulting gender.csv file in the Adila teams channel -> USPT Labelling Files.
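A spot check like the one described could be sketched roughly as follows. This is only an illustration under several assumptions: the `find_mismatches` helper is hypothetical, the reference table's column names (`id`, `gender`) are placeholders that may not match the real inventor.tsv, and it assumes inventor ids contain no underscore so the id can be recovered from the `c2i` key.

```python
import pandas as pd


def find_mismatches(gender_df, c2i, inventors, id_col="id", gender_col="gender"):
    """Return (index, ours, reference) for rows that disagree with the reference."""
    # Invert c2i so each row index maps back to its "<id>_<name>" key.
    i2c = {ind: key for key, ind in c2i.items()}
    reference = dict(zip(inventors[id_col].astype(str), inventors[gender_col]))
    mismatches = []
    for ind, ours in gender_df["gender"].items():
        # Assumption: ids contain no underscore, so the first "_" ends the id.
        inventor_id = i2c[ind].split("_", 1)[0]
        theirs = reference.get(inventor_id)
        # Treat "missing in both places" as agreement.
        both_missing = pd.isna(ours) and pd.isna(theirs)
        if not both_missing and ours != theirs:
            mismatches.append((ind, ours, theirs))
    return mismatches


# Made-up sample data; "bob" deliberately disagrees with the reference.
c2i = {"a1_alice": 0, "b2_bob": 1, "c3_chris": 2}
gender_df = pd.DataFrame({"gender": ["F", "M", None]}, index=[0, 1, 2])
inventors = pd.DataFrame({"id": ["a1", "b2", "c3"], "gender": ["F", "F", None]})

mismatches = find_mismatches(gender_df, c2i, inventors)
```

On real data, `gender_df` could be loaded with `pd.read_csv(path, index_col=0)` and the reference table with `pd.read_csv(path, sep="\t")`, adjusting `id_col` and `gender_col` to whatever the file actually uses.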
Hi @gabrielrueda. Thanks a lot for the update and confirmation.
Hi @edwinpaul121 and @gabrielrueda, please log the process for extracting gender labels for the uspt dataset on this issue page, and let me know if you have any questions. Thank you.