KunSu / Personalized-Job-Recommendation-System

Add TFIDF function and update Data Cleaning #9

Closed SungYinYang closed 4 years ago

SungYinYang commented 4 years ago
  1. Add TFIDF.ipynb, which calculates the most important words based on TF-IDF.
  2. Update Data Cleaning so it can clean "//n".
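To make item 1 more concrete, here is a minimal from-scratch Python sketch. It is not the code in TFIDF.ipynb, which isn't shown here. It assumes the "//n" sequence appears literally in the raw text and replaces it with a space. It then tokenizes with a simple regex and ranks each document's words by term frequency times a smoothed IDF, ln((1 + n) / (1 + df)) + 1, which is the same IDF form scikit-learn's TfidfVectorizer uses by default. The names `clean_text` and `top_tfidf_words` and the sample postings are hypothetical.

```python
import math
import re
from collections import Counter


def clean_text(text):
    # Assumption: "//n" appears literally in the scraped text; replace it with a space.
    text = text.replace("//n", " ")
    # Lowercase and keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def top_tfidf_words(docs, k=3):
    """Return the k highest-scoring TF-IDF words for each document."""
    tokenized = [clean_text(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        scores = {
            # Relative term frequency times smoothed inverse document frequency.
            word: (count / total) * (math.log((1 + n_docs) / (1 + df[word])) + 1)
            for word, count in tf.items()
        }
        # Sort by descending score and break ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:k])
    return results


docs = [
    "Python developer//nPython and SQL",
    "Java developer//nJava and Spring",
    "Data analyst//nSQL and Excel",
]
print(top_tfidf_words(docs))
# → [['python', 'developer', 'sql'], ['java', 'spring', 'developer'], ['analyst', 'data', 'excel']]
```

In these sample postings, "and" appears in every document, so it gets the smallest IDF and never makes the top words. The stopword step is meant to remove words like this before weighting.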
KunSu commented 4 years ago

Some updates on Data Cleaning:

Added stopword removal to the "# Second Method: Tfidfvectorizer" section.

So there are now two output CSVs:
  1. 'tfidf1_jd.csv': TF-IDF only
  2. 'tfidf2_jd.csv': TF-IDF + stopword removal

@SungYinYang
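The two output files differ only in whether stopwords are dropped before weighting. The plain-Python sketch below shows that difference. `STOPWORDS` is a tiny hypothetical list, and `tfidf` is an illustrative helper, not the notebook's code. In scikit-learn, the second variant would typically be `TfidfVectorizer(stop_words="english")`, which uses the library's built-in English stopword list.

```python
import math
from collections import Counter

# Hypothetical, tiny stopword list for illustration only; scikit-learn's
# TfidfVectorizer(stop_words="english") ships its own, much larger list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "with"}


def tfidf(docs, remove_stopwords=False):
    """Return one {word: score} dict per document (raw counts x smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    if remove_stopwords:
        # Drop stopwords before computing any frequencies.
        tokenized = [[w for w in tokens if w not in STOPWORDS] for tokens in tokenized]
    n_docs = len(tokenized)
    df = Counter(w for tokens in tokenized for w in set(tokens))
    return [
        {w: c * (math.log((1 + n_docs) / (1 + df[w])) + 1) for w, c in Counter(tokens).items()}
        for tokens in tokenized
    ]


docs = ["python and sql", "java and spring"]
variant1 = tfidf(docs)                         # same idea as tfidf1_jd.csv: TF-IDF only
variant2 = tfidf(docs, remove_stopwords=True)  # same idea as tfidf2_jd.csv: + stopword removal
print(sorted(variant1[0]))  # → ['and', 'python', 'sql']
print(sorted(variant2[0]))  # → ['python', 'sql']
```

With stopwords kept, "and" gets a weight in every document even though it says nothing about the job. With them removed, only the content words remain.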

Some changes based on your PR:

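# Build one text document per job posting by joining its text fields.
# Note: jobs.loc[i, ...] looks up by index label, so this assumes `jobs` has a
# default 0..n-1 index (call jobs.reset_index(drop=True) after any filtering).
# Missing values are formatted as "nan"/"None" and would end up as tokens.
vec = []
for i in range(len(jobs)):
    text = "{} {} {} {} {}".format(jobs.loc[i, 'Title'],
                                   jobs.loc[i, 'Description'],
                                   jobs.loc[i, 'Requirements'],
                                   jobs.loc[i, 'State'],
                                   jobs.loc[i, 'City'])
    # Earlier concatenation (commented out; produces the same string):
#     vec.append(str(jobs.loc[i , 'Title']) + " " + str(jobs.loc[i , 'Description'])
#                +" "+str(jobs.loc[i , 'Requirements'])+" "+str(jobs.loc[i , 'State'])
#                +" "+str(jobs.loc[i , 'City']))
    vec.append(text)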
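The same concatenation can also be written without label-based indexing. This is a sketch that assumes pandas and the five column names used above. `combine_fields` and the toy `jobs` frame are hypothetical. Filling missing values first keeps "nan" out of the vocabulary. Joining row by row avoids `.loc[i, ...]` lookups, so the approach also works on a filtered DataFrame whose index isn't 0..n-1.

```python
import pandas as pd

COLS = ["Title", "Description", "Requirements", "State", "City"]


def combine_fields(df, cols=COLS):
    # Replace missing values with "", cast to str, then join each row's fields with spaces.
    return df[cols].fillna("").astype(str).apply(" ".join, axis=1).tolist()


# Toy stand-in for the real `jobs` DataFrame; only the column names come from the PR.
jobs = pd.DataFrame({
    "Title": ["Data Analyst", "Backend Engineer"],
    "Description": ["Analyze data", None],
    "Requirements": ["SQL", "Go"],
    "State": ["CA", "NY"],
    "City": ["San Jose", "New York"],
})

vec = combine_fields(jobs)
print(vec)
# → ['Data Analyst Analyze data SQL CA San Jose', 'Backend Engineer  Go NY New York']

# Works on a filtered frame (index [1]) where jobs.loc[0, ...] would raise KeyError.
print(combine_fields(jobs[jobs["State"] == "NY"]))
```

An empty field leaves a double space (for example "Backend Engineer  Go ..."). Regex-based tokenizers such as TfidfVectorizer's default token pattern simply skip it.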