SungYinYang closed this 4 years ago
Some updates on data cleaning:
Added stopword removal to `# Second Method: Tfidfvectorizer`.
So there are now two output CSVs: 'tfidf1_jd.csv' with TF-IDF only, and 'tfidf2_jd.csv' with TF-IDF + stopword removal. @SungYinYang
Some changes based on your PR:
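```python
# Build one text document per job posting by joining its text fields.
# Assumes `jobs` is a pandas DataFrame with a default 0..n-1 index,
# since rows are looked up by label with jobs.loc[i, ...].
# Note: missing values (NaN) end up as the literal string 'nan'.
vec = []
for i in range(len(jobs)):
    text = "{} {} {} {} {}".format(jobs.loc[i, 'Title'],
                                   jobs.loc[i, 'Description'],
                                   jobs.loc[i, 'Requirements'],
                                   jobs.loc[i, 'State'],
                                   jobs.loc[i, 'City'])
    # Earlier string-concatenation version, kept for comparison:
    # vec.append(str(jobs.loc[i, 'Title']) + " " + str(jobs.loc[i, 'Description'])
    #            + " " + str(jobs.loc[i, 'Requirements']) + " " + str(jobs.loc[i, 'State'])
    #            + " " + str(jobs.loc[i, 'City']))
    vec.append(text)
```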
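A vectorized alternative to the row loop, as a sketch: `build_documents` and `TEXT_COLUMNS` are hypothetical names, and the sample DataFrame is made up. Unlike `jobs.loc[i, ...]` it does not assume a default 0..n-1 index, and it blanks missing fields so they do not become a `nan` token in the TF-IDF vocabulary.

```python
import numpy as np
import pandas as pd

# Hypothetical name for the columns joined in the loop above.
TEXT_COLUMNS = ['Title', 'Description', 'Requirements', 'State', 'City']


def build_documents(jobs, columns=TEXT_COLUMNS):
    """Hypothetical helper: join each row's text fields into one string."""
    # fillna('') stops missing fields from becoming the string 'nan';
    # astype(str) guards against non-string columns.
    cleaned = jobs[columns].fillna('').astype(str)
    # Join only non-empty fields so a missing value leaves no stray space.
    return cleaned.apply(lambda row: ' '.join(v for v in row if v), axis=1).tolist()


# Hypothetical two-row example; the second row has a missing City.
jobs = pd.DataFrame({
    'Title': ['Data Analyst', 'Engineer'],
    'Description': ['SQL reporting', 'Backend services'],
    'Requirements': ['Python', 'Go'],
    'State': ['CA', 'NY'],
    'City': ['San Jose', np.nan],
})
vec = build_documents(jobs)
```

The resulting `vec` can be passed straight to `TfidfVectorizer.fit_transform`, same as the list built by the loop.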