Open azhe825 opened 4 years ago
Lesson1: we will need a significant budget for phase2 to enable paid access to these sites
Lesson2: data collection is a problem, and that affects who we can write this work for. e.g. if we were a tool supporting some in-house HR department, then we could access the data they could access
Lesson3: the “getting ground truth” charts tell us there is a synonym discovery issue. We can solve this with better maths and algorithms, but all of those approaches will need humans to assess the results. So we will need a phase2 budget for Mechanical Turk to handle the data labelling problem
Lesson4: matching algorithms will also need exploring. So we will need a phase2 budget for Mechanical Turk to handle the data labelling for such analysis
Lesson5: we need better tools for entity extraction for tasks, skills, year
Getting more data
Getting better ground truth
Job: Resume:
Category counts in resumes and jobs
Job: Resume:
Use better encoding
Job Free, 6 categories
targeting jobs: {'tfidf': 0.35 (0.016), 'lda': 0.27 (0.024), 'doc2vec': -0.00}
targeting resumes: {'tfidf': 0.33 (0.009), 'lda': 0.25 (0.021), 'doc2vec': 0.02}
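For reference, a minimal sketch of what the tfidf encoding plus matching step could look like. This is an illustration only, not the code that produced the numbers above: the tokenizer, the smoothed IDF formula, cosine similarity as the matching score, and the function names (`tfidf_vectors`, `best_job_for_each_resume`) are all assumptions, and it uses only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, with
    length-normalized term frequency.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_job_for_each_resume(jobs, resumes):
    """For each resume, return the index of the most similar job."""
    # Fit the vocabulary and IDF on jobs and resumes together.
    vectors = tfidf_vectors(jobs + resumes)
    job_vecs, resume_vecs = vectors[:len(jobs)], vectors[len(jobs):]
    matches = []
    for resume_vec in resume_vecs:
        scores = [cosine(resume_vec, job_vec) for job_vec in job_vecs]
        matches.append(max(range(len(job_vecs)), key=scores.__getitem__))
    return matches


# Toy data, made up for illustration.
jobs = [
    "Python developer building web services",
    "Registered nurse for hospital ward",
]
resumes = [
    "Nurse with five years of hospital experience",
    "Software engineer, Python and web backends",
]
matches = best_job_for_each_resume(jobs, resumes)
```

Swapping in lda or doc2vec would only change how the vectors are built; the matching loop could stay the same, which keeps the encodings comparable.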
Need literature review
How to extract skills
Need literature review
De-identifier