ai-se / Resume_Job_Matching


Open questions #12

azhe825 opened this issue 4 years ago (status: open)

azhe825 commented 4 years ago

Getting more data

| Website | Boasted jobs | Collectable jobs (paid) | Collectable jobs (free) | Boasted resumes | Collectable resumes (paid) | Collectable resumes (free) |
|---|---|---|---|---|---|---|
| LinkedIn | 9355 | 1000 | 1000 | 348196 | 1000 | 29 |
| Jobboard | - | - | - | - | - | - |
| MightyRecruiter | - | - | - | - | - | - |
| LiveCareer | 0 | 0 | 0 | 241 | 200 | 200 |
| PostJobFree | 10012 | 500 | 500 | 12222 | 500 | 500 |
| Jobvertise | 1000 | 1000 | 1000 | 1000 | 1000 | 3 |
| Craigslist | | | | | | |

[Screenshots: results of a "Machinist" query on LinkedIn, PostJobFree, Jobvertise, and LiveCareer]

Getting better ground truth

[Figure: ground-truth charts, panels "Job" and "Resume"]

Category counts in resumes and jobs

[Figure: category counts, panels "Job" and "Resume"]

Use better encoding

Job Free, 6 categories

Targeting jobs: {'tfidf': 0.35 (0.016), 'lda': 0.27 (0.024), 'doc2vec': -0.00}
Targeting resumes: {'tfidf': 0.33 (0.009), 'lda': 0.25 (0.021), 'doc2vec': 0.02}

Need literature review
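For readers unfamiliar with the tfidf encoding, here is a minimal sketch of how resumes and jobs could be encoded with TF-IDF and compared by cosine similarity. It is an illustration under stated assumptions, not the pipeline that produced the numbers above (the metric behind those scores is not stated here). The tokenizer, the smoothed idf formula, and the helper names `tfidf_vectors` and `rank_jobs` are all hypothetical choices.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Deliberately simple tokenizer (assumption): lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed idf: log((1 + n) / (1 + df)) + 1, so rare terms weigh more.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_jobs(resume, jobs):
    """Rank job postings by TF-IDF cosine similarity to one resume (hypothetical helper)."""
    vecs = tfidf_vectors([resume] + jobs)
    scores = [(cosine(vecs[0], v), i) for i, v in enumerate(vecs[1:])]
    return sorted(scores, reverse=True)

jobs = [
    "CNC machinist needed, lathe and mill experience",
    "Registered nurse for night shift",
    "Java developer, Spring experience",
]
resume = "Machinist with 5 years of CNC lathe experience"
ranked = rank_jobs(resume, jobs)
```

Here the machinist posting ranks first, the developer posting (sharing only "experience") second, and the nursing posting last with a score of 0. In practice scikit-learn's `TfidfVectorizer` plus `cosine_similarity` would replace the hand-rolled helpers; the pure-Python version only makes the weighting explicit.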

How to extract skills

Need literature review
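Until that review is done, the simplest baseline is dictionary (gazetteer) matching: look up known skill phrases in the text. The sketch below assumes a hypothetical hand-made vocabulary `SKILLS`; a real system would need a curated taxonomy and, as the ground-truth charts suggest, some way to handle synonyms.

```python
import re

# Hypothetical skill vocabulary (assumption); a real one would come from a curated taxonomy.
SKILLS = ["cnc", "lathe", "milling", "blueprint reading", "java", "sql"]

def extract_skills(text, vocabulary=SKILLS):
    """Return the vocabulary entries that occur in text as whole words or phrases."""
    lowered = text.lower()
    found = []
    for skill in vocabulary:
        # Word boundaries stop "java" from matching inside "javascript".
        pattern = r"\b" + re.escape(skill) + r"\b"
        if re.search(pattern, lowered):
            found.append(skill)
    return found

skills = extract_skills("Operated CNC lathe; strong blueprint reading. Some JavaScript.")
```

Exact matching like this misses paraphrases ("mill operator" vs. "milling"), which is why it is only a baseline.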

De-identifier
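A de-identifier would strip personal information from resumes before they are stored or shown to labellers. A minimal regex-based sketch follows; the patterns are simplified assumptions (US-style phone numbers only), and names and addresses would need a named-entity recognizer rather than regexes.

```python
import re

# Simplified patterns (assumption): they will miss many real-world formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")

def deidentify(text):
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

masked = deidentify("Contact jane.doe@example.com or 555-123-4567.")
```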

timm commented 4 years ago

Lesson 1: We will need a significant budget in phase 2 to enable paid access to these sites.


Lesson 2: Data collection is a problem, and that affects whom we can write this work for. For example, if we were a tool supporting some in-house HR department, then we could access the data they can access.



Lesson 3: The “getting ground truth” charts tell us there is a synonym discovery issue. We can solve this with better maths and algorithms, but all of those approaches will need humans to assess the results. So we will need a phase 2 budget for Mechanical Turk to handle the data labelling problem.



Lesson 4: Matching algorithms will also need exploring. So we will need a phase 2 budget for Mechanical Turk to handle the data labelling for such analysis.

Lesson 5: We need better tools for extracting entities such as tasks, skills, and years.
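As a small illustration of the "years" case, a regex can pull year counts out of phrases such as "5 years", "3-5 yrs", or "10+ years". This sketch assumes the years refer to experience and that the phrasing follows these few patterns; the helper `extract_years` is hypothetical and is no substitute for the better tools the lesson calls for.

```python
import re

# Groups: 1 = lower bound, 2 = "+" for open-ended ranges, 3 = upper bound of "a-b" ranges.
YEARS_RE = re.compile(r"(\d+)\s*(?:(\+)|-\s*(\d+))?\s*(?:years?|yrs?)\b", re.IGNORECASE)

def extract_years(text):
    """Return (low, high) year ranges; high is None for open-ended phrases like '10+ years'."""
    results = []
    for m in YEARS_RE.finditer(text):
        low = int(m.group(1))
        if m.group(3):
            high = int(m.group(3))   # explicit range, e.g. "3-5 yrs"
        elif m.group(2):
            high = None              # open-ended, e.g. "10+ years"
        else:
            high = low               # single value, e.g. "5 years"
        results.append((low, high))
    return results

ranges = extract_years("Machinist, 5 years CNC; requires 3-5 yrs or 10+ years")
```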