TheOpenResumeProject / webapp

A tool to get your resume converted to conform to the OpenResume standard.
MIT License

Intermediate Text Analysis: Annotation #8

Open salekinsirajus opened 2 months ago

salekinsirajus commented 2 months ago

Problem Statement: The goal of this task is to design and implement an algorithm that analyzes the entire content of a resume. Note that the deliverable is not a perfectly annotated resume, as that is an almost impossible task. Rather, we want to focus on annotating 80% or more of the content with reasonable accuracy.
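To make the 80% target checkable, here is one possible way to measure coverage. This is only a sketch: the function name `annotation_coverage`, the per-line labels, and the choice to count characters rather than lines or words are all assumptions, not something decided in this issue.

```python
def annotation_coverage(lines, labels):
    """Return the fraction of non-blank characters that sit on labeled lines.

    `lines` is the resume text split into lines; `labels` holds one entry per
    line, either a section name (hypothetical, e.g. "education") or None when
    the annotator could not classify that line.
    """
    total = 0
    covered = 0
    for line, label in zip(lines, labels):
        size = len(line.strip())
        total += size
        if label is not None:
            covered += size
    # An empty resume has nothing to cover, so report 0.0 instead of dividing by zero.
    return covered / total if total else 0.0


lines = ["Rutgers University", "Managed a team", "xx"]
labels = ["education", "work_experience", None]
print(annotation_coverage(lines, labels) >= 0.8)
```

Counting characters weights long lines more heavily than short ones; counting lines or words would be an equally defensible choice.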

Files Changed: Implement this in the backend, similar to how issue #4 was done.

Approaches: There are multiple ways to attack this problem; please consider the pros and cons of each.

Note: This issue needs a substantial research/whiteboarding session before implementation.

MujtabaMuhammad commented 2 months ago

Did some brainstorming:

  1. For the education section, perhaps we can look for the word "University".
  2. For work experience, we can either create an array of "action words" and search around them, since people tend to use them to describe their responsibilities at the workplace, or create an array of job titles from the Bureau of Labor Statistics database (this won't be exhaustive, but it can probably capture at least 80% of the roles out there).
  3. How many words around the keywords we search for do we include? We talked about graphing; need to learn/discuss more on that end.
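
The keyword ideas above could be prototyped roughly as follows. This is a minimal sketch, not an agreed design: the keyword sets, the label names, and the `window` parameter (how many words to keep on each side of a hit) are placeholders chosen for illustration.

```python
import re

# Hypothetical keyword lists; real ones would be much longer
# (e.g. job titles taken from the Bureau of Labor Statistics data).
EDUCATION_KEYWORDS = {"university"}
ACTION_WORDS = {"managed", "developed", "led", "designed", "implemented"}


def annotate(text, window=3):
    """Return (label, context) pairs for every keyword found in `text`.

    `context` is the keyword plus up to `window` words on each side.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        word = token.lower()
        if word in EDUCATION_KEYWORDS:
            label = "education"
        elif word in ACTION_WORDS:
            label = "work_experience"
        else:
            continue
        start = max(0, i - window)
        context = " ".join(tokens[start:i + window + 1])
        hits.append((label, context))
    return hits


print(annotate("BS in Physics, Rutgers University, 2015", window=2))
print(annotate("Developed internal tools for the sales team", window=1))
```

Making `window` a parameter leaves the "how many words" question open, so different values can be compared on sample resumes.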

MujtabaMuhammad commented 2 months ago

Next Tasks:

  1. Learn about the structure of a PDF file.
  2. Export the PDF to the backend as a whole file.
  3. Use Python libraries to parse the content.
  4. Write down an algorithm to do the parsing. This will be modified over time to increase accuracy and to parse resumes more universally.
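
A rough sketch of the first three steps, under some assumptions: the thread does not name a library, so `pypdf` is used here only as one possible choice, and the helper names (`looks_like_pdf`, `clean_lines`, `extract_lines`) are made up for illustration. The header check relies on the fact that a well-formed PDF file begins with the bytes `%PDF-`.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check on an uploaded file: well-formed PDFs start with %PDF-."""
    return data.startswith(b"%PDF-")


def clean_lines(text):
    """Split extracted text into non-empty lines with collapsed whitespace."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def extract_lines(path):
    """Extract text lines from every page of the PDF at `path`.

    Assumption: the third-party pypdf package is installed (pip install pypdf).
    It is imported here so the helpers above work without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    lines = []
    for page in reader.pages:
        # extract_text() can return an empty result for image-only pages.
        text = page.extract_text() or ""
        lines.extend(clean_lines(text))
    return lines


print(looks_like_pdf(b"%PDF-1.7\n"))
print(clean_lines("  Jane   Doe \n\nEducation\n"))
```

The resulting list of lines is the kind of input a later annotation step could work on. Scanned resumes that contain only images would yield no text here and would need OCR, which is outside this sketch.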