Intermediate Text Analysis: Annotation

salekinsirajus commented 3 weeks ago

Problem Statement: The goal of this task is to come up with an algorithm and an implementation that analyze the entirety of the content. Note that the deliverable is not a perfectly annotated resume, as that is an almost impossible task. Rather, we want to focus on annotating 80% or more of the content with reasonable accuracy.
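To make the 80% target measurable, one option is to count how many tokens end up inside at least one annotation. The sketch below is only an illustration: the function name `annotation_coverage`, the whitespace tokenization, and the `(start, end, label)` span format are assumptions, not something defined in this issue.

```python
def annotation_coverage(tokens, annotated_spans):
    """Return the fraction of tokens covered by at least one annotation.

    tokens: list of words from the resume (hypothetical tokenization).
    annotated_spans: list of (start, end, label) tuples, end exclusive.
    """
    if not tokens:
        return 0.0
    covered = set()
    for start, end, _label in annotated_spans:
        # Clamp each span to the token range so bad offsets cannot inflate coverage.
        covered.update(range(max(start, 0), min(end, len(tokens))))
    return len(covered) / len(tokens)


# Made-up sample text, 11 tokens long.
tokens = "BS Computer Science State University 2015 Built data pipelines in Python".split()
spans = [(0, 6, "education"), (6, 9, "experience")]
coverage = annotation_coverage(tokens, spans)
meets_target = coverage >= 0.8
```

Here 9 of the 11 tokens are annotated, so the sample just clears the 80% bar. This measures coverage only; accuracy would need a separately labeled reference.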

Files Changed: Implement this in the backend, similar to how issue #4 was done.

Approaches: There are multiple ways to attack this problem; please consider the pros and cons of each.

Search and Find: look for specific things in the content, running multiple passes. Use the schema as a catalog of the things to search for.
Identify as You Go: for every word (or set of words) you encounter, attempt to classify it based on your catalog. As with the other approach, you can run multiple passes.
Combination: the strategy could incorporate both of the aforementioned approaches.
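The first two approaches differ mainly in loop order, which a small sketch can make concrete. Everything here is hypothetical: the `CATALOG` contents stand in for the real schema (not shown in this issue), and matching is on single lowercase words only.

```python
import re

# Hypothetical catalog: label -> lowercase keywords. The real schema may differ.
CATALOG = {
    "education": {"university", "college", "bachelor"},
    "experience": {"managed", "developed", "engineer"},
}


def tokenize(text):
    """Split text into simple word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9+#]+", text)


def search_and_find(tokens):
    """One pass over the content per catalog label, looking for that label's keywords."""
    hits = []
    for label, keywords in CATALOG.items():
        for i, tok in enumerate(tokens):
            if tok.lower() in keywords:
                hits.append((i, label))
    return sorted(hits)


def identify_as_you_go(tokens):
    """Single pass over the content; each token is checked against the whole catalog."""
    hits = []
    for i, tok in enumerate(tokens):
        for label, keywords in CATALOG.items():
            if tok.lower() in keywords:
                hits.append((i, label))
    return hits


tokens = tokenize("Software Engineer, developed APIs. Bachelor of Science, State University")
found = search_and_find(tokens)
classified = identify_as_you_go(tokens)
```

With a catalog this simple both functions return the same hits. The approaches would start to diverge once classification depends on context (for example, what was already identified earlier in the pass), which is where a combination might pay off.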

Note: This issue needs a substantial research/whiteboarding session before implementation.

MujtabaMuhammad commented 2 weeks ago

Did some brainstorming:

For the education section, perhaps we can look for the word "University".
For work experience, we can either create an array of "action words" and search around them, since people tend to use them to describe their workplace responsibilities, or create an array of job titles from the Bureau of Labor Statistics database (this won't be exhaustive, but it can probably capture at least 80% of the roles out there).
How many words around each keyword should we include? We talked about graphing; need to learn/discuss more on that end.
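One way to experiment with the "how many words around the keyword" question is to make the window size a parameter and compare results on real resumes. This is a rough sketch only; `context_windows`, its `window` default, and the sample sentence are all made up for illustration.

```python
def context_windows(tokens, keywords, window=3):
    """Return (keyword_index, surrounding_tokens) for every keyword hit.

    window is the number of tokens kept on each side of the keyword;
    the right value is an open question and would need tuning.
    """
    results = []
    for i, tok in enumerate(tokens):
        # Strip trailing punctuation so "University," still matches.
        if tok.lower().strip(".,;:") in keywords:
            start = max(i - window, 0)
            end = min(i + window + 1, len(tokens))
            results.append((i, tokens[start:end]))
    return results


words = "Graduated from Example State University in 2015 with honors".split()
hits = context_windows(words, {"university"}, window=2)
# hits == [(4, ['Example', 'State', 'University', 'in', '2015'])]
```

The same function could be called with a set of action words or job titles instead of `{"university"}`, so the window size can be tuned per section type.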

MujtabaMuhammad commented 2 weeks ago

Next Tasks:

1) Learn about the structure of a PDF file.
2) Export the PDF to the back end as a whole file.
3) Use Python libraries to parse the content.
4) Write down an algorithm to do the parsing. This will be refined over time to increase accuracy and to handle a wider variety of resumes.

TheOpenResumeProject / webapp

Intermediate Text Analysis: Annotation #8
