A parser that extracts information from any resume and converts it into a structured .json format for use by internal systems. The parser uses a rule-based approach that focuses on semantic rather than syntactic parsing. It handles .pdf, .txt, .doc, and .docx (Microsoft Word) documents. In its current form, it is a console-based application.
Open PowerShell on Windows (Run -> powershell). The relative paths below assume that the ResumeTransducer directory is the current directory.
Run syntax:
> java -cp '.\bin\*;..\GATEFiles\lib\*;..\GATEFiles\bin\gate.jar;.\lib\*' code4goal.antony.resumeparser.ResumeParserProgram <input_file> [output_file]
Test:
> java -cp '.\bin\*;..\GATEFiles\lib\*;..\GATEFiles\bin\gate.jar;.\lib\*' code4goal.antony.resumeparser.ResumeParserProgram .\UnitTests\AntonyDeepakThomas.pdf antony_thomas.json
On Linux (or another Unix-like system), open a terminal:
git clone https://github.com/antonydeepak/ResumeParser.git
cd ResumeParser/ResumeTransducer
export GATE_HOME="../GATEFiles"
java -cp 'bin/*:../GATEFiles/lib/*:../GATEFiles/bin/gate.jar:lib/*' code4goal.antony.resumeparser.ResumeParserProgram <input_file> [output_file]
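The Windows and Linux commands above use different classpath separators (`;` versus `:`). As a minimal sketch, assuming the working directory is ResumeTransducer, the same command line could be assembled from Java with the separator chosen at run time. The class name `ParserCommand` and its helper methods are hypothetical and are not part of the project.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ParserCommand {
    // Main class named in the run syntax above.
    static final String MAIN_CLASS = "code4goal.antony.resumeparser.ResumeParserProgram";

    // Join the classpath entries from the run syntax with the given separator
    // (";" on Windows, ":" on Linux).
    static String classpath(String separator) {
        return String.join(separator,
                "bin/*", "../GATEFiles/lib/*", "../GATEFiles/bin/gate.jar", "lib/*");
    }

    // Build the argument list; outputFile is optional and may be null.
    static List<String> command(String separator, String inputFile, String outputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath(separator));
        cmd.add(MAIN_CLASS);
        cmd.add(inputFile);
        if (outputFile != null) {
            cmd.add(outputFile);
        }
        return cmd;
    }

    public static void main(String[] args) {
        // File.pathSeparator is the classpath separator of the current platform.
        List<String> cmd = command(File.pathSeparator,
                "UnitTests/AntonyDeepakThomas.pdf", "antony_thomas.json");
        System.out.println(String.join(" ", cmd));
    }
}
```

This sketch only prints the command. Passing the list to `new ProcessBuilder(cmd).inheritIO().start()` would launch the parser as a child process.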
Sample output:
{
  "title":"",
  "gender":"",
  "name":{
    "first":"Antony",
    "middle":"Deepak",
    "last":"Thomas"
  },
  "email":[
  ],
  "address":[
  ],
  "phone":[
  ],
  "url":[
  ],
  "work_experience":[
    {
      "date_start":"",
      "jobtitle":"",
      "organization":"",
      "date_end":"",
      "text":""
    },
    {
      "<section_title>":""
    }
  ],
  "skills":[
    {
      "<section_title_from_resume>":"text"
    }
  ],
  "education_and_training":[
    {
      "<section_title_from_resume>":"text"
    }
  ],
  "accomplishments":[
    {
      "<section_title_from_resume>":"text"
    }
  ],
  "awards":[
    {
      "<section_title_from_resume>":"text"
    }
  ],
  "credibility":[
    {
      "<section_title_from_resume>":"text"
    }
  ],
  "extracurricular":[
    {
      "<section_title_from_resume>":"text"
    }
  ],
  "misc":[
    {
      "<section_title_from_resume>":"text"
    }
  ]
}
I tried my best to keep the parser from blowing up in the user's face, but it still has some gotchas.
Directory layout:
\ResumeParser
-\ANNIEGazetterFiles
    Contains all the compiled lists for common resume section titles.
-\GATEFiles
    Contains all the GATE libraries needed for natural language processing.
-\JAPEGrammars
    Contains all the JAPE grammars for resume parsing.
-\ResumeTransducer
    Console application written in Java.
The parser uses the English grammar engine provided by GATE through its ANNIE framework. The output is then transduced using grammar rules and lists written specifically for resume parsing. The JAPE grammar defines a generic set of rules that covers popular ways of writing a resume. It takes proper nouns from the lists and applies the rules to them to identify entities. Explore the source code and read about GATE for more details. Also, feel free to ask questions.
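To illustrate the general idea of combining lists with rules, here is a minimal plain-Java sketch. It is not the GATE or JAPE implementation. It assumes a tiny hypothetical gazetteer of section titles and a single rule: a line that matches a known title starts a new section, and the following lines become that section's text. The class `SectionSketch`, its methods, and the gazetteer entries are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SectionSketch {
    // Hypothetical gazetteer: known section titles mapped to output keys.
    static final Map<String, String> GAZETTEER = Map.of(
            "work experience", "work_experience",
            "employment history", "work_experience",
            "skills", "skills",
            "education", "education_and_training");

    // Return the output key for a line that is a known section title, or null otherwise.
    static String categoryOf(String line) {
        return GAZETTEER.get(line.trim().toLowerCase(Locale.ROOT));
    }

    // Rule: a gazetteer match opens a section; later non-blank lines are appended to it.
    // Lines before the first recognized title are ignored in this sketch.
    static Map<String, String> splitSections(List<String> lines) {
        Map<String, String> sections = new LinkedHashMap<>();
        String current = null;
        for (String line : lines) {
            if (categoryOf(line) != null) {
                current = line.trim();
                sections.put(current, "");
            } else if (current != null && !line.isBlank()) {
                String soFar = sections.get(current);
                sections.put(current, soFar.isEmpty() ? line.trim() : soFar + " " + line.trim());
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        List<String> resume = List.of(
                "Antony Deepak Thomas", "Skills", "Java, GATE", "JAPE", "Education", "MS");
        splitSections(resume).forEach((title, text) ->
                System.out.println(categoryOf(title) + " / " + title + ": " + text));
    }
}
```

The sketch keys each section by the title as it appears in the resume, which mirrors the `<section_title_from_resume>` entries in the output format. The real grammars work on GATE annotations rather than raw lines.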