OmkarPathak / pyresparser

A simple resume parser used for extracting information from resumes
GNU General Public License v3.0

os.path.splitext(self.__resume)[1].split #20

Closed MiCodes2 closed 3 years ago

MiCodes2 commented 4 years ago

Hi, I am getting the following error. Please check whether I am missing something.

Error:

  File "/home/****/.local/lib/python3.6/site-packages/pyresparser/resume_parser.py", line 40, in __init__
    ext = os.path.splitext(self.__resume)[1].split('.')[1]

IndexError: list index out of range

Thanks.
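
For reference, the expression in the traceback can be reproduced on its own, without pyresparser. The sketch below only assumes the expression behaves as the traceback line shows; the helper names `extension_from_traceback` and `extension_or_none` and the example paths are hypothetical and are not part of pyresparser. It suggests the error appears whenever the path has no file extension, for example when it points to a directory.

```python
import os


def extension_from_traceback(path):
    # Same expression as the traceback line, with `path` in place of self.__resume.
    return os.path.splitext(path)[1].split('.')[1]


def extension_or_none(path):
    # Hypothetical safer variant: os.path.splitext returns '' as the second
    # element when there is no extension, so check for that before slicing.
    ext = os.path.splitext(path)[1]
    return ext[1:].lower() if ext else None


# A path with an extension works as expected.
print(extension_from_traceback('resume.pdf'))

# A path without an extension (such as a directory) raises IndexError,
# because ''.split('.') is [''] and has no element at index 1.
try:
    extension_from_traceback('/tmp/ConsultantResumes')
except IndexError as exc:
    print('IndexError:', exc)

# The safer variant reports the missing extension instead of raising.
print(extension_or_none('/tmp/ConsultantResumes'))
```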

OmkarPathak commented 4 years ago

@mithileshk87 Can you please provide more information, such as what you are trying to run?

MiCodes2 commented 4 years ago

Hi @OmkarPathak, I am running this package for information retrieval, as mentioned here, on 1000 test resumes, which are a combination of doc, pdf and docx files.

I ran the following code for that:

import nltk
nltk.download('stopwords')
import spacy
spacy.load("en_core_web_sm")
from pyresparser import ResumeParser
data = ResumeParser('/home/mithileshkumar/Documents/Personal/Resume_Extraction/ConsultantResumes').get_extracted_data()

OmkarPathak commented 4 years ago

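@mithileshk87 Your code will only work for a single file, not for a directory. A directory path has no file extension, so os.path.splitext(...)[1] is an empty string and .split('.')[1] raises the IndexError. You have to change it to something like: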

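import os
import nltk
import spacy
from pyresparser import ResumeParser

nltk.download('stopwords')
spacy.load("en_core_web_sm")

# Directory containing the resumes (path taken from the code above)
directory = '/home/mithileshkumar/Documents/Personal/Resume_Extraction/ConsultantResumes'

# Parse each file individually and keep the results keyed by file path
results = {}
for root, _, filenames in os.walk(directory):
    for filename in filenames:
        file = os.path.join(root, filename)
        results[file] = ResumeParser(file).get_extracted_data()

OmkarPathak commented 3 years ago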

Closing because of no response
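
For anyone running into the same error on a mixed directory, a slightly more defensive variant of the loop above might look like the following. It is only a sketch: `SUPPORTED_EXTENSIONS`, `iter_resume_files` and `parse_directory` are hypothetical names, the extension set is an assumption based on the doc, pdf and docx files mentioned in this thread, and the parser is passed in as a callable so the walking logic does not depend on pyresparser.

```python
import os

# Assumption: only the formats mentioned in this thread are of interest.
SUPPORTED_EXTENSIONS = {'.pdf', '.doc', '.docx'}


def iter_resume_files(directory, extensions=SUPPORTED_EXTENSIONS):
    """Yield paths of files under `directory` whose extension is in `extensions`."""
    for root, _, filenames in os.walk(directory):
        for filename in sorted(filenames):
            # Files without an extension, or with another one, are skipped.
            if os.path.splitext(filename)[1].lower() in extensions:
                yield os.path.join(root, filename)


def parse_directory(directory, parse_file):
    """Call `parse_file(path)` for each matching file.

    Returns two dicts keyed by file path: parsed results, and the
    exceptions raised for files that could not be parsed.
    """
    results, failures = {}, {}
    for path in iter_resume_files(directory):
        try:
            results[path] = parse_file(path)
        except Exception as exc:  # keep going so one bad file does not stop the batch
            failures[path] = exc
    return results, failures
```

With pyresparser, `parse_file` could be `lambda path: ResumeParser(path).get_extracted_data()`, the same call used earlier in this thread. Collecting failures separately makes it easier to see which of the 1000 files need a closer look.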