chen0040 / keras-english-resume-parser-and-analyzer

Keras project that parses and analyzes English resumes
MIT License

Access text content based on Labels #9

Open amithadiraju1694 opened 5 years ago

amithadiraju1694 commented 5 years ago

Hey @chen0040 ,

First of all, thank you for such a clean and crisp tool! I really appreciate your time. I've been playing with the tool for a few days now and have got the hang of the flow and the code. I was wondering whether there's currently any way to access specific text content by its label, say 'Education' or 'Experience'. So far I've only been able to retrieve the raw content from the resume; the result is pretty good, but I'm wondering if it could be extended further. I understand that adding such an extension is not entirely straightforward. If you don't already have such a feature, I'm willing to contribute. Let me know what you think.
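To make the request concrete, here is a rough sketch of the kind of helper I'm imagining. Note that `predict_line_label` is an illustrative name, not something the current API provides:

```python
# Rough sketch of the proposed feature, assuming the parser can predict a
# label per line. predict_line_label() is an illustrative name, not part of
# the current ResumeParser API.
from collections import defaultdict


def group_lines_by_label(parser, resume_lines):
    """Bucket resume lines under their predicted label, e.g. 'education'."""
    sections = defaultdict(list)
    for line in resume_lines:
        label = parser.predict_line_label(line)  # assumed helper
        sections[label].append(line)
    return dict(sections)


# Intended usage (also assumed):
#   sections = group_lines_by_label(parser, raw_lines)
#   print('\n'.join(sections.get('education', [])))
```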

marcelogrsp commented 5 years ago

I need something like that as well, and I am going to try to code it. One question: is personal detail considered meta or content?

amithadiraju1694 commented 5 years ago

@marcelogrsp

I think it depends on the use case. In mine, I'm more concerned about work history and education, so personal detail would be meta.
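As a toy illustration of how I'd filter on those two axes (the header/meta/content line types come from this project's labelling scheme; the tuple layout of `parsed_lines` below is just my own mock-up):

```python
# Toy example of the meta-vs-content split discussed above. The line types
# (header/meta/content) follow this repo's annotation scheme; the tuple
# format of parsed_lines is my own assumption for illustration.
META, CONTENT = 'meta', 'content'

parsed_lines = [
    (META, 'others', 'John Doe, john@example.com'),    # personal detail -> meta
    (CONTENT, 'education', 'B.Sc. Computer Science'),
    (CONTENT, 'experience', 'Software Engineer at Acme'),
]

# Keep only content lines carrying the labels we care about.
wanted = [text for line_type, label, text in parsed_lines
          if line_type == CONTENT and label in ('education', 'experience')]
```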

marcelogrsp commented 5 years ago

Me too! I think your needs are exactly the same as mine! Let's keep in touch! Tomorrow I will upload a sample here!


Deepakpa commented 5 years ago

Hi Guys,

Have you managed to do this? I am looking for the same thing.

parth2050 commented 4 years ago

@amit8121
Have you run the entire code successfully? I'm getting this error:

```
    print(self.predict(sentence))
  File "C:/Users/PARTH/PycharmProjects/Project3/Deep-learning-based-resume-parser-and-analyzer-master/demo/lstm.py", line 127, in predict
    wid = [self.word2idx[token] if token in self.word2idx else 1 for token in tokens]
  File "C:/Users/PARTH/PycharmProjects/Project3/Deep-learning-based-resume-parser-and-analyzer-master/demo/lstm.py", line 127, in <listcomp>
    wid = [self.word2idx[token] if token in self.word2idx else 1 for token in tokens]
TypeError: argument of type 'NoneType' is not iterable
```
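My guess so far (not confirmed) is that `self.word2idx` is still `None` when `predict()` runs, i.e. the model was never loaded or trained first. A guard along these lines in `demo/lstm.py` would at least make the failure explicit:

```python
# My reading of the error (an inference, not confirmed): self.word2idx is
# None because load_model()/fit() never ran before predict(). A guard makes
# that explicit instead of failing inside the list comprehension.
def predict(self, sentence):
    if self.word2idx is None:
        raise RuntimeError('word2idx is empty: call load_model(...) or '
                           'fit(...) before predict()')
    tokens = [w.lower() for w in sentence.split()]  # tokenization is assumed
    wid = [self.word2idx[token] if token in self.word2idx else 1 for token in tokens]
    # ... rest of predict() unchanged ...
```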

When I run the dl_based_parser_train.py script, it throws this error:

```
Traceback (most recent call last):
  File "C:/Users/PARTH/PycharmProjects/Project3/Deep-learning-based-resume-parser-and-analyzer-master/demo/dl_based_parser_train.py", line 32, in <module>
    main()
  File "C:/Users/PARTH/PycharmProjects/Project3/Deep-learning-based-resume-parser-and-analyzer-master/demo/dl_based_parser_train.py", line 26, in main
    test_size=0.3
  File "C:\Users\PARTH\PycharmProjects\Project3\Deep-learning-based-resume-parser-and-analyzer-master\demo\dl_based_parser.py", line 48, in fit
    random_state=random_state)
  File "C:\Users\PARTH\PycharmProjects\Project3\Deep-learning-based-resume-parser-and-analyzer-master\demo\dl_based_parser.py", line 72, in fit_line_label
    random_state=random_state)
  File "C:\Users\PARTH\PycharmProjects\Project3\Deep-learning-based-resume-parser-and-analyzer-master\demo\lstm.py", line 363, in fit
    x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=test_size)
  File "C:\Users\PARTH\PycharmProjects\Project3\venv\lib\site-packages\sklearn\model_selection\_split.py", line 2120, in train_test_split
    default_test_size=0.25)
  File "C:\Users\PARTH\PycharmProjects\Project3\venv\lib\site-packages\sklearn\model_selection\_split.py", line 1805, in _validate_shuffle_split
    train_size)
ValueError: With n_samples=0, test_size=0.3 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
```
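If I read the ValueError right, n_samples=0 means no annotated training lines were collected at all before train_test_split ran. A quick sanity check (the directory path below is an assumption; point it at wherever your labelled .txt files live):

```python
# Sanity check for the n_samples=0 error: confirm that annotated training
# files actually exist. The directory below is an assumption; adjust it to
# your own setup.
import os

training_data_dir = './data/training_data'  # assumed location
txt_files = [f for f in os.listdir(training_data_dir) if f.endswith('.txt')]
print(f'{len(txt_files)} annotated .txt files in {training_data_dir}')
assert txt_files, 'No training files found -- annotate some resumes first'
```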

When I run the dl_based_parser_predict.py script, it shows this error:

```
2019-12-25 14:50:19.123899: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Traceback (most recent call last):
  File "C:/Users/PARTH/PycharmProjects/Project3/Deep-learning-based-resume-parser-and-analyzer-master/demo/dl_based_parser_predict.py", line 39, in <module>
    main()
  File "C:/Users/PARTH/PycharmProjects/Project3/Deep-learning-based-resume-parser-and-analyzer-master/demo/dl_based_parser_predict.py", line 31, in main
    collected = read_pdf_and_docx(data_dir_path, command_logging=True, callback=lambda index, file_path, file_content: {
  File "C:\Users\PARTH\PycharmProjects\Project3\Deep-learning-based-resume-parser-and-analyzer-master\demo\io_utils.py", line 23, in read_pdf_and_docx
    callback(len(collected), file_path, txt)
  File "C:/Users/PARTH/PycharmProjects/Project3/Deep-learning-based-resume-parser-and-analyzer-master/demo/dl_based_parser_predict.py", line 32, in <lambda>
    parse_resume(file_path, file_content)
  File "C:/Users/PARTH/PycharmProjects/Project3/Deep-learning-based-resume-parser-and-analyzer-master/demo/dl_based_parser_predict.py", line 22, in parse_resume
    parser.load_model(current_dir + '/models')
  File "C:\Users\PARTH\PycharmProjects\Project3\Deep-learning-based-resume-parser-and-analyzer-master\demo\dl_based_parser.py", line 39, in load_model
    self.line_label_classifier.load_model(model_dir_path=os.path.join(model_dir_path, 'line_label').replace('\\', '/'))
  File "C:\Users\PARTH\PycharmProjects\Project3\Deep-learning-based-resume-parser-and-analyzer-master\demo\lstm.py", line 299, in load_model
    self.model.load_weights(self.get_weight_file_path(model_dir_path))
  File "C:\Users\PARTH\PycharmProjects\Project3\venv\lib\site-packages\keras\engine\saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "C:\Users\PARTH\PycharmProjects\Project3\venv\lib\site-packages\keras\engine\network.py", line 1230, in load_weights
    f, self.layers, reshape=reshape)
  File "C:\Users\PARTH\PycharmProjects\Project3\venv\lib\site-packages\keras\engine\saving.py", line 1209, in load_weights_from_hdf5_group
    str(len(filtered_layers)) + ' layers.')
ValueError: You are trying to load a weight file containing 0 layers into a model with 3 layers.
```
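The "0 layers" ValueError would be consistent with the training run above having failed, leaving an empty weight file on disk. Listing the layer groups stored in the HDF5 file should confirm it (the file name below is an assumption; check what `get_weight_file_path()` returns in your installation):

```python
# If training failed, the weight file on disk holds no layer groups. The
# path below is an assumption; adjust it to match get_weight_file_path().
import h5py

weights_path = './models/line_label/lstm-weights.h5'  # assumed path
with h5py.File(weights_path, 'r') as f:
    print('stored layer groups:', list(f.keys()))  # [] means retrain first
```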

@amit8121 Can you help me figure out the problem?

shubhambharadwaj commented 4 years ago

Hi all,

I was able to get the code up and running on macOS Catalina. I got to the point where I am generating a text file for each resume I parse, in order to build the labelled data. I wanted to know if there is any streamlined way of producing this labelled data from the resumes beforehand? One idea I'm considering is sketched below.
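A keyword-based pre-annotation pass could pre-fill labels that I then correct by hand. This is my own workaround, not a feature of this repo, and the hint lists are guesses to be tuned:

```python
# Keyword-based pre-annotation to bootstrap manual labelling. My own
# workaround, not part of this repo; label names mirror its categories but
# the keyword hints are guesses.
SECTION_HINTS = {
    'education': ('education', 'university', 'degree', 'b.sc', 'gpa'),
    'experience': ('experience', 'employment', 'work history', 'engineer'),
}


def guess_label(line):
    lowered = line.lower()
    for label, hints in SECTION_HINTS.items():
        if any(hint in lowered for hint in hints):
            return label
    return 'unknown'  # flag for manual review


# Pre-fill a label per extracted line, then fix mistakes by hand:
#   labels = [guess_label(line) for line in resume_text.splitlines()]
```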