enhanced-resume
Project for p-ai: creating a resume generator based on using Named Entity Recognition on web-scraped job listings to prioritize keywords on a user's resume.
Collaborators:
- Kevin Ayala
- Evan Von Oehsen
- Aidan Wu
- Marghi Andreassi
- Winnie Xu
- Vivien Song
- Justin Kim
- Amy Liu
Workflow
Scraping
- Run scraper/data_collector.py, which produces lots of csv files with raw data, saving them into /output_csvs
- Run scraper/coallescer.py, which coallesces all the previous files into one file and saves it in the output_from_scraper directory as final_output.csv
Data Processing
- Run data_processing/main.py with the path to the final_output.csv file we received in the last step. This will produce an identical csv with columns for stopwords removed and tokenized text. The result will be in data_processing/processed_output_csvs
Tagging
- we run tagging/assign_tags.py with the assignees flag followed by names of all active collaborators and a num flag of 50. That gets us 50 assigned listings per week. By default, it will take examples from the file we got in the last step.
- each collaborator is assigned a file by the previous process. We normally post those in the discord for everyone to complete
- completed files are posted back to the discord by all collaborators. those are compiled in the tagging/hand_tagged folder
- from there, we run tagging/combine_tags.py to combine all the tagged files together, then tagging/split_data.py to categorize it into train and test datasets
Model
- Once data processing is done, we run ner_model.py. It overwrites ner/spacy_model with a newly trained model
- You can test the current model by running model_validatory.py. It pulls the model saved at ner/spacy_model and tests it against data from ner/test_data.json
Front End
To access and interact with the front-end code, follow below steps:
- In a command line window within the main project directory, enter
cd website
.
- Start the Flask website by entering
python app.py
. Within the CLI instructional output, there should be a line which says Running on http://127.0.0.1:5000/
.
- Copy the URL portion of the message, and paste it into a browser while the process is still running. The home page should now appear.
- Terminate the process anytime with the
CTRL+C
keys, or as specified in the initial CLI instructional output.
TODO
- fix scraper and model (periodic updates required)
- train model on un-tokenized data
- get it to modify a word doc
- finish onboarding new members and set up Kanban board on Github