Yashdew / Assessor

An open-source Resume Analyzer and Ranking tool for recruiters and candidates.
http://assessor.vercel.app/
MIT License
18 stars, 8 forks

Fix Parsing Issue #25

Open SkSumit opened 2 years ago

SkSumit commented 2 years ago
the-lightstack commented 2 years ago

How can I reproduce that error? (so what was your input pdf)

SkSumit commented 2 years ago

Hi @the-lightstack, you can check out the sample resume folder to test it. For these resumes, the code does not parse the information well. For example, for this resume it outputs the name as Frontend Intern instead of Sumit Kolpekwar, along with other issues.

Yashdew commented 2 years ago

@the-lightstack, for reference you can check the JSON output example in README.md. One more thing: kindly open your PR against the dev branch.

the-lightstack commented 2 years ago

This seems interesting, but I don't really know what the error is, so I won't fix this issue

Yashdew commented 2 years ago

@the-lightstack, well, there is no issue as such, but we want to increase the accuracy of the parsing algorithm.

For example, in education, nothing is extracted for most of the resumes. In experience, it can't differentiate between projects and experience.

File location:- https://github.com/Yashdew/Assessor/blob/main/Sample%20Resume/Yash-Dewangan-CV.pdf

At present we are getting this type of JSON from parsing:-

[
    {
        "personal_details": {
            "name": "Yash Dewangan",
            "email": "yashdewangan123456@gmail.com",
            "mobile_number": "8602842290"
        },
        "skills": [
            "Github",
            "Architecture",
            "Programming",
            "Pandas",
            "Editing",
            "Database",
            "Analysis",
            "Design",
            "Apis",
            "Ui",
            "C",
            "Coding",
            "C++",
            "Video",
            "Engineering",
            "Information technology",
            "Api",
            "Algorithms",
            "Java",
            "Rest",
            "Statistics",
            "Flask",
            "Django",
            "Apex",
            "Sql server",
            "Js",
            "Photography",
            "Css",
            "Python",
            "Html",
            "Sql"
        ],
        "education": null,
        "experience": [
            "eQ Technologic  | Software Engineer Intern",
            "Aug 2021 – Present",
            "•  Implemented various services/APIs needed",
            "for new features required in latest release",
            "•  Learnt about SOA architecture, modular",
            "coding i.e. keeping future use in mind",
            "•  Implementation of concepts such as Tagging",
            "Entities, Groups/User Authorization &",
            "Permissions for Entities",
            "•  Worked on Backend technologies such as",
            "Spring and Java with SQL Server as",
            "Database"
        ],
        "no_of_pages": 1,
        "links": {
            "linkedin": "https://www.linkedin.com/in/iyashdewangan/",
            "leetcode": "https://leetcode.com/Yashdew/",
            "codechef": "https://www.codechef.com/users/yashdew",
            "codeforces": "http://codeforces.com/profile/yashdewangan123456",
            "github": [
                "https://github.com/Yashdew/Attendance-Tracker",
                "https://github.com/Yashdew",
                "https://github.com/SkSumit/Chatistics"
            ],
            "others": [
                "https://drive.google.com/file/d/1-UrtlUygeujyDXvZPhI5fW9E1wICL_Qd/view",
                "https://chatistics.vercel.app/",
                "https://auth.geeksforgeeks.org/user/yashdewangan123456/practice/",
                "https://attendancesknhc.herokuapp.com/",
                "mailto:yashdewangan123456@gmail.com",
                "https://www.spoj.com/users/yashdew/"
            ]
        },
        "total_experience": 0.17,
        "projects": "Projects TBC",
        "achievements": "Achievements TBC",
        "hobbies": "Hobbies TBC"
    }
]

We want this type of JSON from the parsing algorithm:-

[
    {
        "personal_details": {
            "name": "Yash Dewangan",
            "email": "yashdewangan123456@gmail.com",
            "mobile_number": "8602842290"
        },
        "skills": [
            "Pandas",
            "Coding",
            "C",
            "Flask",
            "Css",
            "Java",
            "C++",
            "Django",
            "Rest"
        ],
        "education": [
            "SMT. KASHIBAI NAVALE COLLEGE OF ENGINEERING
            BE in Information Technology
            2018-2022 | Pune, MH
            Cum. GPA: 8.14"
        ],
        "experience": [
            "eQ Technologic | Software Engineer Intern
            Aug 2021 – Present
            Implemented various services/APIs needed for new features required in latest release
            Learnt about SOA architecture, modular coding i.e. keeping future use in mind
            Implementation of concepts such as Tagging Entities and  Groups/User Authorization & Permissions for Entities
            Worked on Backend technologies such as Spring and Java with SQL Server as Database"
        ],
        "no_of_pages": 1,
        "links": {
            "linkedin": "https://www.linkedin.com/in/iyashdewangan/",
            "leetcode": "https://leetcode.com/Yashdew/",
            "codechef": "https://www.codechef.com/users/yashdew",
            "codeforces": "http://codeforces.com/profile/yashdewangan123456",
            "github": [
                "https://github.com/Yashdew/Attendance-Tracker",
                "https://github.com/Yashdew",
                "https://github.com/SkSumit/Chatistics"
            ],
            "others": [
                "mailto:yashdewangan123456@gmail.com",
                "https://www.spoj.com/users/yashdew/",
                "https://attendancesknhc.herokuapp.com/",
                "https://chatistics.vercel.app/",
                "https://auth.geeksforgeeks.org/user/yashdewangan123456/practice/"
            ]
        },
        "total_experience": 0.17,
        "projects": [
            "CHATISTICS
            GitHub Live URL
            Dec 2020 - Feb 2021
            An open-source WhatsApp chats analyser and statistics.
            Application, which provides various meaningful insights.
            Time complexity reduces from 20 seconds. to 5 seconds.
            Used Flask for implementing backend REST APIs with firebase database for analysis of traffic.
            Pandas for data pre-processing.
            Used NextJS and Bulma UI for frontend.
            500+ users and 30 stars on GitHub.",

            "ATTENDANCE-TRACKER
            GitHub Live URL
            July 2020 – Aug 2020
            A full-stack web application for monitoring the attendance in Microsoft Teams from logs file of the meeting. (Sample)
            Optimization of code took around 3 seconds in Data pre-processing.
            Worked on building the major backend part and frontend.
            Used Flask for implementing Backend and HTML, CSS & JS for frontend.
            Used Mongo DB and Google sheet API for Database.
            Data pre-processing of large logs files for calculating time stamps of students using pandas
            50+ users in our college."
        ],
        "achievements": [
            "Codechef - Maximum rating 1603 (3-star).",
            "Codechef – March Lunchtime 2021 Div-3, secured a rank of 825 out of 7000+ participants.",
            "Leetcode – 150+ Solved Questions.",
            "250+ Solved Questions on GFG, Codechef, SPOJ and Codeforces.",
            "Participated in Google kickstart 2021 Round A, Round C & Round D.",
            "Secured 1st rank out of 30+ participants in Scaler Edge Apex 2021. (SKN Edition)",
            "Represented Hack Club SKN projects in Hack Club Asia Summit 2021.",
            "Participated in more than 30+ coding competition."
        ],
        "hobbies": [
            "Photography and Video editing",
            "Traveling and exploring new places.",
            "Gaming"
        ]
    }
]
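
One difference between the two outputs is that the current "experience" list keeps the PDF's wrapped lines as separate entries, while the desired one joins each bullet into a single sentence. A minimal sketch of that post-processing step follows, assuming the parser returns a list of raw lines and that bullets start with "•". The function name `merge_wrapped_lines` is hypothetical and not part of Assessor or pyresparser.

```python
def merge_wrapped_lines(lines, bullet="•"):
    """Join continuation lines onto the bullet line that precedes them.

    Lines before the first bullet (e.g. company and date) are kept as-is.
    """
    merged = []
    in_bullet = False
    for raw in lines:
        text = " ".join(raw.split())  # collapse runs of whitespace
        if not text:
            continue
        if text.startswith(bullet):
            # Start a new bullet entry, dropping the bullet character.
            merged.append(text.lstrip(bullet).strip())
            in_bullet = True
        elif in_bullet:
            # A non-bullet line after a bullet is treated as a wrapped continuation.
            merged[-1] = f"{merged[-1]} {text}"
        else:
            merged.append(text)
    return merged


# Sample lines taken from the "experience" list in the current output.
experience = [
    "eQ Technologic  | Software Engineer Intern",
    "Aug 2021 – Present",
    "•  Implemented various services/APIs needed",
    "for new features required in latest release",
    "•  Learnt about SOA architecture, modular",
    "coding i.e. keeping future use in mind",
]
result = merge_wrapped_lines(experience)
for entry in result:
    print(entry)
```

This heuristic would also glue a following non-bullet heading onto the last bullet, so it is only safe to run on the lines of a single entry.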
AK9175 commented 2 years ago

> This seems interesting, but I don't really know what the error is, so I won't fix this issue

We used a module called pyresparser to extract information from resumes, but unfortunately it misses a few attributes. We want to get all the information about the projects and achievements of a particular candidate. So we expect that you can either use some other known module or work it out on your own to extract projects and achievements from the resume.

- If working it out on your own

  1. Use a PDF-to-text module to get the text from resumes.
  2. Use your data preprocessing/extraction techniques to extract Experience, Projects, and Achievements from resumes with different structures (some resumes are divided horizontally, some vertically, and there may be other layouts).

- If using a module

  1. Use a module other than pyresparser to extract the data, one that accurately lists Experience, Projects, and Achievements.
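
For the "on your own" route, the two steps above could be sketched roughly as follows. This assumes the resume text has already been extracted by some PDF-to-text library (for example `extract_text` from pdfminer.six) and that each section starts with a heading on its own line. The heading keywords and the names `SECTION_HEADINGS`, `match_heading`, and `split_sections` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical heading keywords; real resumes use many more variants.
SECTION_HEADINGS = {
    "education": ("education", "academics"),
    "experience": ("experience", "work experience", "employment"),
    "projects": ("projects", "personal projects"),
    "achievements": ("achievements", "awards"),
    "hobbies": ("hobbies", "interests"),
}


def match_heading(line):
    """Return the section key if the whole line is a known heading, else None."""
    normalized = re.sub(r"[^a-z ]", "", line.lower()).strip()
    for key, names in SECTION_HEADINGS.items():
        if normalized in names:
            return key
    return None


def split_sections(text):
    """Group non-empty lines under the most recent heading seen."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key = match_heading(line)
        if key is not None:
            current = key
            sections.setdefault(key, [])
        elif current is not None:
            sections[current].append(line)
        # Lines before the first heading (name, contact details) are skipped.
    return sections


sample = """Yash Dewangan
EXPERIENCE
eQ Technologic | Software Engineer Intern
Aug 2021 – Present
PROJECTS
CHATISTICS
Dec 2020 - Feb 2021
ACHIEVEMENTS
Leetcode – 150+ Solved Questions.
"""
sections = split_sections(sample)
print(sections["projects"])
```

Because projects and experience land under separate keys, this addresses the case where the two are mixed together. It does not handle two-column layouts, where text extraction may interleave lines from both columns; those would need layout-aware extraction first.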