[ENHANCEMENT] code only works with LinkedIn - check out RapidAPI for a Jobs API across platforms

espin086 commented 1 year ago

This is the new API I want to use:

https://rapidapi.com/letscrape-6bRBa3QguO5/api/jsearch/

Here are the steps to update the application:

Update config.py to reference a new table for the updated job search data
Update SQliteHandler.py to create a new table based on the config metrics
Create a new extract.py to get data from the new API and save the raw results into the raw data folder
Create a new transform.py to transform the data and save it into the processed folder
Create a new load.py to load the processed data into the new database
Create a new report.py to query the data and present it to the front end
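
The first two steps (a config entry that names the new table, and a handler that creates it) could look roughly like the sketch below. The table name `jobs_new`, the `COLUMNS` mapping, and both function names are assumptions made for illustration; the real config.py and SQliteHandler.py may be organized differently.

```python
import sqlite3

# Hypothetical config values; the real config.py may use different names.
TABLE_NAME = "jobs_new"
COLUMNS = {
    "job_id": "TEXT PRIMARY KEY",
    "employer_name": "TEXT",
    "job_title": "TEXT",
    "job_apply_link": "TEXT",
    "job_is_remote": "INTEGER",
    "job_posted_at_datetime_utc": "TEXT",
}


def create_table(conn, table_name=TABLE_NAME, columns=COLUMNS):
    """Create the table described by the config mapping if it does not exist."""
    column_sql = ", ".join(
        f'"{name}" {sql_type}' for name, sql_type in columns.items()
    )
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({column_sql})')
    conn.commit()


def table_columns(conn, table_name=TABLE_NAME):
    """Return the column names SQLite reports for the table."""
    rows = conn.execute(f'PRAGMA table_info("{table_name}")')
    return [row[1] for row in rows]
```

Keeping the column list in the config means a new metric only needs one new entry there, and the handler picks it up the next time the table is created.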

espin086 commented 10 months ago

This is the structure for each job saved:

{
   "employer_name":"Upwork",
   "employer_logo":"https://image.status.io/z6aeO6kAGsAG.png",
   "employer_website":"http://www.elance.com",
   "employer_company_type":"Computer Services",
   "job_publisher":"Upwork",
   "job_id":"Es-jxFD_o_NOgEqCAAAAAA==",
   "job_employment_type":"CONTRACTOR",
   "job_title":"Data Scientist With knowledge Federated Learning (ML & AI) to consult us business strategy - Contract to Hire",
   "job_apply_link":"https://www.upwork.com/freelance-jobs/apply/Data-Scientist-With-knowledge-Federated-Learning-consult-business-strategy_~01176e79ff5603fc8e/",
   "job_apply_is_direct":true,
   "job_apply_quality_score":0.685,
   "apply_options":[
      {
         "publisher":"Upwork",
         "apply_link":"https://www.upwork.com/freelance-jobs/apply/Data-Scientist-With-knowledge-Federated-Learning-consult-business-strategy_~01176e79ff5603fc8e/",
         "is_direct":true
      }
   ],
   "job_description":"Hello,\n\nWe are in the process of implementing our new startup project, in which we aim to provide businesses with forecasts and measurements through data analysis. In this critical process, we need experienced experts to evaluate the technical aspects in terms of cost and determine the most appropriate budget while prototyping our project.\n\nWhen we consider the technical stages of the project, under the main headings such as data collection, integration, storage, analysis, and user interface design, we examine which technologies we should invest in, the costs of these technologies, the human resources required at the prototype stage, potential additional costs and what we may encounter during the process. We want to identify economic difficulties in advance. We also want to accurately plan the license, hardware and human resource costs that may be required for our project.\n\nWe want to know whether you can support us in this matter. If you do not have the opportunity to provide support in this regard, we would like to get information about which areas you can support us in order to take note of our potential to work together in the future.\n\nCan we discuss these issues in detail at a date that is most convenient for you? We kindly ask you to share your available time slots with us.\n\nThank you in advance for your contribution to cost planning and other issues.\n\nI would be happy if those who can support us in this matter could contact us.\n\nGurkan.",
   "job_is_remote":true,
   "job_posted_at_timestamp":1698134653,
   "job_posted_at_datetime_utc":"2023-10-24T08:04:13.000Z",
   "job_city":"None",
   "job_state":"None",
   "job_country":"US",
   "job_latitude":37.09024,
   "job_longitude":-95.71289,
   "job_benefits":"None",
   "job_google_link":"https://www.google.com/search?gl=us&hl=en&rciv=jb&q=data+scientist&start=0&chips=employment_type:CONTRACTOR&schips=employment_type;CONTRACTOR&ibp=htl;jobs#fpstate=tldetail&htivrt=jobs&htiq=data+scientist&htidocid=Es-jxFD_o_NOgEqCAAAAAA%3D%3D",
   "job_offer_expiration_datetime_utc":"None",
   "job_offer_expiration_timestamp":"None",
   "job_required_experience":{
      "no_experience_required":false,
      "required_experience_in_months":"None",
      "experience_mentioned":true,
      "experience_preferred":false
   },
   "job_required_skills":[
      "Machine Learning Model",
      "Model Optimization",
      "Data Science",
      "TensorFlow",
      "Predictive Analytics",
      "PyTorch",
      "Federated Learning",
      "Econometrics",
      "Community Goals & KPIs"
   ],
   "job_required_education":{
      "postgraduate_degree":false,
      "professional_certification":false,
      "high_school":false,
      "associates_degree":false,
      "bachelors_degree":false,
      "degree_mentioned":false,
      "degree_preferred":false,
      "professional_certification_mentioned":false
   },
   "job_experience_in_place_of_education":false,
   "job_min_salary":"None",
   "job_max_salary":"None",
   "job_salary_currency":"None",
   "job_salary_period":"None",
   "job_highlights":{
      "Qualifications":[
         "We also want to accurately plan the license, hardware and human resource costs that may be required for our project"
      ],
      "Benefits":[
         "Thank you in advance for your contribution to cost planning and other issues"
      ]
   },
   "job_job_title":"Data scientist",
   "job_posting_language":"en",
   "job_onet_soc":"15111100",
   "job_onet_job_zone":"5",
   "job_naics_code":"541511",
   "job_naics_name":"Custom Computer Programming Services"
}
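
Because the record above mixes nested objects, lists, and the string "None" for missing values, a transform step probably needs to flatten it before it can be loaded as a table row. A minimal sketch follows, assuming nested objects become prefixed columns and lists are stored as JSON text. The function names `flatten_job` and `_clean` are hypothetical and are not taken from dataTransformer.py.

```python
import json


def _clean(value):
    """Map the "None" placeholder to None and serialize containers as JSON."""
    if value == "None":
        return None
    if isinstance(value, (list, dict)):
        return json.dumps(value)
    return value


def flatten_job(job):
    """Flatten one job record into a single-level dict for a table row."""
    row = {}
    for key, value in job.items():
        if isinstance(value, dict):
            # e.g. job_required_experience -> job_required_experience_<sub_key>
            for sub_key, sub_value in value.items():
                row[f"{key}_{sub_key}"] = _clean(sub_value)
        else:
            row[key] = _clean(value)
    return row
```

With this approach `job_required_skills` ends up as one JSON string column, while `job_required_education` expands into one column per flag.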

espin086 commented 10 months ago

Progress:

Updated the code to extract data from the new API
Updated extract.py to use the new API and save the new data
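
For illustration only, the "save the new data" half of the extract step might look like the sketch below. It assumes the jobs have already been fetched as a list of dicts, so the API call itself is left out. The folder name `raw_data`, the file naming scheme, and the function name `save_raw_jobs` are assumptions, not the actual extract.py.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_raw_jobs(jobs, raw_dir="raw_data"):
    """Write each job dict to its own JSON file and return the paths written."""
    out_dir = Path(raw_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # A UTC timestamp in the file name keeps separate runs apart.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    paths = []
    for index, job in enumerate(jobs):
        path = out_dir / f"jsearch_{stamp}_{index}.json"
        path.write_text(json.dumps(job, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```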

Next Step:

Create a new transform.py functionality

espin086 commented 10 months ago

dataTransformer.py has been updated to process the new data.

TODO: create another module called JDNLP.py, which will take a job description and perform NLP on it to enrich the data. We won't need to fetch any job descriptions remotely, so the code that downloads job descriptions will be deprecated.
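
As a rough idea of what enriching a job description could mean, here is a standard-library-only sketch that matches a small skill vocabulary and counts frequent terms. The `SKILLS` and `STOPWORDS` sets and the function name `enrich_description` are placeholders chosen for this example; the actual JDNLP.py may use a different approach or a dedicated NLP library.

```python
import re
from collections import Counter

# Hypothetical vocabulary and stopword list, for illustration only.
SKILLS = ("python", "sql", "pytorch", "tensorflow",
          "machine learning", "data analysis")
STOPWORDS = {"and", "the", "with", "for", "are", "our", "that", "this"}


def enrich_description(description, skills=SKILLS, top_n=5):
    """Return simple features derived from the text of a job description."""
    text = description.lower()
    tokens = re.findall(r"[a-z][a-z+#]*", text)
    # Whole-phrase matches only, so "sql" does not match inside "mysql".
    found = [
        skill for skill in skills
        if re.search(r"\b" + re.escape(skill) + r"\b", text)
    ]
    counts = Counter(
        token for token in tokens
        if token not in STOPWORDS and len(token) > 2
    )
    return {
        "word_count": len(tokens),
        "skills_found": found,
        "top_terms": [word for word, _ in counts.most_common(top_n)],
    }
```

Since the description already arrives in the `job_description` field of each record, a function like this can run directly on the saved data without any remote download.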

espin086 commented 10 months ago

The JDNLP.py code has been committed but not tested at all. I need to think about moving the FileReader class in this module to FileHandler.py, as it may be better suited there.

espin086 commented 10 months ago

Next is to test and optimize the TextProcessor.py code and make sure it works as expected.

ZaibyS commented 7 months ago

@espin086 I explored the response of the new API. There are two relevant keys in the JSON response: job_apply_link holds a single apply link, whereas apply_options lists all the publishers of that job along with their apply links. Do we need only one link or all of them in the output?

ZaibyS commented 7 months ago

@espin086 Should we include additional metrics in the output from the Jobs API search response, or are the existing metrics from the LinkedIn job search sufficient for our needs?

espin086 commented 7 months ago

@ZaibyS - yes, please add all metrics from the new API

espin086 commented 7 months ago

@ZaibyS For the job apply link, please include all of the apply options. It is good to know where we can apply for these jobs, so please add them to the SQLite database.
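
One possible way to keep every apply option is a separate child table keyed by job_id, with one row per publisher link. The sketch below assumes that layout; the table name `apply_options`, its columns, and the function name `save_apply_options` are assumptions made for this example, not the schema the project actually uses.

```python
import sqlite3


def save_apply_options(conn, job):
    """Store every apply option of one job in a child table keyed by job_id."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS apply_options (
            job_id TEXT NOT NULL,
            publisher TEXT,
            apply_link TEXT NOT NULL,
            is_direct INTEGER,
            PRIMARY KEY (job_id, apply_link)
        )
        """
    )
    rows = [
        (
            job["job_id"],
            option.get("publisher"),
            option["apply_link"],
            int(bool(option.get("is_direct"))),  # SQLite has no boolean type
        )
        for option in job.get("apply_options", [])
    ]
    # INSERT OR REPLACE keeps repeated loads of the same job from duplicating rows.
    conn.executemany(
        "INSERT OR REPLACE INTO apply_options VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

A child table avoids packing several links into one column, and a report can still join it back to the main jobs table on job_id.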

espin086 / GPT-Jobhunter

[ENHANCEMENT] code only works with Linkedin - check out RapidAPI for a Jobs API across platforms #7