Browser closes itself and errors

ArshKA / LinkedIn-Job-Scraper

LinkedIn scraper to retrieve and store a live stream of job postings


Browser closes itself and errors #1

Closed gideon-teo closed 11 months ago

gideon-teo commented 11 months ago

When running search_retriever.py, a browser opens and logs in automatically. A few seconds after logging in, the program prompts me to press Enter to continue. After I press Enter, the browser loads the linked job search page and then closes itself.

Traceback (most recent call last):
  File "LinkedIn-Job-Scraper/search_retriever.py", line 24, in <module>
    all_results = job_searcher.get_jobs()
  File "LinkedIn-Job-Scraper/fetch.py", line 62, in get_jobs
    raise Exception('Status code {} for search\nText: {}'.format(results.status_code, results.text))
Exception: Status code 400 for search

Would you mind helping me with this please? Thank you.

ArshKA commented 11 months ago

The browser closing after sign-in is expected behavior. LinkedIn's login doesn't return a CSRF token unless the JavaScript is run. The script should then transfer those cookies from the Selenium driver to the requests session for future API calls. Did you add usernames and passwords to the code, and did the browser take you to the account's job search page before it exited?
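For readers following along, the cookie hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name `session_state_from_selenium` is hypothetical, the input is assumed to be shaped like the list of dicts returned by Selenium's `driver.get_cookies()`, and the idea that the `csrf-token` header mirrors the `JSESSIONID` cookie with its surrounding quotes stripped is an assumption about LinkedIn's behavior that may not hold.

```python
def session_state_from_selenium(selenium_cookies):
    """Turn Selenium-style cookie dicts into (cookies, headers) for an HTTP session.

    `selenium_cookies` is assumed to look like the output of driver.get_cookies():
    a list of dicts that each carry at least "name" and "value" keys.
    """
    # Flatten the list of cookie dicts into a simple name -> value mapping.
    cookies = {c["name"]: c["value"] for c in selenium_cookies}

    headers = {}
    jsessionid = cookies.get("JSESSIONID")
    if jsessionid:
        # Assumption: the csrf-token header is the JSESSIONID value without quotes.
        headers["csrf-token"] = jsessionid.strip('"')
    return cookies, headers


# Made-up cookie values, used only to show the shape of the data.
example = [
    {"name": "li_at", "value": "abc123", "domain": ".linkedin.com"},
    {"name": "JSESSIONID", "value": '"ajax:42"', "domain": ".www.linkedin.com"},
]
cookies, headers = session_state_from_selenium(example)
print(headers)
```

With the requests library, the result could then be applied with `session.cookies.update(cookies)` and `session.headers.update(headers)` on a `requests.Session()` before the browser is closed.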

gideon-teo commented 11 months ago

Thanks for your response! I've added just one username and password to the code. It logged in successfully, and the browser opened the job search page before exiting, so all of that sounds like it's working as intended. The error only appeared after the browser window closed. What could have gone wrong, then? Does the error give any clue? Thanks again.

AndhikaWB commented 11 months ago

@ArshKA I got the same error. I tried to replicate it by modifying the request headers to include csrf-token and opening the URL directly in my browser (without Selenium). I also tried deleting parts of the query URL (https://www.linkedin.com/voyager/api/search/hits?XXXX), but it always returns 400 (I only ran a few tests, though).

I'm not sure where to find information about the right query string for the API. Any help or pointers? When I open https://www.linkedin.com/jobs/search/? and inspect it with the developer tools, there seems to be no request pointing to voyager/api/search/hits, so which page/URL do you use that makes that API request? Is there a specific search filter you use? I might be able to help if I can replicate it.

See screenshot below:

ArshKA commented 11 months ago

@AndhikaWB @gideon-teo I've just rolled out an update to the script. It should resolve the issue you've encountered and adds a handful of new features to improve usability. LinkedIn recently altered their search endpoint, which caused the 400 Bad Request error; that is a recurring challenge when scraping well-established websites with valuable info. Since this program has garnered a little attention, I'm planning to release another update over the weekend that will focus on refining the code and providing a more comprehensive readme. In the meantime, please don't hesitate to reach out if you encounter any other issues or have suggestions for new features. Your feedback is greatly appreciated!
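As a side note for anyone who hits a similar failure later, a small helper along these lines could make the raised error easier to interpret. It is only a sketch: `describe_search_failure` is a hypothetical name, and the hints reflect general HTTP status code conventions rather than anything confirmed about LinkedIn's API.

```python
def describe_search_failure(status_code):
    """Return a short, human-readable hint for a failed search request."""
    if status_code == 400:
        # A 400 means the server rejected the request itself, for example
        # because the endpoint or its query string format changed.
        return "bad request: the endpoint or query string was probably rejected"
    if status_code in (401, 403):
        return "auth problem: cookies or the csrf-token header may be missing or stale"
    if status_code == 429:
        return "rate limited: too many requests in a short period"
    return "unexpected status code {}".format(status_code)


print(describe_search_failure(400))
```

In this thread the status was 400, which fits an endpoint or query change rather than a login problem.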

ArshKA commented 11 months ago

Also, @AndhikaWB, I believe the new script uses the request URL shown in your screenshot, so thank you.