PaulMcInnis / JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.
MIT License

TravisCI Time Limit #126

Closed thebigG closed 3 years ago

thebigG commented 3 years ago

Hi everyone.

Description

TravisCI has a time limit of 50 minutes for each job, as per their documentation. As you can see from my last push, https://github.com/thebigG/JobFunnel/commit/2417f6e858875539fe6fd8aeee42d5001e01eb82, the build on TravisCI fails because one of the scrape runs hits that 50-minute cap.

One possible solution I have thought about is adding a number_of_pages/number-of-jobs limit to each scraper, but this might add complexity to our settings file and command-line interface. I'm posting this issue to start the discussion on possible ways to tackle this.

Another thought that comes to mind is limiting the scraping we do when testing, but I don't think that's optimal because we want our CI pipeline to be reliable and trustworthy. At the very least, we should be doing some scraping for every provider, even if it is just one page per provider.
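To make the two ideas above concrete, here is a rough sketch of a per-scraper page limit, with CI scraping just one page per provider. Everything in it is hypothetical: `scrape_with_limit`, `max_pages`, and `fake_provider_pages` are names made up for illustration and are not part of JobFunnel's actual code or settings.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def scrape_with_limit(
    pages: Iterable[List[str]],
    max_pages: Optional[int] = None,
) -> List[str]:
    """Collect jobs from at most max_pages pages (None means no limit).

    Hypothetical helper: 'pages' stands in for whatever a real scraper
    would yield, one list of jobs per results page.
    """
    jobs: List[str] = []
    # islice stops after max_pages pages; with None it consumes them all.
    for page in islice(pages, max_pages):
        jobs.extend(page)
    return jobs


def fake_provider_pages() -> Iterator[List[str]]:
    """Stand-in for a real provider: yields pages of job IDs forever."""
    page_number = 0
    while True:
        page_number += 1
        yield [f"job-{page_number}-{i}" for i in range(3)]


# In CI, scrape a single page per provider so the run stays short.
ci_jobs = scrape_with_limit(fake_provider_pages(), max_pages=1)
print(len(ci_jobs))
```

Defaulting the limit to `None` would keep normal runs unlimited, so only the CI configuration would need to set it.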

Like I said, just wanted to start the discussion of potential solutions.

Steps to Reproduce

  1. Run TravisCI script

Expected behavior

The tests should run without TravisCI intervention.

Actual behavior

The JobFunnel tests stop and fail because of the time limit.

Environment

PaulMcInnis commented 3 years ago

Unfortunately, TravisCI now also uses a credits system, and I don't want to pay, so we will eventually run out of credits.

thebigG commented 3 years ago

Oh man, that makes sense. I thought I had read something about that the other day. We'll figure something out.

markkvdb commented 3 years ago

Yeah, I read about the controversy online. Seems like we should just switch to a different provider, such as GitHub Actions or Jenkins.

thebigG commented 3 years ago

I've been investigating GitHub Actions. It is completely free for open source projects and very similar to TravisCI: there is a YAML file that describes the CI process, though the syntax is slightly different, of course. I'll keep investigating and give you guys feedback on what I find.

thebigG commented 3 years ago

UPDATE: I have GitHub Actions all set up! You can check out the GitHub Actions configuration on this branch: https://github.com/thebigG/JobFunnel/tree/isolate_remote.

I came across one hurdle: for some reason, JobFunnel fails to scrape Indeed with max_listing_days set to 35. I set it to 3 days, and everything passes now. @PaulMcInnis @markkvdb Let me know if there is anything you would like changed before I make the PR.

PaulMcInnis commented 3 years ago

Thanks a bunch @thebigG

Let's open that PR and cut an issue for the scraping failure. Since it sounds like it is easily replicated, that will make it easier to fix.

thebigG commented 3 years ago

This has been resolved in #127.