SeanDaBlack / KelloggBot

Kellogg bad | Union good | Support strike funds
GNU General Public License v3.0
398 stars, 81 forks

autoscab - generalizing kelloggbot #56

Open sneakers-the-rat opened 2 years ago

sneakers-the-rat commented 2 years ago

love what you've done here. am rearchitecting to make easier to extend for other unions in the future with some prior webscraping code i've written. absolutely LOVE the resume generator. would love to work together, solidarity forever <3

sneakers-the-rat commented 2 years ago

PS i got autoscab on pypi lmao lets do this

https://pypi.org/project/autoscab/ https://github.com/sneakers-the-rat/autoscab

bolshoytoster commented 2 years ago

We could take a dictionary of {xpath: action}, where xpath is a string and action is an instance of an action class. There could be a base action class:

class Action:
    def __init__(self, fun):
        '''
        Create Action from function/lambda.
        It will be passed the element at xpath, so it must accept one argument.
        '''
        # An instance attribute is not bound to self, so store fun as-is;
        # it is then called as action.fun(element).
        self.fun = fun

Then have two standard ones:

import random

class Click(Action):
    def __init__(self):
        # Takes no arguments: clicking needs only the element.
        self.fun = lambda element: element.click()

class Input(Action):
    def __init__(self, inputs):
        '''
        Inputs is either a string, a list of possible strings to choose from, or a function/lambda that returns the string to use
        '''
        if isinstance(inputs, list):
            self.fun = lambda element: element.send_keys(random.choice(inputs))
        elif callable(inputs):
            self.fun = lambda element: element.send_keys(inputs())
        else:  # Assume it's either a string or can be cast to a string
            self.fun = lambda element: element.send_keys(str(inputs))

Then have a function that takes the dictionary as input and carries out the actions, perhaps even in a loop:

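from selenium.webdriver.common.by import By

def autoscab(actions, times=0):
    '''
    actions is a dictionary:
        {
         xpath (string):
         action (an instance of Action or of a class that inherits from it)
        }

    times is the number of times to run this, 0 for infinite
    '''
    # Assumes a module-level Selenium WebDriver named `driver` already exists.
    if times == 0:
        iterate = iter(int, 1)  # int() always returns 0, never 1: loops forever
    else:
        iterate = range(times)

    for _ in iterate:
        for xpath, action in actions.items():
            # find_element_by_xpath was removed in Selenium 4.3
            element = driver.find_element(By.XPATH, xpath)
            action.fun(element)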

This is a rough draft. It probably needs more error handling, though it may be best to leave error handling to the user. It should work for most applications, and updating it mainly means changing a dictionary.
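To make the draft concrete, here is a minimal self-contained sketch of the {xpath: action} idea. It swaps Selenium for a hypothetical `FakeDriver`/`FakeElement` pair so it runs without a browser. `run_actions`, `FakeDriver`, and `FakeElement` are illustrative names, not part of any existing package, and the xpaths are made up.

```python
import random


class Action:
    """Wraps a one-argument callable that receives the located element."""

    def __init__(self, fun):
        self.fun = fun


class Click(Action):
    def __init__(self):
        super().__init__(lambda element: element.click())


class Input(Action):
    def __init__(self, inputs):
        if isinstance(inputs, list):
            super().__init__(lambda el: el.send_keys(random.choice(inputs)))
        elif callable(inputs):
            super().__init__(lambda el: el.send_keys(inputs()))
        else:
            super().__init__(lambda el: el.send_keys(str(inputs)))


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; records calls."""

    def __init__(self):
        self.events = []

    def click(self):
        self.events.append("click")

    def send_keys(self, text):
        self.events.append(("keys", text))


class FakeDriver:
    """Hypothetical stand-in for a WebDriver; one element per xpath."""

    def __init__(self):
        self.elements = {}

    def find(self, xpath):
        return self.elements.setdefault(xpath, FakeElement())


def run_actions(driver, actions, times=1):
    """Apply each action to the element at its xpath, repeated `times` times."""
    for _ in range(times):
        for xpath, action in actions.items():
            action.fun(driver.find(xpath))


driver = FakeDriver()
run_actions(driver, {
    "//input[@id='name']": Input(["Ada", "Grace"]),
    "//input[@id='email']": Input(lambda: "someone@example.com"),
    "//button[@type='submit']": Click(),
}, times=2)
```

Passing the driver in as an argument, rather than relying on a global, is a choice made here to keep the sketch testable; the draft above does not do that.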

bolshoytoster commented 2 years ago

> I like the idea, but I'd be worried about that being limiting.

Fair enough, but if you just want to make a quick bot that's probably a good place to start, until people have time to make actual bots. Technically it could also work for purposes other than this but this is the main focus.

> It might be better to componentize the project into a set of tools - resume_generator, captcha_solver, email_verifier, etc. - which can be imported into a selenium project and used in-place.

I think this would be a good idea. It also helps people quickly make bots, and the tools would work with things other than selenium if selenium is a bit too heavy for the particular application (~1 MB last time I checked).

sneakers-the-rat commented 2 years ago

yes! this is what i am doing ^^ will post here when i get a draft. Splitting into a bot that can take a set of selectors, an identity class that can do all the faking, and the resume generator with hooks for the identity class to use.
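One way the three-way split described here could look, as a rough sketch only: every class, function, and selector name below (`Identity`, `make_resume`, `Bot`, `#first`, and so on) is hypothetical and not taken from autoscab. The bot only plans what it would type, so the example runs without a browser.

```python
import random
from dataclasses import dataclass, field

FIRST_NAMES = ["Alex", "Sam", "Jordan"]
LAST_NAMES = ["Rivera", "Chen", "Okafor"]


@dataclass
class Identity:
    """Hypothetical fake applicant; every field is randomly generated."""
    first_name: str = field(default_factory=lambda: random.choice(FIRST_NAMES))
    last_name: str = field(default_factory=lambda: random.choice(LAST_NAMES))

    @property
    def email(self):
        return f"{self.first_name}.{self.last_name}@example.com".lower()


def make_resume(identity):
    """Hypothetical resume-generator hook: renders text from an Identity."""
    return f"{identity.first_name} {identity.last_name}\n{identity.email}\n"


class Bot:
    """Hypothetical bot: maps form selectors to Identity attribute names."""

    def __init__(self, selectors):
        self.selectors = selectors  # {selector: Identity attribute name}

    def plan(self, identity):
        # Returns what would be typed where; a real bot would drive a browser.
        return {sel: getattr(identity, attr)
                for sel, attr in self.selectors.items()}


identity = Identity()
bot = Bot({"#first": "first_name", "#last": "last_name", "#email": "email"})
filled = bot.plan(identity)
resume = make_resume(identity)
```

Keeping the identity separate from the bot means the same fake applicant feeds both the form fields and the resume, so the two stay consistent.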

sneakers-the-rat commented 2 years ago

I also think that trying to abstract the process further would take a bit of development time, i'm thinking of a programming interface that would be v familiar/usable by nonprogrammers (click thing, wait, type thing, wait, switch tabs, wait) and then we can do further abstraction depending on patterns that emerge
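A step-list interface like this (click thing, wait, type thing, wait, switch tabs) might be sketched as plain tuples fed to a small runner. This is an assumption about what such an interface could look like, not autoscab's API; `run_steps` and `RecordingPage` are hypothetical names, and the page object only logs calls instead of driving a browser.

```python
import time


def run_steps(page, steps, sleep=time.sleep):
    """Run ("verb", argument...) tuples in order against a page-like object."""
    for verb, *args in steps:
        if verb == "wait":
            sleep(*args)
        elif verb in ("click", "type", "switch_tab"):
            getattr(page, verb)(*args)
        else:
            raise ValueError(f"unknown step: {verb!r}")


class RecordingPage:
    """Hypothetical page object that logs calls instead of using a browser."""

    def __init__(self):
        self.log = []

    def click(self, selector):
        self.log.append(("click", selector))

    def type(self, selector, text):
        self.log.append(("type", selector, text))

    def switch_tab(self, index):
        self.log.append(("switch_tab", index))


steps = [
    ("click", "#apply"),
    ("wait", 0),
    ("type", "#name", "Pat Example"),
    ("wait", 0),
    ("switch_tab", 1),
]
page = RecordingPage()
run_steps(page, steps)
```

Because the steps are just data, a nonprogrammer could edit the list without touching the runner, and recurring patterns in those lists would show where further abstraction is worth it.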

pws1453 commented 2 years ago

Being able to extract the work done here into either a generalized program or build out some of the subroutines into external libraries would be extremely beneficial. Is there a specific way we'd want to do this?

sneakers-the-rat commented 2 years ago

> Being able to extract the work done here into either a generalized program or build out some of the subroutines into external libraries would be extremely beneficial. Is there a specific way we'd want to do this?

Am working on a draft over here, though will need another day to get a full version, sorry to be cryptic: https://github.com/sneakers-the-rat/autoscab

Edit: have also fixed up the packaging and it's PyPI-ready.

sneakers-the-rat commented 2 years ago

OK, autoscab 0.2.0 is up now. I'm totally fuzzyheaded right now, but here's the basic organization:

The basic pattern is to subclass PostBot with a series of actions to take to fill the form, using the locations in Locator, and then put them in the PostBot.apply method. You can see an example in deployments/fredmeyer. It's a little awkward right now, but I'm trying to get it out to the people in time for it to be useful on my end.

A Deployment consists of a name, a list of starting URLs, a Locator dictionary, and a subclassed PostBot. Any Deployment is picked up by the metaclass, so then the calling syntax is just

autoscab <DEPLOYMENT_NAME> 
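For readers unfamiliar with the pattern, here is a sketch of how a metaclass can pick up every Deployment so a command-line name resolves to a class. It is written from the description above, not from autoscab's source: `DeploymentMeta`, the attribute names, `ExampleBot`, and `run` are all assumptions, and `apply` returns planned steps rather than driving a browser.

```python
class DeploymentMeta(type):
    """Metaclass that records every Deployment subclass under its name."""
    registry = {}

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if bases:  # skip the base Deployment class itself
            DeploymentMeta.registry[cls.name] = cls


class PostBot:
    """Hypothetical base bot; subclasses override apply()."""

    def __init__(self, locator):
        self.locator = locator

    def apply(self):
        raise NotImplementedError


class Deployment(metaclass=DeploymentMeta):
    name = None
    urls = []
    locator = {}
    postbot = PostBot


class ExampleBot(PostBot):
    def apply(self):
        # A real bot would drive the browser; this returns the planned steps.
        return [("click", self.locator["apply_button"])]


class ExampleDeployment(Deployment):
    name = "example"
    urls = ["https://example.com/jobs/1"]
    locator = {"apply_button": "//button[@id='apply']"}
    postbot = ExampleBot


def run(deployment_name):
    """Roughly what a CLI entry point might do with the name it is given."""
    deployment = DeploymentMeta.registry[deployment_name]
    return deployment.postbot(deployment.locator).apply()


result = run("example")
```

The appeal of this design is that adding a new target means defining one more subclass; nothing has to be registered by hand.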

I tried to leave in place a lot of what was here, but like I said, I'm trying to get this out ASAP and figured we could cohere later. Also haven't pulled in any of y'all's work.

usage: APPLY FOR MANY OF THE SAME JOB [-h] [-n N] [--relentless] [--list] [--noheadless] [--leaveopen]
                                      [deployment]

positional arguments:
  deployment    Which deployment to run

optional arguments:
  -h, --help    show this help message and exit
  -n N          Apply for n jobs (default: 1)
  --relentless  Keep applying forever
  --list        List all available deployments and exit
  --noheadless  Show the chromium driver as it fills in the application
  --leaveopen   Try to leave the browser open after an application is completed

IF THEY WANT SCABS, WE'LL GIVE EM SCABS
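The help text above maps naturally onto Python's argparse. The following is a reconstruction from that help output alone, not a copy of autoscab's source, so the argument types and defaults (for example `type=int` on `-n`) are assumptions.

```python
import argparse


def build_parser():
    """Build a parser whose options mirror the help text shown above."""
    parser = argparse.ArgumentParser(prog="APPLY FOR MANY OF THE SAME JOB")
    # nargs="?" makes the positional optional, matching "[deployment]".
    parser.add_argument("deployment", nargs="?",
                        help="Which deployment to run")
    parser.add_argument("-n", type=int, default=1,
                        help="Apply for n jobs (default: 1)")
    parser.add_argument("--relentless", action="store_true",
                        help="Keep applying forever")
    parser.add_argument("--list", action="store_true",
                        help="List all available deployments and exit")
    parser.add_argument("--noheadless", action="store_true",
                        help="Show the chromium driver as it fills in the application")
    parser.add_argument("--leaveopen", action="store_true",
                        help="Try to leave the browser open after an application is completed")
    return parser


# Equivalent to: autoscab fredmeyer -n 5 --noheadless
args = build_parser().parse_args(["fredmeyer", "-n", "5", "--noheadless"])
```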