Sh1Fu / Netlab

A simple python script for working with Netlab API

Proxy settings #12

Closed Sh1Fu closed 1 year ago

Sh1Fu commented 1 year ago

We need to decide what to do about the proxy: downloads currently run without one, yet this leftover block of code still slows the script down:

# Netlab_DownloadImages.py -> xlsx_work
if i % mx_del == 0:
    proxy_index = randint(0, len(self.PROXY_LIST) - 1)
    current_proxy = self.PROXY_LIST[proxy_index]
    proxy_dict = {"http": current_proxy}
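One way to avoid the slowdown is to hide the rotation behind a flag, so the download loop pays nothing when proxying is off. A minimal sketch of that idea (`pick_proxy_dict` is a hypothetical helper, not a function from this repo):

```python
from random import choice

def pick_proxy_dict(proxy_list: list, use_proxy: bool):
    """Return a requests-style proxies dict, or None when proxying is off.

    Guarding the rotation behind a flag keeps dead proxy code out of the
    hot loop when downloads run directly.
    """
    if not use_proxy or not proxy_list:
        return None
    # random.choice replaces the randint-plus-index pattern above
    return {"http": choice(proxy_list)}
```

`random.choice` picks an element directly, which reads better than computing a random index with `randint`.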

There are also a few other pieces needed for proxy support that are currently unused: the max-divisor calculation, the scrap_proxy function, and so on.

A few options:

Optional: see how scrap_proxy() can be improved:

# needs: from requests import get, post
#        from bs4 import BeautifulSoup
def scrap_proxy(self) -> list:
    '''
    Collect free HTTP proxies and return only the working ones.

    Resources:
    * free-proxy-list.net (proxy source)
    * httpstatus.io (bulk status-code checker)

    A proxy is kept when httpstatus.io reports status code 200 for it.
    '''
    proxy_list, clean_proxy = [], []
    response = get('https://free-proxy-list.net/')
    proxy_table = BeautifulSoup(response.text, 'html.parser').find('table')
    for proxy_row in proxy_table.find_all('tr'):
        td_tag = proxy_row.find_all('td')
        if len(td_tag) != 8:
            continue
        # The cell with class "hx" flags HTTPS support; keep plain-HTTP
        # proxies only, and stop collecting after 100 of them.
        if proxy_row.find_all(class_='hx')[0].get_text() == 'no' and len(proxy_list) < 100:
            proxy_list.append(td_tag[0].get_text() + ':' + td_tag[1].get_text())
    # Build the request body as a dict instead of a hand-escaped JSON string;
    # the original template accidentally split the "additionalSubdomains" key
    # across a line continuation, producing an invalid key.
    payload = {
        "urls": ['http://' + proxy for proxy in proxy_list],
        "userAgent": "chrome-100",
        "userName": "", "passWord": "",
        "headerName": "", "headerValue": "",
        "strictSSL": True, "canonicalDomain": False,
        "additionalSubdomains": ["www"], "followRedirect": False,
        "throttleRequests": 100, "escapeCharacters": False,
    }
    test_data = post("https://backend.httpstatus.io/api", json=payload).json()
    # Assumes httpstatus.io returns results in the same order as the request
    for index, result in enumerate(test_data):
        if result["statusCode"] == 200:
            clean_proxy.append(proxy_list[index])
    return clean_proxy
Sh1Fu commented 1 year ago

How about keeping just one proxy and simply checking that it works before each use? Netlab seems to quietly keep a single IP alive anyway.
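That check could be sketched like this (`proxy_is_alive` and the test URL are placeholders, not code from the repo; any connection failure counts as a dead proxy):

```python
from requests import get
from requests.exceptions import RequestException

def proxy_is_alive(proxy: str,
                   test_url: str = "http://www.google.com",
                   timeout: float = 5.0) -> bool:
    """Return True if a GET through `proxy` succeeds, False otherwise.

    Any requests failure (ProxyError, ConnectTimeout, ...) is treated as
    a dead proxy instead of being raised.
    """
    try:
        resp = get(test_url,
                   proxies={"http": proxy, "https": proxy},
                   timeout=timeout)
        return resp.status_code == 200
    except RequestException:
        return False
```

Calling it once right before the download loop keeps a single known-good proxy without any rotation machinery.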

Sh1Fu commented 1 year ago

Trouble: the proxy cannot be brought up because of two errors: ProxyError and ConnectionTimeout.
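Both failures are catchable: requests exposes them as `requests.exceptions.ProxyError` and `requests.exceptions.ConnectTimeout` (the library's spelling of the timeout error). A hedged sketch of falling back through a proxy list on those two errors (`fetch_with_fallback` is a hypothetical helper, not code from this repo):

```python
from requests.exceptions import ProxyError, ConnectTimeout

def fetch_with_fallback(fetch, proxy_urls):
    """Try `fetch` through each proxy in turn; fall back to a direct
    connection when every proxy raises ProxyError or ConnectTimeout.

    `fetch` is any callable that accepts a requests-style proxies dict
    (or None for a direct connection).
    """
    for proxy in proxy_urls:
        try:
            return fetch({"http": proxy})
        except (ProxyError, ConnectTimeout):
            continue  # this proxy is down; try the next one
    return fetch(None)  # last resort: no proxy at all
```

This way a dead proxy degrades to a direct download instead of crashing the script.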

Sh1Fu commented 1 year ago
The response body coming back through the proxy is a squid ERR_ACCESS_DENIED page:

    <hr>
    <div id="footer">
    <p>Generado Sat, 05 Nov 2022 14:05:00 GMT por mail.blog2life.net (squid)</p>
    <!-- ERR_ACCESS_DENIED -->
    </div>
    </body></html>

Why, Netlab? :<

[attached screenshot]