While testing the proxy capabilities I discovered that the proxy parsing/validation is not quite correct.
Steps to Reproduce
Edit one of the settings files like settings_USA.yaml and uncomment the bottom of it:
# Proxy settings
proxy:
protocol: http # NOTE: you can also set to 'http'
ip: "168.169.146.12"
port: 8080
NOTE: Notice how I removed the single quotes from the port field. This is something I'll fix when I push these fixes(along with unit testing for proxy.py)
The error stack is larger than that, but I'm trying to stay brief.
There are two issues happening here.
The cerberus API does not pass 2 arguments to custom validators, like this function expects:
def _validate_type_ipv4address(self, field, value):
It expects 1. One can see this on the Cerberus API on this snippet of code which is on cerberus/validator.py:
type_handler = self.__get_rule_handler('validate_type', _type)
matched = type_handler(value)
if matched:
return
# TODO uncomment this block on next major release
# when _validate_type_* methods were deprecated:
# type_definition = self.types_mapping[_type]
# if isinstance(value, type_definition.included_types) \
# and not isinstance(value, type_definition.excluded_types): # noqa 501
# return
self._error(field, errors.BAD_TYPE)
self._drop_remaining_rules()
the function type_handler points to _validate_type_ipv4address, and as you can see it only passes 1 argument, value.
Easy fix: make _validate_type_ipv4address take 1 argument.
Once I fixed that, there was another issue:
Traceback (most recent call last):
File "/home/lorenzogomez/.local/bin/funnel", line 11, in <module>
load_entry_point('JobFunnel', 'console_scripts', 'funnel')()
File "/home/lorenzogomez/PycharmProjects/JobFunnel/jobfunnel/__main__.py", line 15, in main
cfg_dict = build_config_dict(args)
File "/home/lorenzogomez/PycharmProjects/JobFunnel/jobfunnel/config/cli.py", line 317, in build_config_dict
raise ValueError(
ValueError: Invalid Config settings yaml:
{'proxy': [{'ip': ['must be of ipv4address type']}]}
This has to do with two functions in the code.
First this function:
class JobFunnelSettingsValidator(Validator):
"""A simple JSON data validator with a custom data type for IPv4 addresses
https://codingnetworker.com/2016/03/validate-json-data-using-cerberus/
"""
def _validate_type_ipv4address(self, value):
"""
checks that the given value is a valid IPv4 address
"""
try:
# try to create an IPv4 address object using the python3 ipaddress
# module
ipaddress.IPv4Address(value)
return True
except:
self._error(value, "Not a valid IPv4 address")
Notice how I added return True at the end of the try statement.
The second function on cli.py:
# Validate the config passed via YAML
if not SettingsValidator.validate(config):
raise ValueError(
f"Invalid Config settings yaml:\n{SettingsValidator.errors}"
)
This fucntion expects a value from _validate_type_ipv4address, but it returns None(at least the current version of it). Hence the new return True in _validate_type_ipv4address.
Hopefully this explanation makes sense.
I will be working on fixing these issues, and will also add unit testing to proxy.py.
Cheers!
Expected behavior
jobfunnel should scrape normally with proxy settings on.
Actual behavior
The errors described above.
Environment
Build: JobFunnel 3.0.1 on commit cf80740a852f478b3a656f612136ddcf841af096
Hi everyone!
Hope you are all doing well.
Description
While testing the proxy capabilities I discovered that the proxy parsing/validation is not quite correct.
Steps to Reproduce
settings_USA.yaml
and uncomment the bottom of it:NOTE: Notice how I removed the single quotes from the
port
field. This is something I'll fix when I push these fixes(along with unit testing for proxy.py)funnel load -s demo/settings_USA.yaml
TypeError: _validate_type_ipv4address() missing 1 required positional argument: 'value'
The error stack is larger than that, but I'm trying to stay brief.
There are two issues happening here.
cerberus
API does not pass 2 arguments to custom validators, like this function expects:def _validate_type_ipv4address(self, field, value):
It expects 1. One can see this on the Cerberus API on this snippet of code which is oncerberus/validator.py
:the function
type_handler
points to_validate_type_ipv4address
, and as you can see it only passes 1 argument,value
.Easy fix: make
_validate_type_ipv4address
take 1 argument.Once I fixed that, there was another issue:
This has to do with two functions in the code.
First this function:
Notice how I added
return True
at the end of the try statement.The second function on
cli.py
:This fucntion expects a value from
_validate_type_ipv4address
, but it returns None(at least the current version of it). Hence the newreturn True
in_validate_type_ipv4address
.Hopefully this explanation makes sense.
I will be working on fixing these issues, and will also add unit testing to
proxy.py
.Cheers!
Expected behavior
jobfunnel
should scrape normally with proxy settings on.Actual behavior
The errors described above.
Environment
JobFunnel 3.0.1
on commit cf80740a852f478b3a656f612136ddcf841af096Pop!_OS 20.04 LTS
Cinnamon