PaulMcInnis / JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.
MIT License
1.81k stars 212 forks source link

Parse config #54

Closed markkvdb closed 4 years ago

markkvdb commented 4 years ago

Parse checker and validation

Description

This pull request introduces some major changes to the parser, while also introducing preparsing and validation.

Setting values in config

Old method

The parse_config function has been largely rewritten. In previous version, most attributes of the configuration file are handled separately where we first set the value to what we found in default_yaml, then we checked if it was defined in given_yaml and update accordingly and lastly we check if it was defined as an argument to the function and updated accordingly. In short, we first assume the value for default_yaml, then update values defined in given_yaml and lastly update values defined in the arguments of the program.

New method

In this pull request, I don't do every attribute separately. Instead, config is set to default_yaml, then config is updated by non-None values of given_yaml and lastly updated by cli_yaml. cli_yaml is created by putting the program arguments in a dictionary in the same style as default_yaml.

Preparse config

After obtaining config by the procedure defined above, we check whether config consists of valid attributes with the right types. This is done by calling the check_config_types function. This function checks whether all attributes are defined in the new CONFIG_TYPES containing all valid attributes and its possible types.

Note to developers wanting to introduce new attributes to configuration file: you do have to define this attribute in the CONFIG_TYPES dictionary with its possible types. If you do not do this, you will notice soon enough because the tests will fail.

Validate config

After checking that all attributes are valid and have the right type, we can check whether these attributes have sensible values. validate_config checks whether the domain is in the supported DOMAINS list, that US and Canadian jobs have a province and many more tests.

Context of change

Please add options that are relevant and mark any boxes that apply.

Type of change

Please mark any boxes that apply.

How Has This Been Tested?

Several new tests have been added to test the preparser, validator and updates to parse_config. These tests fall in the categories:

Checklist:

Please mark any boxes that have been completed.

markkvdb commented 4 years ago

@bunsenmurder @PaulMcInnis in the last commit 8fdef2a7233948cd96912c4c6b397daa5f558947 I removed the functionality to turn off the delay from the entire code base.

The minor fixes have also been dealt with in 1e87c72.

PaulMcInnis commented 4 years ago

Good. I agree with this functionality change, I think its the responsible thing to do.

The default delay is quite reasonable too.