apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
https://crawlee.dev/python/
Apache License 2.0

Add package for run-time type checking #17

Open vdusek opened 9 months ago

vdusek commented 9 months ago

Description

Based on the PR https://github.com/apify/apify-sdk-python/pull/171, @janbuchar suggested using run-time type checking in Python.

One option is typeguard. It can be applied either with the @typechecked decorator on a specific function, or with the import hook typeguard.install_import_hook() for a whole module.

It could make sense to use it for methods/functions where we currently check the types of arguments or return values manually, e.g. here: https://github.com/apify/apify-sdk-python/blob/v1.5.1/src/apify/scrapy/utils.py#L44.

Potential problems

I suppose it is implemented using typing.get_type_hints to obtain the type hints of a given function. I ran into a bug when typing.get_type_hints and from __future__ import annotations were used together, see the issue https://github.com/apify/apify-sdk-python/issues/151. However, tests should reveal any such problem.
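To illustrate one common way this combination can fail (not necessarily the failure in the linked issue): with from __future__ import annotations, every annotation is stored as a string, and typing.get_type_hints has to evaluate it at run time. If the name is only imported under typing.TYPE_CHECKING, the evaluation raises NameError. The sketch below mimics the future import with explicit string annotations; `to_cents` and `hints_or_error` are hypothetical names.

```python
import typing

if typing.TYPE_CHECKING:
    # Visible to static type checkers only; not defined at run time.
    from decimal import Decimal


def to_cents(amount: "Decimal") -> "int":
    # String annotations mimic what `from __future__ import annotations`
    # does to every annotation in a module.
    return int(amount * 100)


def hints_or_error(func):
    """Return the resolved type hints, or the NameError message on failure."""
    try:
        return typing.get_type_hints(func)
    except NameError as exc:
        return f"NameError: {exc}"


result = hints_or_error(to_cents)
print(result)
```

Any checker that resolves hints this way would hit the same error, so a test that calls a decorated function with TYPE_CHECKING-only annotations is a cheap way to find out.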

janbuchar commented 8 months ago

https://pypi.org/project/beartype/ is another option

janbuchar commented 3 months ago

https://docs.pydantic.dev/latest/concepts/validation_decorator/ is also an option