DataCrawl-AI/datacrawl
A simple and easy to use web crawler for Python
MIT License
58 stars, 11 forks
issues
Retry mechanism for transient errors
#49
Mews
closed
2 months ago
0
Use settings classes
#48
Mews
closed
2 months ago
0
Deleted src/tiny_web_crawler/spider.py
#47
Mews
closed
2 months ago
0
Use one or more `Options` classes
#46
Mews
closed
2 months ago
1
Respect robots txt
#45
Mews
closed
2 months ago
0
Make `Spider` importable from root module
#44
Mews
closed
2 months ago
0
Feature: Logging
#43
Mews
closed
2 months ago
2
Respect robots.txt when crawling when set as True
#42
indrajithi
closed
2 months ago
1
Can't run tests on local machine
#41
Mews
closed
2 months ago
1
Add `internal_links_only` and `external_links_only` options
#40
Mews
closed
2 months ago
0
Implement a retry mechanism for transient errors
#39
indrajithi
closed
2 months ago
0
Implement logging
#38
indrajithi
closed
2 months ago
1
Crawl depth per domain
#37
indrajithi
opened
2 months ago
0
Refactored spider class into modules
#34
indrajithi
closed
2 months ago
0
Removed unused `main` function
#33
Mews
closed
2 months ago
2
Coverage report in readme
#32
Mews
closed
2 months ago
3
Fix small typo in verbose print statement
#31
Mews
closed
2 months ago
3
Display coverage percentage in readme
#30
Mews
closed
2 months ago
2
Increased test coverage
#29
Mews
closed
2 months ago
2
Test coverage above 80%
#28
Mews
closed
2 months ago
0
What is the `main` function for?
#27
Mews
closed
2 months ago
2
Make `Spider` importable from main module
#26
Mews
closed
2 months ago
1
Set tests stage to push
#25
Mews
closed
2 months ago
0
First Major Release v1.0.0
#24
indrajithi
opened
2 months ago
3
`poetry install --with dev` doesn't install pre-commit hooks
#23
Mews
opened
2 months ago
6
Add mypy and pytest to pre-commit
#22
Mews
closed
2 months ago
0
Fix `crawl_result` type hint
#21
Mews
opened
2 months ago
1
Add mypy to pre-commit hooks
#20
Mews
closed
2 months ago
6
Added option to include page body in crawl results
#19
Mews
closed
2 months ago
7
Docs: Auto generate documentation
#18
indrajithi
opened
2 months ago
0
Housekeeping: Refactor the code base to a more Modular and Extensible Architecture
#17
indrajithi
closed
2 months ago
0
Url regex matching
#16
Mews
closed
2 months ago
2
Feature/refactor spider class
#15
indrajithi
closed
2 months ago
0
Add a flag to crawl only the root website
#14
devavinothm
closed
2 months ago
2
Feature: Support for regular expression pattern for url crawling
#13
indrajithi
closed
2 months ago
5
Feature: Add a feature to only crawl the given list of urls
#12
indrajithi
opened
2 months ago
5
Feature: Support flag to crawl only the root website. Do not hop to external links
#11
indrajithi
closed
2 months ago
10
Feature: Support for crawling dynamic javascript heavy site
#10
indrajithi
opened
2 months ago
5
Housekeeping: Refactor the Spider class to reduce the max args. Use a dataclass
#9
indrajithi
closed
2 months ago
1
Feature: Add option to return the crawled website body in the response
#8
indrajithi
closed
2 months ago
3
add support for concurrent workers, custom delay and optional verbose
#7
indrajithi
closed
2 months ago
0
use poetry to publish
#6
indrajithi
closed
2 months ago
0
Feature/test workflow
#5
indrajithi
closed
2 months ago
0
python publish workflow
#4
indrajithi
closed
2 months ago
0
Create python-publish.yml
#3
indrajithi
closed
2 months ago
0
Update pylint.yml
#2
indrajithi
closed
2 months ago
0
Feature: Basic features and package the project
#1
indrajithi
closed
2 months ago
0