cldellow
/
datasette-scraper
Add website scraping abilities to Datasette
Apache License 2.0
60
stars
1
forks
issues
500 Error with dss_crawl and a few other tables after installing
#53
unbracketed
opened
5 months ago
0
javascript interaction?
#52
ar-jan
opened
10 months ago
0
table overload: consider hiding tables and using views or other affordances
#51
cldellow
closed
1 year ago
0
usability: if discover-urls returns a str, raise an exception
#50
cldellow
opened
1 year ago
0
dss fails when it's not the primary database
#49
cldellow
closed
1 year ago
0
permit installing into a database with pre-existing tables
#48
cldellow
closed
1 year ago
0
link dss_extract_stats table column to actual table
#47
cldellow
closed
1 year ago
0
link dss_job_stats hostname column to dss_crawl_queue_history
#46
cldellow
closed
1 year ago
0
make ops db configurable
#45
cldellow
closed
1 year ago
1
be able to train dictionaries automatically
#44
cldellow
closed
1 year ago
1
be able to rewrite existing fetch_cache entries
#43
cldellow
closed
1 year ago
0
crawl_queue_history should have fkey to fetch_cache
#42
cldellow
closed
1 year ago
0
take over row page for _dss_crawl
#41
cldellow
closed
1 year ago
0
take over table page for _dss_crawl table
#40
cldellow
closed
1 year ago
0
consider setting default sort orders
#39
cldellow
closed
1 year ago
0
be able to edit host rates in datasette
#38
cldellow
closed
1 year ago
2
support zstd dictionaries
#37
cldellow
closed
1 year ago
0
consider: re-insert (or update) a crawl item if has a lower depth
#36
cldellow
closed
1 year ago
1
be more aggressive in asserting WAL mode
#35
cldellow
closed
1 year ago
0
Have a way to make an HTTP request that goes through the plugin system
#34
cldellow
closed
1 year ago
1
hook: config_schema, config_default_values
#33
cldellow
closed
1 year ago
1
hook: extract_from_response
#32
cldellow
closed
1 year ago
1
hook: canonicalize_urls
#31
cldellow
closed
1 year ago
0
hook: discover_urls
#30
cldellow
closed
1 year ago
0
hook: fetch_url
#29
cldellow
closed
1 year ago
0
hook: before_fetch_url
#28
cldellow
closed
1 year ago
0
hook: get_seed_urls
#27
cldellow
closed
1 year ago
0
plugin: extract-object
#26
cldellow
opened
1 year ago
0
plugin: extract-ecommerce-shopify
#25
cldellow
opened
1 year ago
0
plugin: extract-ecommerce-json-ld
#24
cldellow
closed
1 year ago
0
plugin: extract-links
#23
cldellow
closed
1 year ago
1
plugin: extract-seo
#22
cldellow
opened
1 year ago
0
plugin: fetch-cache
#21
cldellow
closed
1 year ago
1
plugins: max-pages, max-pages-per-domain
#20
cldellow
closed
1 year ago
0
plugin: max-depth
#19
cldellow
closed
1 year ago
0
plugin: canonicalize-shopify-urls
#18
cldellow
closed
1 year ago
0
plugin: discover-allow
#17
cldellow
closed
1 year ago
1
plugin: discover-deny
#16
cldellow
closed
1 year ago
0
plugin: discover-only-same-origin
#15
cldellow
closed
1 year ago
0
plugin: discover-redirect-urls
#14
cldellow
closed
1 year ago
0
plugin: discover-html-urls
#13
cldellow
closed
1 year ago
0
plugin: seed-sitemaps
#12
cldellow
closed
1 year ago
0
plugin: seed-urls
#11
cldellow
closed
1 year ago
0
support scheduled crawls
#10
cldellow
opened
1 year ago
1
make plugins pluggable
#9
cldellow
closed
1 year ago
0
add plugin system
#8
cldellow
closed
1 year ago
0
add background workers that Do The Thing
#7
cldellow
closed
1 year ago
5
consider dashboards
#6
cldellow
opened
1 year ago
0
consider a knob to hide _dss_ tables
#5
cldellow
closed
1 year ago
1
throw if the default database is _memory, or isn't mutable
#4
cldellow
closed
1 year ago
0