MrDiggles2/cru-scrape
Scraper of CRU sites
issues
Set up monitoring
#27
MrDiggles2
opened
3 weeks ago
0
Chan/queue
#26
MrDiggles2
closed
3 weeks ago
0
Bump cryptography from 43.0.0 to 43.0.1
#25
dependabot[bot]
opened
3 weeks ago
0
updates spider to not recrawl already visited sites on restart
#24
MrDiggles2
closed
1 month ago
0
Update spider to skip URLs if there's already content stored in DB
#23
MrDiggles2
closed
1 month ago
0
Deploy SQL viz tool somewhere
#22
MrDiggles2
opened
1 month ago
0
Chan/stoppable jobs
#21
MrDiggles2
closed
1 month ago
0
Maintain progress of crawl in case of crapping out
#20
MrDiggles2
closed
1 month ago
1
Queue and workers
#19
MrDiggles2
closed
3 weeks ago
0
consolidates all scripts into commands for cli
#18
MrDiggles2
closed
1 month ago
0
Update README.md
#17
MrDiggles2
closed
1 month ago
0
removes consecutive whitespace in scraped text
#16
MrDiggles2
closed
1 month ago
0
Update crawling script to replace consecutive instances of whitespace with a single space.
#15
MrDiggles2
closed
1 month ago
1
Chan/update pipeline to upload to db
#14
MrDiggles2
closed
1 month ago
0
trim off landing pages
#13
danielahuang
closed
1 month ago
0
Update `upload-organizations.py` to trim off "landing pages" from URL
#12
MrDiggles2
closed
1 month ago
1
Add a script to spit out all combinations of URL and year required
#11
MrDiggles2
closed
3 weeks ago
1
Adds sql structure for all organizations
#10
MrDiggles2
closed
1 month ago
0
cli-refactor-with-typer
#9
shvets92
closed
2 months ago
6
Migrate from venv to poetry
#8
MrDiggles2
closed
2 months ago
1
switching to poetry for env control
#7
shvets92
closed
2 months ago
0
Stop storing creds in repo
#6
MrDiggles2
closed
2 months ago
1
Update pipeline to upload data to DB instead of writing to file
#5
MrDiggles2
closed
1 month ago
1
Make year+url unique index
#4
MrDiggles2
closed
1 month ago
2
Figure out why some pages aren't getting crawled
#3
MrDiggles2
closed
1 month ago
2
Make this a CLI tool instead
#2
MrDiggles2
closed
2 months ago
2
POC: Handling PDF pages
#1
MrDiggles2
opened
2 months ago
0