bgabor99/News_crawler: issues (0 stars, 0 forks)
#56 Scheduled crawling (bgabor99, closed 11 months ago, 0 comments)
#55 Research for presentation (bgabor99, closed 10 months ago, 0 comments)
#54 Run the app every half an hour (bgabor99, closed 11 months ago, 2 comments)
#53 Domains from file input (bgabor99, opened 1 year ago, 0 comments)
#52 Article ID filter fix (bgabor99, closed 1 year ago, 0 comments)
#51 Add new domains from file and save everything from those (bgabor99, opened 1 year ago, 2 comments)
#50 Write more tests (bgabor99, opened 1 year ago, 1 comment)
#49 Add .env file (bgabor99, closed 1 year ago, 0 comments)
#48 Get user and password from .env file in 1.sql too (bgabor99, opened 1 year ago, 0 comments)
#47 Add server config for pgadmin (bgabor99, opened 1 year ago, 1 comment)
#46 Create presentation (bgabor99, closed 10 months ago, 5 comments)
#45 Crawl thehackernews site too (bgabor99, closed 1 year ago, 0 comments)
#44 Check for more filter options in spiders (bgabor99, opened 1 year ago, 0 comments)
#43 Check Article ID handling (bgabor99, opened 1 year ago, 0 comments)
#42 Is the log file necessary? Check (bgabor99, closed 11 months ago, 2 comments)
#41 Add CI with pycodestyle (bgabor99, closed 1 year ago, 0 comments)
#40 Investigate deep web crawling possibilities (bgabor99, closed 1 year ago, 2 comments)
#39 Docker compose (bgabor99, closed 1 year ago, 0 comments)
#38 Update README with docker compose (bgabor99, closed 1 year ago, 0 comments)
#37 Rework env variables in docker-compose (bgabor99, closed 1 year ago, 1 comment)
#36 Add README for docker compose usage (bgabor99, closed 1 year ago, 1 comment)
#35 Crawl all pages in one spider in cybersecurity.com domain (bgabor99, closed 1 year ago, 0 comments)
#34 Crawl all pages (bgabor99, closed 1 year ago, 0 comments)
#33 Crawl new domain (bgabor99, closed 1 year ago, 2 comments)
#32 Initial logging (bgabor99, closed 1 year ago, 0 comments)
#31 Only save new articles (bgabor99, closed 1 year ago, 0 comments)
#30 Datetime with timezone for common."Date" (bgabor99, opened 1 year ago, 0 comments)
#29 Add CI with pycodestyle (bgabor99, closed 1 year ago, 1 comment)
#28 Check "load more" and multiple pages on a specific page (bgabor99, opened 1 year ago, 0 comments)
#27 Save article author and date (bgabor99, closed 1 year ago, 0 comments)
#26 Article author save (bgabor99, closed 1 year ago, 1 comment)
#25 Added restart for serials (bgabor99, closed 1 year ago, 0 comments)
#24 Crawl all articles from all pages in this domain: https://cybersecuritynews.com/ (bgabor99, closed 1 year ago, 1 comment)
#23 Database rebase (bgabor99, closed 1 year ago, 0 comments)
#22 Insert NULL instead of None (bgabor99, opened 1 year ago, 0 comments)
#21 Check content XPath (bgabor99, opened 1 year ago, 0 comments)
#20 Add logging for spiders (bgabor99, closed 1 year ago, 1 comment)
#19 Database NULL attributes change (bgabor99, closed 1 year ago, 1 comment)
#18 Database schema and tables overview, change if necessary (bgabor99, closed 1 year ago, 2 comments)
#17 Investigate whether body crawling can be made universal (bgabor99, closed 1 year ago, 1 comment)
#16 Article date save (bgabor99, closed 1 year ago, 1 comment)
#15 Investigate database snapshot creation (bgabor99, closed 1 year ago, 1 comment)
#14 Threat news spider added (bgabor99, closed 1 year ago, 0 comments)
#13 Restart ID sequences on TRUNCATE CASCADE in the DB (bgabor99, closed 1 year ago, 1 comment)
#12 Create runner script (bgabor99, closed 1 year ago, 1 comment)
#11 Crawl all pages in the domain in one spider (bgabor99, closed 1 year ago, 1 comment)
#10 Save content from articles too (bgabor99, closed 1 year ago, 1 comment)
#9 Only save new articles from pipeline (bgabor99, closed 1 year ago, 1 comment)
#8 Set up docker-compose (bgabor99, closed 1 year ago, 1 comment)
#7 Check article IDs (bgabor99, opened 1 year ago, 0 comments)
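Issue #22 ("Insert NULL instead of None") has no description in this listing. A common cause, assumed here, is building SQL strings by hand, so Python's `None` ends up stored as the text 'None'. The sketch below shows the usual fix, passing values as query parameters so the driver writes SQL NULL. It uses the standard-library `sqlite3` module as a stand-in for the project's PostgreSQL database; with psycopg2 (an assumption about this project) the placeholder is `%s` instead of `?`. The `articles` table and the `save_article` helper are hypothetical.

```python
import sqlite3
from typing import Optional


def save_article(conn: sqlite3.Connection, url: str, author: Optional[str]) -> None:
    """Insert one article row; author is None when the spider found no byline."""
    # Anti-pattern: f"INSERT ... VALUES ('{url}', '{author}')" would store the
    # four-character string 'None'. Passing values as parameters lets the
    # driver translate Python None into SQL NULL.
    conn.execute("INSERT INTO articles (url, author) VALUES (?, ?)", (url, author))


# Hypothetical table standing in for the project's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (url TEXT, author TEXT)")
save_article(conn, "https://example.com/a", None)
print(conn.execute("SELECT author IS NULL FROM articles").fetchone()[0])  # 1
```

Parameter binding also protects against SQL injection from scraped text, which matters when titles or bodies contain quotes.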
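Issue #30 asks for common."Date" to be stored with a timezone. A minimal sketch, assuming the spiders scrape ISO 8601 date strings and that values without an offset should be treated as UTC (neither is confirmed by this listing), is to normalize every parsed date to a timezone-aware `datetime` before it is written to a `timestamp with time zone` column. The helper name `to_aware` is hypothetical.

```python
from datetime import datetime, timezone


def to_aware(raw: str) -> datetime:
    """Parse an ISO 8601 date string scraped from an article page.

    Strings that carry a UTC offset keep it; naive strings are assumed to be
    UTC (an assumption: the real sites may publish local time instead).
    """
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


print(to_aware("2023-05-01T10:00:00+02:00").isoformat())  # 2023-05-01T10:00:00+02:00
print(to_aware("2023-05-01 10:00:00").isoformat())        # 2023-05-01T10:00:00+00:00
```

If a site publishes a known local timezone rather than UTC, replacing `timezone.utc` with that zone (for example via `zoneinfo.ZoneInfo`) keeps the same structure.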