UTMediaCAT / mediacat-domain-crawler
Internet domain crawler
0 stars, 0 forks
Issues (newest first)
#39 "Add the Internet Archive Domain Crawler" (hashimr1, opened 11 months ago, 0 comments)
#38 "Update restart crawl documentation" (shawnpl7, closed 1 year ago, 0 comments)
#37 "Update domain crawler restart documentation" (shawnpl7, closed 1 year ago, 0 comments)
#36 "Puppeteer crawler generate wrong plain text when crawl through large number of URLs." (CharlesXu123, closed 2 years ago, 0 comments)
#35 "Is Metascraper attempting to bring back dates as part of the domain crawl?" (kstapelfeldt, opened 2 years ago, 0 comments)
#34 "Update Apify version and test crawler." (kstapelfeldt, opened 3 years ago, 2 comments)
#33 "Batch crawling" (RaiyanRahman, closed 3 years ago, 0 comments)
#32 "Batch crawling" (RaiyanRahman, closed 3 years ago, 0 comments)
#31 "make a separate constant file that is git ignored" (jacqueline-chan, opened 3 years ago, 0 comments)
#30 "make 5 mini instances and set one up for batching/ full application app" (jacqueline-chan, closed 3 years ago, 0 comments)
#29 "README example of how to use commandline to call for the crawler" (jacqueline-chan, closed 3 years ago, 0 comments)
#28 "Merge ScopeFix into master" (jacqueline-chan, closed 3 years ago, 0 comments)
#27 "Metascraper" (jacqueline-chan, closed 3 years ago, 0 comments)
#26 "email when the crawl.js stops" (jacqueline-chan, closed 3 years ago, 0 comments)
#25 "bug fix: if the crawl needs to run a subset of its crawl urls, the fu…" (jacqueline-chan, closed 3 years ago, 0 comments)
#24 "#6 progress reporting" (jacqueline-chan, closed 3 years ago, 0 comments)
#23 "Remount Instance 2 Graham Cloud" (jacqueline-chan, opened 3 years ago, 1 comment)
#22 "recover branch #19" (jacqueline-chan, closed 3 years ago, 0 comments)
#21 "#16 metascraper integration" (AlAndr04, closed 3 years ago, 1 comment)
#20 "Removed large JSON creation." (jacqueline-chan, closed 3 years ago, 0 comments)
#19 "Issue: The crawler is crawling too slow, look for solutions to increase performance" (jacqueline-chan, opened 3 years ago, 29 comments)
#18 "Issue: Some domains only return one / a few links back." (jacqueline-chan, closed 3 years ago, 5 comments)
#17 "#2 article plaintext" (RaiyanRahman, closed 3 years ago, 2 comments)
#16 "Integrate Metascraper crawl to operate on the Puppeteer crawler output" (kstapelfeldt, closed 3 years ago, 5 comments)
#15 "Pulling changes from master." (RaiyanRahman, closed 3 years ago, 0 comments)
#14 "Scope fix" (AlAndr04, closed 3 years ago, 1 comment)
#13 "Metascraper" (jacqueline-chan, closed 3 years ago, 0 comments)
#12 "#2 article plaintext" (RaiyanRahman, closed 3 years ago, 0 comments)
#11 "Modified the filtering function to include twitter" (AlAndr04, closed 3 years ago, 0 comments)
#10 "Accepting a .csv file from the parser to populate initial queue" (kstapelfeldt, closed 3 years ago, 7 comments)
#9 "Modify filter to permit the storage of twitter URLs" (kstapelfeldt, closed 3 years ago, 3 comments)
#8 "Integrate PDF capture into the domain crawler" (kstapelfeldt, closed 3 years ago, 4 comments)
#7 "Review integration of metascraper into the domain crawler directly" (kstapelfeldt, closed 3 years ago, 1 comment)
#6 "Integrate Date Detection into crawler" (kstapelfeldt, closed 3 years ago, 7 comments)
#5 "#2 article plaintext" (RaiyanRahman, closed 3 years ago, 1 comment)
#4 "First version of Filtering function" (AlAndr04, closed 3 years ago, 1 comment)
#3 "How can we stop the crawler from moving out to domains out of scope" (kstapelfeldt, closed 3 years ago, 2 comments)
#2 "Modification of crawler to gather plain text version of the crawled articles" (kstapelfeldt, closed 3 years ago, 10 comments)
#1 "Created demo" (AlAndr04, closed 4 years ago, 0 comments)