UTMediaCAT mediacat-domain-crawler issues

UTMediaCAT / mediacat-domain-crawler

Internet domain crawler

0 stars 0 forks

issues

Add the Internet Archive Domain Crawler

#39 hashimr1 opened 11 months ago
0
Update restart crawl documentation

#38 shawnpl7 closed 1 year ago
0
Update domain crawler restart documentation

#37 shawnpl7 closed 1 year ago
0
Puppeteer crawler generates wrong plain text when crawling through a large number of URLs.

#36 CharlesXu123 closed 2 years ago
0
Is Metascraper attempting to bring back dates as part of the domain crawl?

#35 kstapelfeldt opened 2 years ago
0
Update Apify version and test crawler.

#34 kstapelfeldt opened 3 years ago
2
Batch crawling

#33 RaiyanRahman closed 3 years ago
0
Batch crawling

#32 RaiyanRahman closed 3 years ago
0
make a separate constant file that is git ignored

#31 jacqueline-chan opened 3 years ago
0
make 5 mini instances and set one up for batching/ full application app

#30 jacqueline-chan closed 3 years ago
0
README example of how to use commandline to call for the crawler

#29 jacqueline-chan closed 3 years ago
0
Merge ScopeFix into master

#28 jacqueline-chan closed 3 years ago
0
Metascraper

#27 jacqueline-chan closed 3 years ago
0
email when the crawl.js stops

#26 jacqueline-chan closed 3 years ago
0
bug fix: if the crawl needs to run a subset of its crawl urls, the fu…

#25 jacqueline-chan closed 3 years ago
0
#6 progress reporting

#24 jacqueline-chan closed 3 years ago
0
Remount Instance 2 Graham Cloud

#23 jacqueline-chan opened 3 years ago
1
recover branch #19

#22 jacqueline-chan closed 3 years ago
0
#16 metascraper integration

#21 AlAndr04 closed 3 years ago
1
Removed large JSON creation.

#20 jacqueline-chan closed 3 years ago
0
Issue: The crawler is crawling too slow, look for solutions to increase performance

#19 jacqueline-chan opened 3 years ago
29
Issue: Some domains only return one / a few links back.

#18 jacqueline-chan closed 3 years ago
5
#2 article plaintext

#17 RaiyanRahman closed 3 years ago
2
Integrate Metascraper crawl to operate on the Puppeteer crawler output

#16 kstapelfeldt closed 3 years ago
5
Pulling changes from master.

#15 RaiyanRahman closed 3 years ago
0
Scope fix

#14 AlAndr04 closed 3 years ago
1
Metascraper

#13 jacqueline-chan closed 3 years ago
0
#2 article plaintext

#12 RaiyanRahman closed 3 years ago
0
Modified the filtering function to include twitter

#11 AlAndr04 closed 3 years ago
0
Accepting a .csv file from the parser to populate initial queue

#10 kstapelfeldt closed 3 years ago
7
Modify filter to permit the storage of twitter URLs

#9 kstapelfeldt closed 3 years ago
3
Integrate PDF capture into the domain crawler

#8 kstapelfeldt closed 3 years ago
4
Review integration of metascraper into the domain crawler directly

#7 kstapelfeldt closed 3 years ago
1
Integrate Date Detection into crawler

#6 kstapelfeldt closed 3 years ago
7
#2 article plaintext

#5 RaiyanRahman closed 3 years ago
1
First version of Filtering function

#4 AlAndr04 closed 3 years ago
1
How can we stop the crawler from moving out to domains out of scope

#3 kstapelfeldt closed 3 years ago
2
Modification of crawler to gather plain text version of the crawled articles

#2 kstapelfeldt closed 3 years ago
10
Created demo

#1 AlAndr04 closed 4 years ago
0