-
Thank you for sharing. I would like to ask where your data comes from.
-
Some link titles and descriptions cannot be obtained correctly. This is caused by Cloudflare and other protection mechanisms.
Some entries were edited manually.
If a page changes drastically. For e…
-
The web crawlers have been merged onto the EC2 instance; however, the Shadow Seals crawler does not require an EC2. Therefore, it should be split from OFA and the EC2, then moved over to its own Lambda.
Pl…
-
Add crawl spiders for the following popular websites:
- Youtube
- Quora
- Facebook
- Reddit
- GitHub
Currently implemented spiders can be found at https://github.com/leopardslab/CrawlerX/…
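As a starting point for any new spider, the core step is discovering links on a fetched page. This is a stdlib-only sketch of that step, not tied to the CrawlerX codebase (the class name and sample HTML are illustrative):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects absolute hrefs from anchor tags; a minimal building
    block a new spider could reuse for link discovery."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))

html = '<a href="/questions/1">Q1</a> <a href="https://example.com/about">About</a>'
parser = LinkExtractor("https://example.com")
parser.feed(html)
print(parser.links)
```

A real spider would feed fetched pages into the extractor and queue the resulting links, subject to robots.txt and per-site rate limits.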
-
Because articles are stored as MD, there is an obvious contextual disconnect between what is in the MD and what is actually rendered on the page. An article can contain a level-1 Markdown heading, but the pa…
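One way to narrow that disconnect is to normalize the Markdown source to plain text before comparing it with the rendered page. This is only a rough sketch handling headings, inline links, and emphasis; a full solution would use a real Markdown renderer:

```python
import re

def md_to_plain(md: str) -> str:
    """Very rough sketch: strip common Markdown markers so the source
    can be compared with the text a browser actually renders.
    Only headings, inline links, and emphasis are handled here."""
    text = re.sub(r"^#{1,6}\s*", "", md, flags=re.M)           # "# Title" -> "Title"
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)       # [text](url) -> text
    text = re.sub(r"[*_]{1,2}([^*_]+)[*_]{1,2}", r"\1", text)  # **bold** -> bold
    return text

print(md_to_plain("# My Title"))                       # -> My Title
print(md_to_plain("See [docs](https://example.com)"))  # -> See docs
```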
-
I realized that when you search for text from a Course Hero document on Google, the problem and answer appear in the page description. That means Course Hero has a text version of the PDF openly availab…
-
## 💡 Description
The current bot detection routine is fairly basic and rule-based. Create a more complete solution to detect web crawlers and bot interaction with PDS Nodes.
-
### What problem are you trying to solve?
Currently, there is no standard way for webpages to declare tasks that AI assistants can perform on their content. This leads to an inconsistent and fragment…
-
### Project Description
Develop a tool that reads in the web logs of an ERDDAP server to analyse how the server is being used. This would include:
- filtering out bots/crawlers/spam
- analysing a…
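The first step above can be sketched with the standard library, assuming the server writes Apache "combined"-style access logs (the log format, hint strings, and function name are assumptions for illustration):

```python
import re
from collections import Counter

# Assumed Apache "combined" log format; adjust for the actual ERDDAP logs.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
BOT_HINTS = ("bot", "crawler", "spider")

def human_requests(lines):
    """Yield parsed log entries, skipping obvious bots/crawlers."""
    for line in lines:
        m = LOG_RE.match(line)
        if m and not any(h in m["agent"].lower() for h in BOT_HINTS):
            yield m.groupdict()

sample = [
    '1.2.3.4 - - [01/Jan/2024:00:00:00 +0000] "GET /erddap/tabledap/x.csv HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [01/Jan/2024:00:00:01 +0000] "GET /erddap/index.html HTTP/1.1" 200 100 "-" "Googlebot/2.1"',
]
paths = Counter(e["path"] for e in human_requests(sample))
print(paths)  # only the human request remains
```

The later analysis steps would then aggregate the surviving entries, e.g. counting requests per dataset path or per day.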
-
The default configuration uses ports 80 and 443 for both the container and the host machine.
Is it possible to run on a non-default port e.g. 1234 for HTTPS? We don't want Internet crawlers and malicious …
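If the deployment uses Docker Compose, only the host side of the mapping needs to change; the container can keep listening on its defaults. A sketch, with an assumed service name:

```yaml
# Sketch, assuming a docker-compose deployment; "app" is an illustrative name.
services:
  app:
    ports:
      - "1234:443"   # host port 1234 -> container's default HTTPS port 443
```

Note that a non-default port only reduces casual discovery; crawlers and scanners that probe all ports will still find the service, so authentication or firewall rules are still needed.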