-
This is a follow-up to #112.
As per #112, we fetch community info from the community's Solid Pod in order to display it on the website.
We also update the community info during the build, to ensure corre…
-
Some services (for example, Facebook's crawler that generates social cards) require sites to serve the full-chain SSL certificate, not just the site's own certificate. However, sites I host on Virtualmin that ar…
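A "full chain" in PEM form is the site's own (leaf) certificate followed by the intermediate CA certificates, in that order; nginx, for example, expects this order in the file given to `ssl_certificate`. The sketch below assumes you already have the leaf certificate and the CA's intermediate bundle as separate PEM text. It does not reflect how Virtualmin stores or configures certificates, and `build_fullchain` is a hypothetical helper.

```python
import re

# Matches one PEM certificate block, including its BEGIN/END markers.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def build_fullchain(leaf_pem: str, intermediates_pem: str) -> str:
    """Concatenate the leaf certificate and the intermediate chain.

    The leaf must come first, followed by the intermediates, so a client
    that only trusts the root CA can build the complete path.
    """
    leaf_blocks = PEM_CERT_RE.findall(leaf_pem)
    chain_blocks = PEM_CERT_RE.findall(intermediates_pem)
    if len(leaf_blocks) != 1:
        raise ValueError("expected exactly one leaf certificate")
    if not chain_blocks:
        raise ValueError("no intermediate certificates found")
    return "\n".join(leaf_blocks + chain_blocks) + "\n"
```

You would read both files, pass their contents in, and point the web server at the resulting file; the check on `chain_blocks` catches the common mistake of serving the leaf certificate alone.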
-
Websites like Mastodon allow users to "verify" their profiles on other websites.
For example, adding `Mastodon` to a website will let a crawler know that the user `username` on the service `ma…
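Mastodon's verification works by fetching the linked page and looking for a link with `rel="me"` that points back to the profile. A minimal sketch of that check using only the standard library follows; the profile URL in the usage note is hypothetical, and real implementations also handle redirects, URL normalization, and fetching the page.

```python
from html.parser import HTMLParser
from typing import List


class RelMeParser(HTMLParser):
    """Collects href values of <a> and <link> tags whose rel includes "me"."""

    def __init__(self) -> None:
        super().__init__()
        self.rel_me_links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list, e.g. rel="me noopener".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "me" in rel_tokens and href:
            self.rel_me_links.append(href)


def links_back_to(html: str, profile_url: str) -> bool:
    """Return True if the page has a rel="me" link to profile_url."""
    parser = RelMeParser()
    parser.feed(html)
    parser.close()
    target = profile_url.rstrip("/")
    return any(link.rstrip("/") == target for link in parser.rel_me_links)
```

For instance, a page containing `<a rel="me" href="https://mastodon.social/@username">Mastodon</a>` passes `links_back_to(page, "https://mastodon.social/@username")`, while the same link without `rel="me"` does not.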
-
-
### Pitch
In August, a GPTBot block was merged into the code: https://github.com/mastodon/mastodon/pull/26396. Now Google has a robots.txt policy for Bard and future Google AI models, with the user agent Go…
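Blocking a crawler this way is just a per-user-agent `Disallow` rule in robots.txt. The sketch below uses Python's standard `urllib.robotparser` to show how a compliant crawler would interpret such a file. The robots.txt content is a hypothetical example covering only GPTBot; it is not copied from the linked PR, and the same pattern would apply to any additional user-agent token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser
```

With this file, `build_parser(ROBOTS_TXT).can_fetch("GPTBot", url)` is false for every path, while a generic browser user agent is still allowed, because the empty `Disallow:` under `*` means "allow all".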
-
Hi, thanks for this great work.
I have been playing around with this to crawl web pages and get their content in Markdown format, which can then be provided to LLMs for grounding. But when I used them …
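To make the HTML-to-Markdown step concrete, here is a toy converter covering only headings, paragraphs, and links. It is an illustrative sketch built on the standard `html.parser`, not the converter used by the tool discussed here; `html_to_markdown` is a hypothetical name, and it ignores lists, tables, images, and Markdown escaping.

```python
from html.parser import HTMLParser
from typing import List, Optional

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class MiniMarkdown(HTMLParser):
    """Converts <h1>-<h6>, <p>, and <a> to Markdown; other tags are ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []     # finished Markdown blocks
        self.current: List[str] = []    # text pieces of the block being built
        self.prefix = ""                # "# ", "## ", ... while inside a heading
        self.href: Optional[str] = None # target of the currently open <a>

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.current.append("[")

    def handle_endtag(self, tag):
        if tag == "a":
            self.current.append(f"]({self.href or ''})")
            self.href = None
        elif tag == "p" or tag in HEADINGS:
            # Collapse runs of whitespace (including newlines) to single spaces.
            text = " ".join("".join(self.current).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.current, self.prefix = [], ""

    def handle_data(self, data):
        self.current.append(data)


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

For example, `<h1>Title</h1><p>See <a href="https://example.org">the docs</a>.</p>` becomes a `# Title` block followed by `See [the docs](https://example.org).`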
-
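Page to crawl: http://web.hsinchu.gov.tw/social/home.jsp?contlink=content/20080729203315.jsp&mserno=200806250001&serno=200807290002&menudata=socialmenu
Crawler location (Ruby is used as the example; you may choose another language, but please explain how to run it at the top of the file):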
`crawlers/HS…
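As a starting point in Python rather than Ruby, the sketch below follows the requested convention of putting run instructions at the top of the file and decomposes the target URL into its query parameters. What `contlink`, `mserno`, `serno`, and `menudata` mean on that site is not stated here; treating them as opaque values is an assumption, and fetching and parsing the page are deliberately left out.

```python
"""Hypothetical crawler starter file.

How to run (assumed): python3 <this file>
Currently only prints the query parameters of the target page.
"""
from urllib.parse import parse_qs, urlsplit

TARGET = (
    "http://web.hsinchu.gov.tw/social/home.jsp"
    "?contlink=content/20080729203315.jsp"
    "&mserno=200806250001&serno=200807290002&menudata=socialmenu"
)


def query_params(url: str) -> dict:
    """Return the URL's query parameters as a flat dict (first value wins)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


if __name__ == "__main__":
    for key, value in query_params(TARGET).items():
        print(f"{key} = {value}")
```

Keeping the parameters separate makes it easy to vary one of them (for example `serno`) if related pages turn out to share the same `home.jsp` entry point, which would need to be confirmed against the site.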
-
# Feed crawler
Feed crawler: a service that posts the best news (ranked by multiple criteria) from media services and social networks.
**Problem**: There is too much information on the Internet. You…
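One common way to rank items "under multiple criteria" is a weighted score with a time decay. The sketch below is a hypothetical illustration: the criteria (likes, shares, age), the weights, and the 12-hour half-life are assumptions, not the project's actual ranking.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NewsItem:
    title: str
    likes: int
    shares: int
    age_hours: float


# Hypothetical criteria weights; a real service would tune these.
WEIGHTS = {"likes": 1.0, "shares": 3.0}
HALF_LIFE_HOURS = 12.0  # assumed decay: the score halves every 12 hours


def score(item: NewsItem) -> float:
    """Weighted engagement, discounted exponentially by the item's age."""
    engagement = WEIGHTS["likes"] * item.likes + WEIGHTS["shares"] * item.shares
    return engagement * 0.5 ** (item.age_hours / HALF_LIFE_HOURS)


def best(items: List[NewsItem], k: int) -> List[NewsItem]:
    """Return the k highest-scoring items, best first."""
    return sorted(items, key=score, reverse=True)[:k]
```

With these weights, a fresh item with 20 likes and 5 shares (score 35) outranks a day-old item with 100 likes and 10 shares (130 × 0.25 = 32.5), which is the kind of trade-off the decay term is meant to express.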
-
### Are you submitting a **bug report** or a **feature request**?
Bug report
### What is the current behavior?
I get an error when I try to start a crawl. This is the error:
```node
Run…
```
-
It should be up to the crawler developer whether to parallelize the process or not.
For example, for some social networks it is not good to scrape with many parallel threads.
For we need to c…
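One way to leave that choice to the crawler developer is a `max_workers` setting where `1` means strictly sequential. The sketch below is hypothetical: `crawl_all` and the `fetch` callback are illustrative names, not part of this project's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def crawl_all(fetch: Callable[[T], R], targets: Iterable[T], max_workers: int = 1) -> List[R]:
    """Apply fetch to every target using at most max_workers threads.

    max_workers=1 runs everything sequentially in the calling thread, the
    polite choice for sites that dislike parallel scraping. Results are
    returned in the same order as targets either way.
    """
    if max_workers < 1:
        raise ValueError("max_workers must be >= 1")
    if max_workers == 1:
        return [fetch(target) for target in targets]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, targets))
```

Defaulting to `1` makes parallelism opt-in, so a crawler for a sensitive social network stays sequential unless its developer explicitly raises the limit.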