-
Currently, a crawl can only be started by pressing the Enter key after typing the URL.
I would suggest adding a "start crawl" button beside the URL input field.
This button can change during crawling to…
-
Pyspider accumulates tens of thousands of CLOSE_WAIT connections on my machine, which leads to many tasks failing with timeouts.
![image](https://cloud.githubusercontent.com/assets/1130243/15662808/c84be56c-2726-…
-
I was wondering whether we should have some kind of performance-based tests as part of the CI/CD pipelines.
Any thoughts?
-
Add a web crawler to the project to fetch data from different news feeds and store it in the database.
Use Python and an SQLite database.
The list of RSS URLs is stored in the `crowler/urls.txt` file, the…
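
To make the request above more concrete, here is a minimal sketch of what such a crawler might look like using only the Python standard library. It is not an existing implementation: the function names (`parse_rss`, `store_items`, `crawl`), the `articles` table and its columns, and the one-URL-per-line layout of the URL file are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def parse_rss(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # items without a link are skipped
            items.append((title, link))
    return items


def store_items(conn, feed_url, items):
    """Insert items into a hypothetical `articles` table, ignoring known links."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "link TEXT PRIMARY KEY, title TEXT, feed_url TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO articles (link, title, feed_url) VALUES (?, ?, ?)",
        [(link, title, feed_url) for title, link in items],
    )
    conn.commit()


def crawl(urls_path, db_path):
    """Fetch each feed listed in urls_path (assumed: one URL per line)."""
    conn = sqlite3.connect(db_path)
    try:
        with open(urls_path, encoding="utf-8") as fh:
            urls = [line.strip() for line in fh if line.strip()]
        for url in urls:
            with urlopen(url, timeout=10) as resp:
                xml_text = resp.read()
            store_items(conn, url, parse_rss(xml_text))
    finally:
        conn.close()
```

Calling `crawl("crowler/urls.txt", "news.db")` would then populate the database, where `news.db` is a placeholder file name. Using the link as the primary key with `INSERT OR IGNORE` is one simple way to keep repeated runs from duplicating entries.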
-
## Existing problems
#### 1: biologist crawler
![image](https://user-images.githubusercontent.com/12608778/185968113-c2bba37c-526a-4639-a5f3-632a5a7ff5e4.png)
Currently biologist have 2 mai…
-
**World Ferns of 2020-09-25**
see https://github.com/CatalogueOfLife/testing/issues/22
**World Ferns of 2020-12-08.**
Metadata patched:
![image](https://user-images.githubusercontent.com/…
-
- Website URL: https://libretexts.org/platforms/libraries/
- License: **Creative Commons**
- Desired ZIM Title: **Libretexts XX Bookshelf** (see list
- Desired ZIM Description: **Textbooks curat…
-
I’m encountering an issue when running a large number of tasks. The scraper sometimes gets stuck on a specific task without displaying any apparent error message. Occasionally, I get a "server is down"…
-
In the Apache Arrow C++ project, we have been working for the last months on a Dataset API (original design document: https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit). …
-
Has this project been abandoned?
It looks very promising, other than the (_apparent_) lack of progress recently.