-
We are experiencing extremely slow task submission via the DaskExecutor for very large mapped tasks. With previous flow tests where a task was mapped over roughly 20K items, task submission was suffi…
-
As a developer I would like to be able to use all available tools, which requires them to be available in our database.
This could be done through a script or by building an RPA bot for this task.
…
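If the script route is taken, a minimal idempotent import could look like the sketch below. The table name, columns, tool entries, and SQLite backend are all assumptions for illustration; the issue does not describe the actual database schema.

```python
import sqlite3

# Placeholder tool list; the real entries would come from the tool inventory.
TOOLS = [
    ("Tool A", "https://example.org/tool-a"),
    ("Tool B", "https://example.org/tool-b"),
]

def import_tools(conn, tools):
    """Insert tool records, skipping ones that already exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tools ("
        "  name TEXT PRIMARY KEY,"
        "  url  TEXT NOT NULL)"
    )
    for name, url in tools:
        # INSERT OR IGNORE keeps the script idempotent on re-runs.
        conn.execute("INSERT OR IGNORE INTO tools VALUES (?, ?)", (name, url))
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM tools").fetchone()[0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(import_tools(conn, TOOLS))
```

An RPA bot would fill the same rows through the application's UI instead; the script variant is easier to re-run and audit.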
-
The crawling of "arts and humanities", "Bamboo dirt", "History Online", etc. is currently done via a Python script.
A better solution (to minimize the dependencies) would be to port this code to PHP…
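For reference while porting, the Python side of such a crawler usually reduces to fetching pages and extracting listing links. A stdlib-only sketch of the extraction step is below; the sample HTML is made up, since the actual markup of those sites is not shown here. In PHP, `DOMDocument` plus `DOMXPath` would be the natural counterpart.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Made-up listing markup, standing in for one of the crawled directories.
sample = '<ul><li><a href="/tools/1">Some Tool</a></li></ul>'
parser = LinkExtractor()
parser.feed(sample)
print(parser.links)  # [('/tools/1', 'Some Tool')]
```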
-
## User story
As a user I would like to be able to scan sites which are heavily based on JavaScript.
## Research
- [ ] How does [arachni implement JS crawling](https://github.com/Arachni/ara…
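As a baseline for the research items, a purely static fetch can at least flag pages that probably need JavaScript execution: few or no anchors in the raw HTML combined with heavy script usage. A stdlib sketch of that heuristic follows; the heuristic itself is an assumption for triage and is not how arachni implements JS crawling.

```python
from html.parser import HTMLParser

class StaticScan(HTMLParser):
    """Count anchors and external scripts in raw (non-rendered) HTML."""
    def __init__(self):
        super().__init__()
        self.anchors = 0
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors += 1
        elif tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.scripts.append(src)

# Made-up single-page-app shell: scripts present, no static links.
page = '<script src="/app.js"></script><div id="root"></div>'
scan = StaticScan()
scan.feed(page)
# No static anchors but external scripts is a strong hint that links
# are injected client-side and a JS-capable crawler is needed.
print(scan.anchors, scan.scripts)  # 0 ['/app.js']
```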
-
Tasks:
* [ ] Research/enable Donate button on GitHub. (See screenshot below for knob to twiddle.)
* [ ] Create personal GitHub Sponsor page for myself
* [ ] Verify payment processing works …
-
I was hoping someone could share an example setup of how to use Animancer and its state machine with a multiplayer solution.
I am mostly trying to wrap my head around when the state knows how to …
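Not an Animancer/C# answer, but the pattern commonly used with any animation state machine in multiplayer is to replicate only a small state identifier over the network and let each client drive its own local machine from it. A language-agnostic sketch under that assumption (Python here; all names are illustrative, none are Animancer API):

```python
from enum import Enum

class State(Enum):
    IDLE = 0
    RUN = 1
    ATTACK = 2

class Character:
    """Each peer runs its own state machine; only the state *key* is
    replicated over the network, never the animation itself."""
    def __init__(self):
        self.state = State.IDLE

    def request(self, state, is_owner):
        # The owning client decides transitions locally...
        if is_owner:
            self.state = state
            return state  # ...and this key is what gets sent to peers.
        return None

    def on_network_state(self, state):
        # Remote copies just mirror the replicated key and play the
        # matching animation locally.
        self.state = state

local = Character()
remote = Character()
sent = local.request(State.RUN, is_owner=True)
remote.on_network_state(sent)
print(local.state is remote.state)  # True
```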
-
I'm trying to crawl the website using the feature in the app, but it keeps stopping even though the max links setting is over 100. I've even deleted and reset the project, but it keeps stopping at a random task…
-
The existing journal workflows need to be ported to use the new `hepcrawl` service based on `scrapy`. Scheduled and one-shot harvests can be made by triggering harvests via appropriate Celery tasks wh…
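One way the Celery side can kick off a `scrapy` crawl is by POSTing to scrapyd's `schedule.json` endpoint. A stdlib sketch of that trigger is below; the scrapyd URL and spider name are assumptions, and whether the `hepcrawl` deployment actually runs under scrapyd is not stated here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SCRAPYD_URL = "http://localhost:6800"  # assumption: a local scrapyd instance

def build_schedule_request(project, spider, **spider_args):
    """Build the POST for scrapyd's schedule.json endpoint."""
    params = {"project": project, "spider": spider, **spider_args}
    return f"{SCRAPYD_URL}/schedule.json", urlencode(params).encode()

def trigger_harvest(project, spider, **spider_args):
    # In the real setup this body would live inside a Celery task
    # (decorated with @app.task) so both one-shot calls and
    # beat-scheduled harvests go through the same code path.
    url, data = build_schedule_request(project, spider, **spider_args)
    with urlopen(url, data=data) as resp:  # requires a running scrapyd
        return resp.read()

if __name__ == "__main__":
    # "some_journal_spider" is a placeholder spider name.
    print(build_schedule_request("hepcrawl", "some_journal_spider"))
```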
-
I have looked at #293 and #289, but those issues are slightly different. We have a crawler library based on `node-crawler` that performs computationally intensive crawling tasks and writes to differen…
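The issue concerns `node-crawler` (JavaScript), but the usual fix for CPU-heavy work starving an I/O-driven crawl loop is the same everywhere: push it onto a worker pool. A Python sketch of the pattern follows (in Node the equivalent would be `worker_threads` or child processes); the extraction function is a stand-in, not the library's code.

```python
from concurrent.futures import ProcessPoolExecutor

def heavy_extract(page_id):
    """Stand-in for a computationally intensive per-page step."""
    return page_id, sum(i * i for i in range(10_000))

def process_pages(page_ids, workers=4):
    # Offload CPU-bound work to separate processes so the event-loop
    # side of the crawler stays responsive for fetching.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(heavy_extract, page_ids))

if __name__ == "__main__":
    results = process_pages(range(8))
    print(len(results))  # 8
```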
-
magenbluten writes in comments to #15
> as I already mentioned, "rm -Rf" doesn't work on Windows systems. Additionally, the "FSUtil.systemLines" uses "someprog > somefile" for capturing and parsing prog…
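Both portability problems quoted above have library-level fixes: delete trees through a filesystem API instead of shelling out to `rm -Rf`, and capture process output through the process API instead of shell redirection. Shown here in Python purely for illustration; the project's own `FSUtil` would do the equivalent in its language.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Portable replacement for shelling out to `rm -Rf`:
path = tempfile.mkdtemp()
shutil.rmtree(path)  # works on Windows and POSIX alike
assert not os.path.exists(path)

# Portable replacement for `someprog > somefile`: let the runtime
# capture stdout directly instead of relying on shell redirection
# (the command below is just a safe stand-in for someprog).
out = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True, text=True, check=True,
).stdout
print(out.strip())  # hello
```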