-
The goal of this feature is to let users archive manually with the browser from within Browsertrix Cloud, not unlike the ArchiveWeb.page extension and the classic Conifer workflow. This feature involves …
-
I have looked at #293 and #289, but those issues are slightly different. We have a crawler library based on `node-crawler` that performs computationally intensive crawling tasks and writes to differen…
-
# Problem
We currently have [one page that uses JavaScript](http://everypolitician.org/needed) to progressively enhance the page by displaying up-to-date data from a remote API. It first gets the d…
-
**Project info**
Title: Attachment parsing and indexing
Goals: Build a private full-text search system for a large volume of emails and attachments
Priority: high but not critical
**Description…
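In case it helps scoping, here is a minimal sketch of the parsing step, assuming the mail arrives as raw `.eml` files on disk; the `maildir` layout and the `index_document` hook are hypothetical placeholders for the real store and indexer:

```
import email
from email import policy
from pathlib import Path

def extract_texts(eml_path):
    """Yield (name, text) pairs for the message body and text attachments."""
    msg = email.message_from_bytes(eml_path.read_bytes(), policy=policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        yield "body", body.get_content()
    for part in msg.iter_attachments():
        # A real pipeline would dispatch on content type (PDF, DOCX, ...)
        # to dedicated text extractors; only plain text is handled here.
        if part.get_content_type().startswith("text/"):
            yield part.get_filename() or "attachment", part.get_content()

for path in Path("maildir").glob("**/*.eml"):            # hypothetical mail store
    for name, text in extract_texts(path):
        index_document(doc_id=f"{path}:{name}", text=text)  # hypothetical indexer
```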
-
### Describe the user story:
To enable Wikonnect to reach a wider audience, existing & new content should be easy to embed on third-party sites. Additionally, Wikonnect should be discoverable a…
-
## Open Questions
### Modelling/Dimensionality Reduction
- [ ] Decide how we want to handle 3D data (vectorized, 2D slices, or 3D matrices); see the sketch after this list
- [ ] Determine network capacity (N layers; size; ty…
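
For the first open question, the three candidate layouts are cheap to prototype side by side; a minimal NumPy sketch (shapes are illustrative only):

```
import numpy as np

vol = np.random.rand(64, 64, 32)           # one 3D volume, axes (x, y, z)

flat   = vol.reshape(-1)                   # vectorized: 1D input for dense layers
slices = vol.transpose(2, 0, 1)            # 2D: a stack of 32 (64, 64) z-slices
cube   = vol[np.newaxis, ..., np.newaxis]  # 3D: (batch, x, y, z, channel) for 3D convs

print(flat.shape, slices.shape, cube.shape)
# (131072,) (32, 64, 64) (1, 64, 64, 32, 1)
```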
-
## Summary
This might be split into two separate tasks that share the same goal: by default, do not log any sensitive information, such as PCBIDs, to the console or to text files.
## Detailed description
T…
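
As one possible shape for the fix, here is a minimal sketch of a redaction filter (a sketch of the technique rather than a patch; the regex is a placeholder, not the real PCBID format):

```
import logging
import re

# Placeholder pattern; real PCBIDs have a specific format not captured here.
PCBID_RE = re.compile(r"PCBID[:=]\s*\S+", re.IGNORECASE)

class RedactPCBID(logging.Filter):
    """Rewrite records so PCBID-like tokens never reach any handler."""
    def filter(self, record):
        record.msg = PCBID_RE.sub("PCBID=<redacted>", record.getMessage())
        record.args = None  # args were folded into msg by getMessage() above
        return True

handler = logging.StreamHandler()
handler.addFilter(RedactPCBID())  # handler-level, so it covers propagated records
logging.basicConfig(level=logging.INFO, handlers=[handler])
logging.info("connected, PCBID: 0123456789ABCDEF")  # prints "...PCBID=<redacted>"
```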
-
### Parent Issue
_No response_
### Problem Statement
Currently, the cli's Workspace manager tries to discover a `.dot-workspace.yml` marker;
if it does not find one, it starts "crawling up" t…
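
For reference, the upward discovery described here could be sketched like this (the function name is hypothetical; the marker name comes from the issue):

```
from pathlib import Path

MARKER = ".dot-workspace.yml"

def find_workspace_root(start):
    """Walk from `start` up to the filesystem root and return the first
    directory containing the marker file, or None if there is none."""
    for directory in [start, *start.parents]:
        if (directory / MARKER).is_file():
            return directory
    return None

print(find_workspace_root(Path.cwd()))
```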
-
In `scheduler._check_select`, `self.taskdb.get_task` is called. This behavior slows pyspider down.
Why not just save the task instead of the taskid?
```
def _check_select(self):
#.....
taskids…
```
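A self-contained sketch of the suggestion, with hypothetical stand-ins for the scheduler's internals (not a patch against pyspider):

```
import queue

task_queue = queue.Queue()  # hypothetical stand-in for the scheduler's queue

def schedule(task):
    # Enqueue the whole task dict up front...
    task_queue.put(task)

def _check_select():
    # ...so selection needs no per-task taskdb.get_task() round-trip.
    while not task_queue.empty():
        send_task(task_queue.get())

def send_task(task):
    print("dispatching", task["taskid"])  # hypothetical dispatch hook

schedule({"taskid": "abc123", "url": "http://example.com/"})
_check_select()
```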
-
I tried to add:
```
response = yield from asyncio.wait_for(
self.session.get(url, allow_redirects=False), 20)
```
instead of
```
response = yield from self.…
```
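
For comparison, a self-contained version of the timeout wrapper with the resulting `TimeoutError` handled, in modern `async`/`await` syntax (an aiohttp session is assumed from the `self.session.get` call above):

```
import asyncio
import aiohttp

async def fetch(session, url):
    try:
        # wait_for cancels the request and raises TimeoutError after 20 s.
        response = await asyncio.wait_for(
            session.get(url, allow_redirects=False), 20)
    except asyncio.TimeoutError:
        return None
    async with response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        body = await fetch(session, "http://example.com/")
        print(body is not None)

asyncio.run(main())
```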