commonsearch/cosr-back
Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0
123 stars
24 forks
issues
Fix caching and hash collisions in _fast_make_domain_id
#74
sebastian-nagel
opened
7 years ago
0
deduplicated urls in backlink plugin results #65
#73
valentinlehuger
opened
7 years ago
2
Use only python 2.7 for virtualenv
#72
chaconnewu
closed
7 years ago
2
Getting error running 'make virtualenv'
#71
chaconnewu
closed
7 years ago
1
Add Commonsearch PageRank Signal
#70
HenriqueLimas
closed
8 years ago
5
Bug correction #62 adding option --overwrite for pagerank job.
#69
acassaigne
closed
8 years ago
2
Investigate annotating all the code with mypy
#68
sylvinus
opened
8 years ago
0
Fix `make import_local_data` ETA computation
#67
AloysAugustin
closed
8 years ago
2
Run the tests inside a Docker-compose environment
#66
MickaelBergem
closed
8 years ago
2
Deduplicate URLs in backlinks plugin results
#65
sylvinus
opened
8 years ago
0
Add docker-compose for the local tests
#64
sylvinus
closed
8 years ago
0
Integrate the new Common Crawl News dataset
#63
sylvinus
opened
8 years ago
1
PageRank & other jobs: check if output directory already exists
#62
sylvinus
opened
8 years ago
0
Added quotes around grep words, was causing error
#61
bakztfuture
closed
8 years ago
2
Add Makefile commands to save/load elasticsearch snapshots
#60
HenriqueLimas
opened
8 years ago
0
Advertising Lists
#59
indolering
closed
8 years ago
2
Add new Malware/Phishing Blacklists
#58
indolering
opened
8 years ago
1
Add Stack Overflow document source
#57
sylvinus
opened
8 years ago
1
Add a Reddit data source
#56
sylvinus
opened
8 years ago
0
Add a Github document source
#55
sylvinus
opened
8 years ago
2
Add GDELT document source
#54
sylvinus
opened
8 years ago
0
Investigate MyHTML parser
#53
sylvinus
opened
8 years ago
0
Improve host-level PageRanks
#52
sylvinus
opened
8 years ago
1
Briefly updated INSTALL.md docs
#51
bakztfuture
closed
8 years ago
2
Speed up Travis builds
#50
sylvinus
opened
8 years ago
1
Errors During Installation
#49
mechaman
opened
8 years ago
3
Move tree traversal to Cython
#48
sylvinus
closed
8 years ago
0
Refactor the indexing pipeline by adding support for plugins
#47
sylvinus
closed
8 years ago
0
Spark-submit uses only 1 core.
#46
IvRRimum
closed
8 years ago
4
Simplify and test url indexing
#45
Sentimentron
closed
8 years ago
2
Questions on deployment
#44
IvRRimum
closed
8 years ago
1
Structure of ES clusters
#43
IvRRimum
closed
8 years ago
9
Strip Unicode Emoji characters from page titles
#42
Sentimentron
closed
8 years ago
3
Too many open files in Explainer
#41
sylvinus
opened
8 years ago
0
Tokenizer improvements
#40
sylvinus
opened
8 years ago
3
Make coverage.py work with pyspark / spark-submit
#39
sylvinus
closed
8 years ago
0
Use json-ld for document description
#38
Tpt
opened
8 years ago
4
Error running index job
#37
chaconnewu
closed
8 years ago
5
Updated Common crawl to Feb 2016 crawl
#36
vanhalt
closed
8 years ago
1
Improve filtering of EU cookie notices
#35
sylvinus
opened
8 years ago
1
Index presence of ads, trackers
#34
mlinksva
opened
8 years ago
5
#29 URL.normalized: strip default ports
#33
hjacobs
closed
8 years ago
1
Add unit test and code coverage badges to README
#32
hjacobs
closed
8 years ago
2
Index the public suffix part of domains
#31
sylvinus
closed
8 years ago
0
Index license info
#30
sylvinus
opened
8 years ago
3
Simple improvements to URL normalization
#29
sylvinus
opened
8 years ago
0
Add first document-level quality signals
#28
sylvinus
opened
8 years ago
3
Create documents from DMOZ/Wikidata when they are missing in CC
#27
sylvinus
closed
8 years ago
1
Improve static datasources import & storage
#26
sylvinus
closed
8 years ago
0
Improve datasources import
#25
sylvinus
opened
8 years ago
0