Data4Democracy/internal-displacement
Studying news events and internal displacement.
43 stars, 27 forks
issues
scraper
#54
jlln
closed
7 years ago
0
simple enhancements to the country code extraction function
#53
simonb83
closed
7 years ago
0
initial unit tests for Interpreter and Scraper
#52
simonb83
closed
7 years ago
1
Enhance country detection in article content
#51
simonb83
closed
7 years ago
6
Fix exception handling in pdf_parsing
#50
simonb83
closed
7 years ago
0
sql to csv function
#49
georgerichardson
closed
7 years ago
0
Gr/debug1
#48
georgerichardson
closed
7 years ago
0
Issue 41/pdf published date
#47
simonb83
closed
7 years ago
0
Pdfs/43
#46
georgerichardson
closed
7 years ago
0
Add functions to test if url is pdf by looking at url...
#45
simonb83
closed
7 years ago
0
Extract country code
#44
simonb83
closed
7 years ago
1
Manage PDF scraping
#43
georgerichardson
closed
7 years ago
8
Detect URLs with PDF
#42
georgerichardson
closed
7 years ago
2
Extract document details from PDF
#41
georgerichardson
opened
7 years ago
10
Filter enhancements
#40
simonb83
closed
7 years ago
0
Incorporate Scrape pdf/17
#39
georgerichardson
closed
7 years ago
0
experimental notebook for spacy parsing of article titles
#38
wwymak
closed
7 years ago
0
further nlp exploration for methods to identify relevant articles
#37
simonb83
closed
7 years ago
0
implement Filter with method for setting language property of articles
#36
simonb83
closed
7 years ago
1
move files from internal-d to internal_d and delete old directory
#35
simonb83
closed
7 years ago
0
Jlln pipeline
#34
jlln
closed
7 years ago
1
refactor scraper.py in internal-displacement
#33
simonb83
closed
7 years ago
1
some initial nlp exploration with spacy
#32
simonb83
closed
7 years ago
0
Scraper - Detect and tag language
#31
georgerichardson
closed
7 years ago
4
appropriately tag broken urls that cannot be downloaded by newspaper
#30
simonb83
closed
7 years ago
0
Revert "deal with case where url doesn't exist"
#29
georgerichardson
closed
7 years ago
0
SQL Interface
#28
jlln
closed
7 years ago
0
Adding pdf parsing
#27
coldfashioned
closed
7 years ago
2
Scraper - Tag content type
#26
georgerichardson
opened
7 years ago
4
Scraper - Asynchronous tasks for scraper.py
#25
georgerichardson
closed
7 years ago
1
Scraper - Refactor old scraper
#24
georgerichardson
closed
7 years ago
0
Merge pull request #1 from Data4Democracy/scraper
#23
jlln
closed
7 years ago
1
deal with case where url doesn't exist
#22
simonb83
closed
7 years ago
4
add Data Engineering section to workplan
#21
simonb83
closed
7 years ago
0
Pipeline - consistent date and time
#20
georgerichardson
closed
7 years ago
6
Pipeline - save data to csv
#19
georgerichardson
closed
7 years ago
5
Explore refugee data in Jupyter Notebooks
#18
georgerichardson
closed
7 years ago
4
Scraper - Parsing PDFs?
#17
georgerichardson
closed
7 years ago
9
Create, maintain and update user guide / admin guide.
#16
simonb83
closed
7 years ago
0
Build / train classifier for article classification
#15
simonb83
closed
7 years ago
2
Best Machine Learning approach for classifying documents and articles
#14
simonb83
closed
7 years ago
13
Implement filtering of documents not reporting on human mobility
#13
simonb83
closed
7 years ago
0
Update workplan.md with additional detail from Leonardo Milano + sect…
#12
simonb83
closed
7 years ago
0
Issues structure
#11
simonb83
closed
7 years ago
0
Train classifier on training dataset - Utilities for training classifiers
#10
jlln
closed
7 years ago
2
Fill out sample_urls function for returning a subsample of urls.
#9
simonb83
closed
7 years ago
0
Get random subsample of URLs from list
#8
georgerichardson
closed
7 years ago
5
Train classifier on training dataset
#7
georgerichardson
closed
7 years ago
3
Visualization discussion
#6
georgerichardson
closed
7 years ago
6
Improve text extraction from URLs with beautifulsoup
#5
georgerichardson
closed
7 years ago
13