issues
codelucas/newspaper
newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs: https://goo.gl/VX41yK
MIT License
14.09k stars
2.11k forks
Update download_corpora.py - add 'punkt_tab'
#1006
Ronkiro
opened
1 week ago
0
Not working at all on Hindi language Newspaper
#1005
AnupamSingh-dataanalyst
opened
2 weeks ago
0
bump nltk to 3.8.1 to prevent unsafe deserialization vulnerability #32
#1004
paualarco
closed
2 months ago
0
Downloaded article from Google RSS News only return Google Images
#1003
andisoer
opened
2 months ago
2
There seem to be complaints related to the user agent scraping permission issue
#1002
sutgeorge
closed
3 months ago
3
History of the Three Kingdoms
#1001
95wandao
opened
6 months ago
0
Questions about Copilot + Open Source Software Hierarchy
#997
liaochris
closed
5 months ago
0
Added bn (Bengali) stopwords
#996
moyshik7
closed
6 months ago
2
added new language 'Tamil' (non latin)
#979
pj8912
closed
6 months ago
3
TIPS FOR IMPROVEMENT
#978
aleksandar-devedzic
opened
10 months ago
5
Can I get the CSS Selectors ?
#977
izJoey
closed
11 months ago
2
gnews with user agent returns empty text
#976
wj210
opened
11 months ago
1
Allow newspaper work on news websites like carbon-pulse
#975
Cabu
opened
1 year ago
0
update nltk 3.2.1 to 3.6.6
#974
julianofischer
closed
1 year ago
0
Getting Older News Articles
#973
PaulKMandal
opened
1 year ago
4
Consider switching from lxml's clean_html for enhanced security (and possibly performance)
#972
frenzymadness
opened
1 year ago
2
Parsing Problem
#971
skittoo
closed
1 year ago
6
Project status
#970
cgreening
opened
1 year ago
4
TIPS FOR FAST IMPROVEMENT
#969
aleksandar-devedzic
opened
1 year ago
0
not working for gnews.org
#968
Jooey233
opened
1 year ago
0
Date extraction is faulty
#967
inspectorG4dget
opened
1 year ago
2
tamil language support
#966
sam9111
closed
1 year ago
0
suggestions for calculate_best_node
#965
shewenkan
opened
1 year ago
1
download() halts/stuck forever with a specific URL
#964
KeremTurgutlu
opened
1 year ago
0
download() halts/stuck forever with a specific URL
#963
KeremTurgutlu
closed
1 year ago
0
Is there any optimization in the configuration of newspaper?
#962
liurich12138
closed
1 year ago
0
the API doesn't work
#961
androidAppMe
opened
1 year ago
10
can't start new thread
#960
liuying12138
opened
1 year ago
2
Where to find and delete all articles?
#959
steeljardas
opened
1 year ago
7
Article() not returning anything when fed article links
#958
SohailSayed
closed
1 year ago
1
fix(sec): upgrade nltk to 3.6.6
#957
chncaption
opened
1 year ago
0
fix(sec): upgrade requests to 2.20
#956
chncaption
opened
1 year ago
0
Would not load custom feed articles
#955
Coinjuice
opened
1 year ago
0
Project dependencies may have API risk issues
#954
PyDeps
closed
1 year ago
4
fix itemprop containing articleBody
#953
AndyTheFactory
opened
2 years ago
0
ContentExtractor.nodes_to_check doesn't recognize the "right" <p> elements in html article
#952
tomer2406
opened
2 years ago
0
bengali support added
#951
ffaisal93
opened
2 years ago
0
Some article texts are not fully downloaded.
#950
Jimchoo91
opened
2 years ago
2
Delete downloaded articles
#949
adildg
opened
2 years ago
0
I just want to help with date extraction
#948
aleksandar-devedzic
opened
2 years ago
1
The content extracted by newspaper is out of order
#947
riusksk
opened
2 years ago
0
Do not work with https://cryptopotato.com/
#946
saileshkush95
closed
2 years ago
0
http://www.dw.com, I couldn't get urls.
#945
huangsiyuan924
opened
2 years ago
8
How do I set the cache directory to the current project root path
#944
huangsiyuan924
opened
2 years ago
1
not getting all article links after refreshing again the code?
#943
Aliktk
closed
2 years ago
3
Replace os.listdir by os.scandir in utils.get_available_languages
#942
tgrandje
closed
2 years ago
1
Added ability to scrape javascript intensive apps
#941
Sosshi
opened
2 years ago
1
I get back the same description on all these links, although they are clearly different.
#940
gaurav-95
opened
2 years ago
0
It can't work with BBC
#939
Qggg
closed
2 years ago
2
Close opened category cache files when done
#938
WillGITCode
opened
2 years ago
0