codelucas
/
newspaper
newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License
13.88k
stars
2.1k
forks
issues
There seem to be complaints related to the user agent scraping permission issue
#1002
sutgeorge
closed
1 week ago
3
Three Kingdoms history
#1001
95wandao
opened
3 months ago
0
Questions about Copilot + Open Source Software Hierarchy
#997
liaochris
closed
2 months ago
0
Added bn (Bengali) stopwords
#996
moyshik7
closed
3 months ago
2
added new language 'Tamil' (non latin)
#979
pj8912
closed
3 months ago
3
TIPS FOR IMPROVEMENT
#978
aleksandar-devedzic
opened
7 months ago
5
Can I get the CSS Selectors ?
#977
izJoey
closed
7 months ago
2
gnews with user agent returns empty text
#976
wj210
opened
8 months ago
1
Allow newspaper work on news websites like carbon-pulse
#975
Cabu
opened
9 months ago
0
update nltk 3.2.1 to 3.6.6
#974
julianofischer
closed
9 months ago
0
Getting Older News Articles
#973
PaulKMandal
opened
9 months ago
4
Consider switching from lxml's clean_html for enhanced security (and possibly performance)
#972
frenzymadness
opened
10 months ago
1
Parsing Problem
#971
skittoo
closed
1 year ago
6
Project status
#970
cgreening
opened
1 year ago
4
TIPS FOR FAST IMPROVEMENT
#969
aleksandar-devedzic
opened
1 year ago
0
not working for gnews.org
#968
Jooey233
opened
1 year ago
0
Date extraction is faulty
#967
inspectorG4dget
opened
1 year ago
2
tamil language support
#966
sam9111
closed
1 year ago
0
suggestions for calculate_best_node
#965
shewenkan
opened
1 year ago
0
download() halts/stuck forever with a specific URL
#964
KeremTurgutlu
opened
1 year ago
0
download() halts/stuck forever with a specific URL
#963
KeremTurgutlu
closed
1 year ago
0
Is there any optimization in the configuration of newspaper?
#962
liurich12138
closed
1 year ago
0
the API doesn't work
#961
androidAppMe
opened
1 year ago
10
can't start new thread
#960
liuying12138
opened
1 year ago
2
Where to find and delete all articles?
#959
steeljardas
opened
1 year ago
7
Article() not returning anything when fed article links
#958
SohailSayed
closed
1 year ago
1
fix(sec): upgrade nltk to 3.6.6
#957
chncaption
opened
1 year ago
0
fix(sec): upgrade requests to 2.20
#956
chncaption
opened
1 year ago
0
Would not load custom feed articles
#955
Coinjuice
opened
1 year ago
0
Project dependencies may have API risk issues
#954
PyDeps
closed
1 year ago
4
fix itemprop containing articleBody
#953
AndyTheFactory
opened
1 year ago
0
ContentExtractor.nodes_to_check doesn't recognize the "right" <p> elements in html article
#952
tomer2406
opened
1 year ago
0
bengali support added
#951
ffaisal93
opened
1 year ago
0
Some article texts are not fully downloaded.
#950
Jimchoo91
opened
1 year ago
2
Delete downloaded articles
#949
adildg
opened
1 year ago
0
I just want to help with date extraction
#948
aleksandar-devedzic
opened
1 year ago
1
The content extracted by newspaper is out of order
#947
riusksk
opened
1 year ago
0
Do not work with https://cryptopotato.com/
#946
saileshkush95
closed
1 year ago
0
http://www.dw.com, I couldn't get URLs.
#945
huangsiyuan924
opened
1 year ago
8
How do I set the cache directory to the current project root path
#944
huangsiyuan924
opened
1 year ago
1
not getting all article links after refreshing again the code?
#943
Aliktk
closed
1 year ago
3
Replace os.listdir by os.scandir in utils.get_available_languages
#942
tgrandje
closed
1 year ago
1
Added ability to scrape javascript intensive apps
#941
Sosshi
opened
2 years ago
1
I get back the same description on all these links, although they are clearly different.
#940
gaurav-95
opened
2 years ago
0
It can't work with BBC
#939
Qggg
closed
1 year ago
2
Close opened category cache files when done
#938
WillGITCode
opened
2 years ago
0
It turns out that a lot of sites do not work with
#937
alekssamos
opened
2 years ago
2
Unable to pull articles from list of article URL's
#936
Unique201
opened
2 years ago
1
Authors and date are not correctly identified in wordpress website
#935
alekssamos
opened
2 years ago
2
Change general exceptions in Configuration
#934
nnick14
opened
2 years ago
0