advancedlogic/GoOse
Html Content / Article Extractor in Golang
Apache License 2.0
436 stars
111 forks
issues
Bump golang.org/x/net from 0.17.0 to 0.23.0
#76
dependabot[bot]
opened
6 months ago
0
Bump golang.org/x/net from 0.7.0 to 0.17.0
#75
dependabot[bot]
closed
11 months ago
0
Bump golang.org/x/net from 0.0.0-20190628185345-da137c7871d7 to 0.7.0
#74
dependabot[bot]
closed
1 year ago
0
Bump golang.org/x/text from 0.3.0 to 0.3.8
#73
dependabot[bot]
opened
1 year ago
0
Who to contact for security issues
#72
JamieSlome
opened
2 years ago
0
Fixed cleaner for eff.org
#71
wkornewald
closed
1 year ago
0
Fixed cleaner for theguardian.com & newyorker.com
#70
wkornewald
closed
3 years ago
0
Fails to handle The Guardian and The New Yorker
#69
wkornewald
closed
3 years ago
0
Use correct Content-Type header to fetch HTML
#68
shaneiseminger
closed
3 years ago
0
Wrong Content-Type in request
#67
shaneiseminger
closed
4 years ago
2
Does not work on medium articles
#66
sathishvj
closed
4 years ago
1
Prevent from extracting the unmodified title twice
#65
calou
closed
4 years ago
0
Added Indonesia stopwords lists
#64
naupaw
closed
3 years ago
0
Update go.mod
#63
cutd
closed
5 years ago
1
New feature h1
#62
muhammet-mucahit
opened
5 years ago
0
Added Turkish Stop Words
#61
muhammet-mucahit
opened
5 years ago
0
Error when doing GO GET
#60
YSZhuoyang
closed
5 years ago
0
Separate crawler and request html extractor.
#59
Merlinvt
closed
5 years ago
0
Fix function comments based on best practices from Effective Go
#58
Daanikus
closed
5 years ago
1
Updates signature and typing for fatih/set
#57
obrodinho
closed
6 years ago
2
Installation error due to dependency "https://github.com/fatih/set"
#56
brunover
closed
5 years ago
0
article.CleanedText is empty if the url has query
#55
josemojena
opened
6 years ago
0
Various refactorings
#54
rahal
closed
6 years ago
0
Crawl error handling
#53
jaytaylor
closed
6 years ago
0
First pass at extraction of article date published field.
#52
jaytaylor
closed
6 years ago
0
PublishDate Not Working
#51
SamuelBWasserman
closed
6 years ago
1
FinalURL not working for redirecting URLs?
#50
philmcp
opened
6 years ago
0
Detect charset when Content-Type contains text/xhtml
#49
theSoenke
closed
7 years ago
0
Switching back to upstream goquery
#48
muesli
opened
7 years ago
0
no problem
#47
hebijiandai
closed
7 years ago
0
use log.Println instead of fmt.Println
#46
syou6162
closed
7 years ago
0
Update README.md
#45
LeMoussel
closed
6 years ago
0
General improvements and fixes
#44
nicolaasuni
closed
7 years ago
0
add Article.TitleUnmodified field & test
#43
Profpatsch
closed
5 years ago
1
Add initialization of go-charset
#42
bobuhiro11
opened
7 years ago
0
"github.com/advancedlogic/gojs-config" is no longer available
#41
truongsinh
closed
4 years ago
5
Consider adding crawler creation directly from *goquery.Document
#40
let4be
opened
8 years ago
1
Change crawl to return error instead of panicking.
#39
ejamesc
closed
8 years ago
0
test fix and CI integration
#38
nicolaasuni
closed
8 years ago
0
general improvements
#37
nicolaasuni
closed
8 years ago
0
match exact meta tag when crawling
#36
truongsinh
closed
9 years ago
0
Added basic support for extracting published_time meta value
#35
dhowden
opened
9 years ago
0
GoOse imports an old forked version of goquery, which doesn't seem like it's needed anymore
#34
urandom
closed
8 years ago
0
Fixed: Rewritten code.google.com import to github version
#33
dhowden
closed
9 years ago
1
cannot extract from raw html
#32
mevartma
opened
9 years ago
0
Extract hidden text from NY Times
#31
jice-lavocat
opened
9 years ago
0
fix image URL which is relative and/or has special chars
#30
truongsinh
closed
9 years ago
0
code.google.com/p/go-charset/charset no longer exists
#29
mish15
closed
9 years ago
1
go get github.com/advancedlogic/GoOse fails because the cascadia package has moved
#28
elg0nz
closed
8 years ago
0
resolve Facebook photo
#27
truongsinh
closed
9 years ago
0