GateNLP/ultimate-sitemap-parser
Ultimate Website Sitemap Parser
https://mediacloud.org/
Other
181 stars
64 forks
issues
Merge changes from pull requests
#43
dttad
closed
2 months ago
0
Roadmap
#42
freddyheppell
opened
2 months ago
0
End of Support
#41
pgulley
closed
2 months ago
6
URLs are converted to lowercase and, as a result, are no longer valid
#40
Abdoulkadir-ali
opened
1 year ago
3
Don't include invalid sitemaps in trees
#39
gbenson
opened
1 year ago
0
Unable to gunzip response
#38
Leezj9671
closed
1 year ago
1
Add optional argument to requests web client to ignore SSL checking
#37
japherwocky
opened
1 year ago
0
Add Anaconda details to README
#36
freddyheppell
opened
2 years ago
0
Fix incorrect lowercasing of robots.txt Sitemap URLs
#35
ArthurMelin
opened
2 years ago
0
Authentication Method for Secured Sites?
#34
ma26yank
opened
2 years ago
0
SSL Certificate error fix?
#33
ma26yank
opened
2 years ago
1
Exclude specific sitemap from sitemap_tree_for_homepage
#32
Bishwas-py
closed
1 week ago
1
Invalid timestamp cannot be handled
#31
ebauch
opened
2 years ago
0
How to set timeout properly
#30
ebauch
closed
2 months ago
2
RecursionError - maximum recursion depth exceeded while calling a Python object
#29
caligoig
opened
2 years ago
0
ModuleNotFoundError: No module named 'http.client'; 'http' is not a package
#28
moehmeni
closed
3 years ago
1
Resolving issue #22
#27
gavishpoddar
closed
3 years ago
0
Provide a simple mechanism to parse raw sitemap content
#26
dsoprea
opened
3 years ago
0
Disable logging?
#25
Pikamander2
opened
3 years ago
5
log.py: Eliminate log configuration
#24
dsoprea
closed
3 years ago
0
Library interferes with application logging configuration
#23
dsoprea
opened
4 years ago
10
Throwing Exception while parsing date
#22
malhotraguy
opened
4 years ago
1
Can't use own logging handlers / Can't propagate logger
#21
marcinhlybin
opened
4 years ago
1
Update requests_client.py
#20
tgrandje
closed
4 years ago
2
Not able to parse HTML sitemaps
#19
malhotraguy
closed
5 years ago
2
Convert SitemapNewsStory to dict
#18
J535D165
opened
5 years ago
0
Support parsing robots.txt that has a crawl delay
#17
jayanthchandra
closed
4 years ago
1
Exception Handling for requests module.
#16
jayanthchandra
closed
5 years ago
1
Optional-typed parameters should default to None to be treated as optional arguments
#15
jayanthchandra
closed
5 years ago
1
Error in request causes total crash
#14
bartmachielsen
closed
5 years ago
3
Don't refetch sitemaps that were already fetched
#13
pypt
opened
5 years ago
0
Some sitemaps don't get fetched fully
#12
pypt
opened
5 years ago
0
Not working
#11
JonhSilver
closed
5 years ago
8
BOM removal doesn't seem to work properly
#10
pypt
closed
5 years ago
1
This site is not working => "set()" as result
#9
chatelao
closed
5 years ago
7
Detection of sitemap if it's not present in robots.txt
#8
kienli
closed
5 years ago
12
Need contribution and setup guidelines to facilitate development
#7
jayanthchandra
closed
5 years ago
1
Prevent XML parser from parsing gzipped XMLs that it's unable to decompress
#6
pypt
closed
5 years ago
2
Reduce recursion level for sitemap fetcher
#5
pypt
opened
6 years ago
1
If `Content-Type` header is set, verify it's the expected one
#4
pypt
opened
6 years ago
1
Add support for RSS / Atom sitemaps
#3
pypt
closed
5 years ago
0
`yield` found links instead of `return`ing them
#2
pypt
closed
5 years ago
1
Add support for Crawl-Delay from robots.txt
#1
pypt
opened
6 years ago
2