arshaw / scrapemark

Super-convenient web scraping in Python

96 stars 28 forks

issues

unicode support?

#19 bookie988 opened 12 years ago
0
Problem with nested loop

#18 phoebebright opened 12 years ago
0
Attribute value ignored when capturing another attribute value in the same tag

#17 ackalker opened 12 years ago
1
Incorrect reading for "Ø"

#16 zalun opened 12 years ago
0
Pull Request

#15 quink closed 1 year ago
3
Problem with <a HREF>

#14 phzbox opened 13 years ago
2
Recursive subpatterns

#13 timClicks closed 13 years ago
2
Optional matching

#12 timClicks closed 13 years ago
2
ValueError in _substitute_entity() substituting '#x201C' like strings

#11 arshaw opened 13 years ago
1
Support content encodings other than UTF-8 (support Swedish characters)

#10 arshaw opened 13 years ago
0
Scrapemark fails to decode hex-encoded HTML entities

#9 arshaw opened 13 years ago
0
Exception while parsing things like '<a href="">Some text</a>'

#8 arshaw opened 13 years ago
1
Multibyte non-utf-8 encoded pages are decoded incorrectly

#7 arshaw opened 13 years ago
1
Nested loops are broken in scrapemark 0.9

#6 arshaw opened 13 years ago
1
Scrapemark sometimes gets confused w/ non-closing HTML tags

#5 arshaw opened 13 years ago
1
syntax for concatenating captures

#4 arshaw opened 13 years ago
1
scraping of comments

#3 arshaw opened 13 years ago
0
Mixed case unable to scrape

#2 arshaw opened 13 years ago
1
custom filters

#1 arshaw opened 13 years ago
2