Closed S52968 closed 7 hours ago
The regex won't match since there's another nested layer of <div>; I'm not sure whether it was added after exzhawk made the fix.
Here's my fix. Simply changing PATTERN_LARGE_PREVIEW_NEW would already fix the problem, but I didn't really like the idea of parsing the entire page of HTML with regex, so I tried to limit the amount of HTML code being parsed.
- app/src/main/java/com/hippo/ehviewer/client/parser/GalleryDetailParser.java -
index dfdb862a..62b22667 100644
@@ -70,7 +70,8 @@ public class GalleryDetailParser {
private static final Pattern PATTERN_NORMAL_PREVIEW = Pattern.compile("<div class=\"gdtm\"[^<>]*><div[^<>]*width:(\\d+)[^<>]*height:(\\d+)[^<>]*\\((.+?)\\)[^<>]*-(\\d+)px[^<>]*><a[^<>]*href=\"(.+?)\"[^<>]*><img alt=\"([\\d,]+)\"");
private static final Pattern PATTERN_NORMAL_PREVIEW_NEW=Pattern.compile("<a[^<>]*href=\"([^\"]+)\"><div title=\"Page (\\d+): [^\"]+\" style=\"width:(\\d+)[^<>]*height:(\\d+)[^<>]*\\((.+?)\\)[^<>]*-(\\d+)px");
private static final Pattern PATTERN_LARGE_PREVIEW = Pattern.compile("<div class=\"gdtl\".+?<a href=\"(.+?)\"><img alt=\"([\\d,]+)\".+?src=\"(.+?)\"");
- private static final Pattern PATTERN_LARGE_PREVIEW_NEW = Pattern.compile("<a[^<>]*href=\"([^\"]+)\"><div title=\"Page (\\d+): [^\"]+\"[^<>]*\\((.+?)\\)[^<>]*\">");
+ private static final Pattern PATTERN_LARGE_PREVIEW_NEW = Pattern.compile("f=\"([^\"]+)\"[^\"]+\"Page (\\d+)[^(]+\\(([^)]+)");
+ private static final Pattern PATTERN_PREVIEW_RN = Pattern.compile("(\\d+)[^(]+\\(([^)]+)");
private static final GalleryTagGroup[] EMPTY_GALLERY_TAG_GROUP_ARRAY = new GalleryTagGroup[0];
private static final GalleryCommentList EMPTY_GALLERY_COMMENT_ARRAY = new GalleryCommentList(new GalleryComment[0], false);
@@ -601,7 +602,8 @@ public class GalleryDetailParser {
public static PreviewSet parsePreviewSet(String body) throws ParseException {
try {
- return parseLargePreviewSet(body);
+ return parseLargePreviewSet(Jsoup.parse(body), body);
} catch (ParseException e) {
try {
return parseNormalPreviewSet(body);
@@ -621,7 +623,22 @@ public class GalleryDetailParser {
Elements gdtls = gdt.getElementsByClass("gdtl");
int n = gdtls.size();
if (n <= 0) {
- throw new ParseException("Can't parse large preview", body);
+ if (gdt.childNodeSize() <= 0) {
+ throw new ParseException("Can't parse large preview", body);
+ }
+ for (Node previewNode : gdt.childNodes()) {
+ String pageUrl = previewNode.attr("href");
+ String imageNode = previewNode.childNode(0).toString();
+ Matcher m = PATTERN_PREVIEW_RN.matcher(imageNode);
+ while (m.find()){
+ int index = Integer.parseInt(m.group(1));
+ String imageUrl = m.group(2);
+ largePreviewSet.addItem(index - 1, imageUrl, pageUrl);
+ }
+ }
+ return largePreviewSet;
}
for (int i = 0; i < n; i++) {
Element element = gdtls.get(i).child(0);
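To illustrate the per-node approach in the diff above, here is a minimal standalone sketch of what PATTERN_PREVIEW_RN extracts from the string form of a single preview node. It uses only java.util.regex; the class name, the helper extractAll, and the example.invalid URL are made up for illustration, and the node strings are hand-written rather than produced by Jsoup, so the real serialized output may differ in whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreviewNodeSketch {
    // Same expression as PATTERN_PREVIEW_RN in the diff.
    static final Pattern PATTERN_PREVIEW_RN = Pattern.compile("(\\d+)[^(]+\\(([^)]+)");

    // Hand-written stand-ins for previewNode.childNode(0).toString() (assumption).
    static final String FLAT =
            "<div title=\"Page 1: 01.jpg\" style=\"width:200px;height:283px;"
            + "background:transparent url(https://example.invalid/t/01_l.jpg) 0 0 no-repeat\"></div>";
    static final String NESTED = "<div>" + FLAT + "<div>Page 1</div></div>";

    /** Collects every {pageNumber, imageUrl} pair, mirroring the while (m.find()) loop. */
    static List<String[]> extractAll(String nodeHtml) {
        List<String[]> result = new ArrayList<>();
        Matcher m = PATTERN_PREVIEW_RN.matcher(nodeHtml);
        while (m.find()) {
            result.add(new String[] {m.group(1), m.group(2)});
        }
        return result;
    }

    public static void main(String[] args) {
        for (String html : new String[] {FLAT, NESTED}) {
            for (String[] pair : extractAll(html)) {
                System.out.println("page " + pair[0] + " -> " + pair[1]);
            }
        }
    }
}
```

One caveat with this expression: it stops at the first "(" after the page number, so a file name containing parentheses in the title attribute would not be handled correctly.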
Please provide your "Thumbnail Settings" from Settings and specify which site you are using. I can't reproduce it with my account. Please also provide only the preview part of the HTML, not the full HTML, in case it contains sensitive information. Something like this:
<div id="gdt" class="gt200">
<a href="https://exhentai.org/s/96d06243c5/3102481-1">
<div title="Page 1: 01.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/96/d0/96d06243c5ff903962b980be4a893149e80db3dd-448601-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
</a>
<a href="https://exhentai.org/s/bd962d6687/3102481-2">
<div title="Page 2: 02.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/bd/96/bd962d66878d1215fe5441085e63998f287de489-317032-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
</a>
<a href="https://exhentai.org/s/64f4dacc95/3102481-3">
<div title="Page 3: 03.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/64/f4/64f4dacc95f6a13a55eca635772bf0c8ce360252-437103-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
</a>
<a href="https://exhentai.org/s/1930398007/3102481-4">
<div title="Page 4: 04.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/19/30/193039800725b7236e80fc1b8fa2cb51db83bf0b-342115-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
</a>
<a href="https://exhentai.org/s/8805cfb0ba/3102481-5">
<div title="Page 5: 05.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/88/05/8805cfb0ba0ee48a448b6986eede8706d82cd812-425874-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
</a>
<a href="https://exhentai.org/s/0849fe6d2f/3102481-6">
<div title="Page 6: 06.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/08/49/0849fe6d2f5f2d6fdc3fcaac187ef16b79db3583-290782-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
</a>
</div>
@exzhawk I think I've found the cause: there is a setting called "Gallery Page Thumbnail Labeling" which defaults to "Page Number Only", causing the DOM to have a nested <div> like this:
<a href="https://exhentai.org/s/96d06243c5/3102481-1">
<div>
<div title="Page 1: 01.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/96/d0/96d06243c5ff903962b980be4a893149e80db3dd-448601-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
<div>
Page 1
</div>
</div></a>
The code I posted in the previous comment (edited just now to remove one .childNode(0)) would work for both cases; feel free to use it if you like. Also, please take a look at issues #30 and #31 if you have time.
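As a rough check of the "both cases" claim, the sketch below runs the old and new PATTERN_LARGE_PREVIEW_NEW expressions from the diff against a flat and a nested anchor. The class name, the helper groups, and the example.invalid URLs are hypothetical, and the sample markup is written without whitespace between tags (an assumption: the old expression needs the title <div> to follow the <a> tag directly).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LargePreviewPatternSketch {
    // Expression removed by the diff.
    static final Pattern OLD_PATTERN = Pattern.compile(
            "<a[^<>]*href=\"([^\"]+)\"><div title=\"Page (\\d+): [^\"]+\"[^<>]*\\((.+?)\\)[^<>]*\">");
    // Expression added by the diff.
    static final Pattern NEW_PATTERN = Pattern.compile(
            "f=\"([^\"]+)\"[^\"]+\"Page (\\d+)[^(]+\\(([^)]+)");

    static final String TITLE_DIV =
            "<div title=\"Page 1: 01.jpg\" style=\"width:200px;height:283px;"
            + "background:transparent url(https://example.invalid/t/01_l.jpg) 0 0 no-repeat\"></div>";
    // Flat layout: the title <div> is a direct child of <a>.
    static final String FLAT =
            "<a href=\"https://example.invalid/s/abc/1-1\">" + TITLE_DIV + "</a>";
    // Nested layout: an extra wrapper <div> plus a "Page 1" label.
    static final String NESTED =
            "<a href=\"https://example.invalid/s/abc/1-1\"><div>" + TITLE_DIV
            + "<div>Page 1</div></div></a>";

    /** Returns {pageUrl, pageNumber, imageUrl}, or null when the pattern does not match. */
    static String[] groups(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) {
        System.out.println("old/flat   matched: " + (groups(OLD_PATTERN, FLAT) != null));
        System.out.println("old/nested matched: " + (groups(OLD_PATTERN, NESTED) != null));
        System.out.println("new/flat   matched: " + (groups(NEW_PATTERN, FLAT) != null));
        System.out.println("new/nested matched: " + (groups(NEW_PATTERN, NESTED) != null));
    }
}
```

With these inputs the old expression only matches the flat layout, while the looser new one matches both, at the cost of being less strict about what sits between the href and the title.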
It does work on my client when I change Gallery Page Thumbnail Labeling in the EH website settings!
So that setting is the key to resolving this issue and my issue #35.
I know other people have already answered, and a solution has been found, but just in case:
Here's what my HTML looks like:
<div id="gdt" class="gt100">
<a href="https://exhentai.org/s/f9f96c2677/3104279-2">
<div>
<div title="Page 2: P_002.png" style="width:100px;height:141px;background:transparent url(https://zurswtyclg.hath.network/cm/ft5pwt790hsz2j3knw/3104279-0.jpg) -100px 0 no-repeat"></div>
<div>Page 2</div>
</div>
</a>
<a href="https://exhentai.org/s/f075e36fc2/3104279-3">
<div>
<div title="Page 3: P_003.png" style="width:100px;height:141px;background:transparent url(https://zurswtyclg.hath.network/cm/ft5pwt790hsz2j3knw/3104279-0.jpg) -200px 0 no-repeat"></div>
<div>Page 3</div>
</div>
</a>
</div>
Fixed in 1.7.22.
Describe the bug: Parse error when trying to view a gallery.
To Reproduce: Steps to reproduce the behavior:
Expected behavior: Gallery should be viewable.
Screenshots: None
Device model and Android version:
Additional context: Same issue as https://github.com/exzhawk/EhViewer/issues/32, but the latest update did not fix it for me.