exzhawk / EhViewer

A Modified Unofficial E-Hentai Application for Android
https://blog.exz.me/post/ehviewer-mod/
Apache License 2.0

Sadpanda Parse Error #33

Closed S52968 closed 7 hours ago

S52968 commented 2 days ago

Describe the bug: Parse error when trying to view a gallery.

Steps to reproduce:

  1. Click on a gallery
  2. See parse error

Expected behavior: Gallery should be viewable.

Screenshots: None

Device model and Android version:

Additional context: Same issue as https://github.com/exzhawk/EhViewer/issues/32, but the latest update did not fix it for me.

RadarNyan commented 1 day ago

The regex won't match because there's an extra nested layer of <div>. I'm not sure whether it was added after exzhawk made the fix.

Here's my fix. Simply changing PATTERN_LARGE_PREVIEW_NEW would already fix the problem, but I didn't like the idea of parsing the entire page of HTML with a regex, so I tried to limit the amount of HTML being parsed.

- app/src/main/java/com/hippo/ehviewer/client/parser/GalleryDetailParser.java -
index dfdb862a..62b22667 100644
@@ -70,7 +70,8 @@ public class GalleryDetailParser {
     private static final Pattern PATTERN_NORMAL_PREVIEW = Pattern.compile("<div class=\"gdtm\"[^<>]*><div[^<>]*width:(\\d+)[^<>]*height:(\\d+)[^<>]*\\((.+?)\\)[^<>]*-(\\d+)px[^<>]*><a[^<>]*href=\"(.+?)\"[^<>]*><img alt=\"([\\d,]+)\"");
     private static final Pattern PATTERN_NORMAL_PREVIEW_NEW=Pattern.compile("<a[^<>]*href=\"([^\"]+)\"><div title=\"Page (\\d+): [^\"]+\" style=\"width:(\\d+)[^<>]*height:(\\d+)[^<>]*\\((.+?)\\)[^<>]*-(\\d+)px");
     private static final Pattern PATTERN_LARGE_PREVIEW = Pattern.compile("<div class=\"gdtl\".+?<a href=\"(.+?)\"><img alt=\"([\\d,]+)\".+?src=\"(.+?)\"");
-    private static final Pattern PATTERN_LARGE_PREVIEW_NEW = Pattern.compile("<a[^<>]*href=\"([^\"]+)\"><div title=\"Page (\\d+): [^\"]+\"[^<>]*\\((.+?)\\)[^<>]*\">");
+    private static final Pattern PATTERN_LARGE_PREVIEW_NEW = Pattern.compile("f=\"([^\"]+)\"[^\"]+\"Page (\\d+)[^(]+\\(([^)]+)");
+    private static final Pattern PATTERN_PREVIEW_RN = Pattern.compile("(\\d+)[^(]+\\(([^)]+)");

     private static final GalleryTagGroup[] EMPTY_GALLERY_TAG_GROUP_ARRAY = new GalleryTagGroup[0];
     private static final GalleryCommentList EMPTY_GALLERY_COMMENT_ARRAY = new GalleryCommentList(new GalleryComment[0], false);
@@ -601,7 +602,8 @@ public class GalleryDetailParser {

     public static PreviewSet parsePreviewSet(String body) throws ParseException {
         try {
-            return parseLargePreviewSet(body);
+            return parseLargePreviewSet(Jsoup.parse(body), body);
         } catch (ParseException e) {
             try {
                 return parseNormalPreviewSet(body);
@@ -621,7 +623,22 @@ public class GalleryDetailParser {
             Elements gdtls = gdt.getElementsByClass("gdtl");
             int n = gdtls.size();
             if (n <= 0) {
-                throw new ParseException("Can't parse large preview", body);
+                if (gdt.childNodeSize() <= 0) {
+                    throw new ParseException("Can't parse large preview", body);
+                }
+                for (Node previewNode : gdt.childNodes()) {
+                    String pageUrl = previewNode.attr("href");
+                    String imageNode = previewNode.childNode(0).toString();
+                    Matcher m = PATTERN_PREVIEW_RN.matcher(imageNode);
+                    while (m.find()){
+                        int index = Integer.parseInt(m.group(1));
+                        String imageUrl = m.group(2);
+                        largePreviewSet.addItem(index - 1, imageUrl, pageUrl);
+                    }
+                }
+                return largePreviewSet;
             }
             for (int i = 0; i < n; i++) {
                 Element element = gdtls.get(i).child(0);
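
To make the narrowed-down matching in the patch easier to follow, here is a minimal standalone sketch of what PATTERN_PREVIEW_RN pulls out of a single preview node's HTML. The pattern is copied from the diff; the class name PreviewRnDemo, the extract helper, and the example.invalid URL are made up for illustration, and the sketch uses only java.util.regex rather than the app's Jsoup-based code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PreviewRnDemo {
    // Copied from the diff above: first run of digits, then the text inside the first "(...)".
    static final Pattern PATTERN_PREVIEW_RN = Pattern.compile("(\\d+)[^(]+\\(([^)]+)");

    /** Returns {pageNumber, imageUrl} from the first match, or null if nothing matches. */
    static String[] extract(String nodeHtml) {
        Matcher m = PATTERN_PREVIEW_RN.matcher(nodeHtml);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        // Hypothetical markup modelled on the thumbnails discussed in this thread.
        String flat = "<div title=\"Page 1: 01.jpg\" style=\"width:200px;height:283px;"
                + "background:transparent url(https://example.invalid/t/01_l.jpg) 0 0 no-repeat\"></div>";
        // Same thumbnail wrapped in one more <div>, with a page label next to it.
        String nested = "<div>" + flat + "<div>Page 1</div></div>";

        for (String html : new String[] {flat, nested}) {
            String[] result = extract(html);
            System.out.println(result[0] + " -> " + result[1]);
        }
    }
}
```

Both inputs should yield page 1 and the same image URL, since the pattern does not care how many wrapper elements come before the title. One thing that may be worth checking: because `[^(]+` stops at the first `(`, a title whose file name contains a parenthesis would have that parenthesized text captured instead of the URL.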
exzhawk commented 1 day ago

Please provide your "Thumbnail Settings" from Settings and specify which site you are using. I can't reproduce it with my account. Please provide only the preview part of the HTML, not the full HTML, in case it contains sensitive information. Something like this:

        <div id="gdt" class="gt200">
            <a href="https://exhentai.org/s/96d06243c5/3102481-1">
                <div title="Page 1: 01.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/96/d0/96d06243c5ff903962b980be4a893149e80db3dd-448601-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
            </a>
            <a href="https://exhentai.org/s/bd962d6687/3102481-2">
                <div title="Page 2: 02.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/bd/96/bd962d66878d1215fe5441085e63998f287de489-317032-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
            </a>
            <a href="https://exhentai.org/s/64f4dacc95/3102481-3">
                <div title="Page 3: 03.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/64/f4/64f4dacc95f6a13a55eca635772bf0c8ce360252-437103-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
            </a>
            <a href="https://exhentai.org/s/1930398007/3102481-4">
                <div title="Page 4: 04.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/19/30/193039800725b7236e80fc1b8fa2cb51db83bf0b-342115-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
            </a>
            <a href="https://exhentai.org/s/8805cfb0ba/3102481-5">
                <div title="Page 5: 05.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/88/05/8805cfb0ba0ee48a448b6986eede8706d82cd812-425874-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
            </a>
            <a href="https://exhentai.org/s/0849fe6d2f/3102481-6">
                <div title="Page 6: 06.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/08/49/0849fe6d2f5f2d6fdc3fcaac187ef16b79db3583-290782-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
            </a>
        </div>
RadarNyan commented 1 day ago

@exzhawk I think I've found the cause: there is a setting called "Gallery Page Thumbnail Labeling" which defaults to "Page Number Only", causing the DOM to have a nested <div> like this:

<a href="https://exhentai.org/s/96d06243c5/3102481-1">
 <div>
  <div title="Page 1: 01.jpg" style="width:200px;height:283px;background:transparent url(https://s.exhentai.org/t/96/d0/96d06243c5ff903962b980be4a893149e80db3dd-448601-1447-2048-jpg_l.jpg) 0 0 no-repeat"></div>
  <div>
   Page 1
  </div>
 </div></a>

The code I posted in the previous comment (edited just now to remove one .childNode(0)) would work for both cases; feel free to use it if you like. Also, please take a look at issues #30 and #31 if you have time.
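
As a rough check of the explanation above, the following sketch runs the old and new PATTERN_LARGE_PREVIEW_NEW from the diff against a flat and a nested thumbnail. The two patterns are copied from the earlier comment; the class name, the helper, and the example.invalid URLs are placeholders, and the markup is written without whitespace between tags on the assumption that the real page is served that way.

```java
import java.util.regex.Pattern;

class LargePreviewPatternDemo {
    // Pattern before the patch: expects <div title=...> directly after the <a ...> tag.
    static final Pattern OLD_PATTERN = Pattern.compile(
            "<a[^<>]*href=\"([^\"]+)\"><div title=\"Page (\\d+): [^\"]+\"[^<>]*\\((.+?)\\)[^<>]*\">");
    // Pattern after the patch: tolerates anything without a quote between href and the title.
    static final Pattern NEW_PATTERN = Pattern.compile(
            "f=\"([^\"]+)\"[^\"]+\"Page (\\d+)[^(]+\\(([^)]+)");

    // Placeholder markup; the URLs are invented for illustration.
    static final String INNER = "<div title=\"Page 1: 01.jpg\" style=\"width:200px;height:283px;"
            + "background:transparent url(https://example.invalid/t/01_l.jpg) 0 0 no-repeat\"></div>";
    static final String FLAT = "<a href=\"https://example.invalid/s/abc/1-1\">" + INNER + "</a>";
    static final String NESTED = "<a href=\"https://example.invalid/s/abc/1-1\"><div>" + INNER
            + "<div>Page 1</div></div></a>";

    static boolean matches(Pattern pattern, String html) {
        return pattern.matcher(html).find();
    }

    public static void main(String[] args) {
        System.out.println("old / flat:   " + matches(OLD_PATTERN, FLAT));   // true
        System.out.println("old / nested: " + matches(OLD_PATTERN, NESTED)); // false
        System.out.println("new / flat:   " + matches(NEW_PATTERN, FLAT));   // true
        System.out.println("new / nested: " + matches(NEW_PATTERN, NESTED)); // true
    }
}
```

The old pattern fails on the nested layout because it requires `"><div title=` right after the link's href, while the wrapper `<div>` now sits in between.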

Linnest2020 commented 1 day ago

It does work on my client when I change "Gallery Page Thumbnail Labeling" in the E-Hentai website settings! So that setting is the key to resolving this issue and my issue #35.

S52968 commented 23 hours ago

I know other people have already answered, and a solution has been found, but just in case:

[Screenshots: Screenshot_20241027_201424_EhViewer, Screenshot_20241027_201403_EhViewer]

Here's what my HTML looks like:

<div id="gdt" class="gt100">
    <a href="https://exhentai.org/s/f9f96c2677/3104279-2">
        <div>
            <div title="Page 2: P_002.png" style="width:100px;height:141px;background:transparent url(https://zurswtyclg.hath.network/cm/ft5pwt790hsz2j3knw/3104279-0.jpg) -100px 0 no-repeat"></div>
            <div>Page 2</div>
        </div>
    </a>
    <a href="https://exhentai.org/s/f075e36fc2/3104279-3">
        <div>
            <div title="Page 3: P_003.png" style="width:100px;height:141px;background:transparent url(https://zurswtyclg.hath.network/cm/ft5pwt790hsz2j3knw/3104279-0.jpg) -200px 0 no-repeat"></div>
            <div>Page 3</div>
        </div>
    </a>
</div>
exzhawk commented 7 hours ago

Fixed in 1.7.22.