LivingWithHippos / unchained-android

App to interact with real-debrid API
GNU General Public License v3.0

[FEATURE REQUEST] Could use some help making a Nyaa.si Plug-in #183

Closed morpheasgr closed 2 years ago

morpheasgr commented 2 years ago

Greetings. I have been wanting to request such a plugin for a while, but I wanted to make myself useful, so I tried to create a search plugin for the famous anime tracker nyaa.si. It does not seem to be behind Cloudflare, and other apps such as wako already scrape it. My efforts to make the scraper have failed so far: the plugin installs, but I get no response to searches. I wish I could debug the process and look under the hood of the search query responses. Could you please take a look whenever you have time? Thanks in advance, and thanks for your efforts on this amazing app. In case the code tag does not work as intended, I put the contents of my JSON into a pastebin. I have run it through an online JSON5 validator.

Pastebin link

{
  "engine_version": 2.0,
  "version": 1.0,
  "url": "https://nyaa.si",
  "name": "Nyaa",
  "description": "Parser for Nyaa",
  "author": "morpheasgr",
  "supported_categories": {
    "all": "None"
  },
  "search": {
    "no_category": "${url}/?q=${query}&f=0&c=0_0&s=seeders&o=desc&p=${page}",
    "page_start": 1
  },
  "download": {
    "table_direct": {
      "class": "torrent-list",
      "columns": {
        "name_column": 1,
        "seeders_column": 6,
        "leechers_column": 7,
        "size_column": 4,
        "magnet_column": 3,
        "details_column": 1
      }
    },
    "regexes": {
      "magnet": {
        "regex_use": "all",
        "regexps": [
          {
            "regex": "href=\"(magnet:\\?xt=urn:btih:[^\"]+)",
            "group": 1,
            "slug_type": "complete"
          }
        ]
      },
      "name": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "title=\"([^\"]+)",
            "group": 1,
            "slug_type": "complete"
          }
        ]
      },
      "seeders": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "(\\d*)",
            "group": 1,
            "slug_type": "complete"
          }
        ]
      },
      "leechers": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "(\\d*)",
            "group": 1,
            "slug_type": "complete"
          }
        ]
      },
      "size": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "([\\w\\s.]*)",
            "group": 1,
            "slug_type": "complete"
          }
        ]
      },
      "details": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "href=\"(/view/[^\"]+)",
            "group": 1,
            "slug_type": "append_url"
          }
        ]
      }
    }
  }
}

LivingWithHippos commented 2 years ago

Congratulations on getting this far with the plugin. You only got the indexes of the table cells wrong.

Explanation with suggestions and complete plugin at the end.

  1. "no_category": "${url}/?q=${query}&f=0&c=0_0&s=seeders&o=desc&p=${page}"

Keep this simple and use only the necessary query parameters:

"no_category": "${url}/?q=${query}&p=${page}"

  1. Nyaa also supports categories, so you could add those too. Search for something with a category selected to see where it appears in the URL. For example, an audio search gives https://nyaa.si/?f=0&c=2_0&q=god, so c=2_0 is the category indicator. The category search URL will be "category": "${url}/?q=${query}&c=${category}&p=${page}"

Now you can map the categories to the ones available in Unchained:

 "supported_categories": {
    "all": "None",
    "anime": "Anime",
    "software": "Applications",
    "games": "Games",
    "movies": "Movies",
    "music": "Music",
    "tv": "TV",
    "books": "books"
  }

becomes

 "supported_categories": {
    "all": "0_0",
    "anime": "1_0",
    "software": "6_0",
    "games": "6_2",
    "music": "2_0",
    "tv": "4_0",
    "books": "3_0"
  }
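
To check what URL a given query and category should produce, the templates above can be filled in by hand. The following is a minimal Python sketch, not the app's actual code: `build_search_url` is a hypothetical helper, and URL-encoding the query with `quote_plus` is an assumption about how the placeholders get filled.

```python
from string import Template
from urllib.parse import quote_plus

# Templates and category codes copied from the plugin in this thread.
CATEGORY_URL = Template("${url}/?q=${query}&c=${category}&p=${page}")
NO_CATEGORY_URL = Template("${url}/?q=${query}&p=${page}")
SUPPORTED_CATEGORIES = {
    "all": "0_0",
    "anime": "1_0",
    "software": "6_0",
    "games": "6_2",
    "music": "2_0",
    "tv": "4_0",
    "books": "3_0",
}


def build_search_url(query, page=1, category=None, url="https://nyaa.si"):
    """Hypothetical helper: fill the placeholders the way the app might."""
    fields = {"url": url, "query": quote_plus(query), "page": page}
    code = SUPPORTED_CATEGORIES.get(category) if category else None
    if code is None:
        # Unknown or missing category: fall back to the plain search URL.
        return NO_CATEGORY_URL.substitute(fields)
    return CATEGORY_URL.substitute(fields, category=code)


print(build_search_url("god", category="music"))
print(build_search_url("one piece", page=2))
```

Opening the printed URLs in a browser is a quick way to confirm that the site accepts the reduced set of parameters.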
  1. You can also add the torrent link. Add "torrent_column": 2 to the columns object, and add the torrent regex to the regexes object:
"torrents": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "(\/download\/\\d+\\.torrent)",
            "group": 1,
            "slug_type": "append_url"
          }
        ]
      },
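
The torrent regex can be tried outside the app. The Python sketch below decodes the JSON entry first, which shows that `\/` in JSON is simply `/`, then runs the pattern against a cell. The cell HTML is a hypothetical stand-in for a Nyaa links cell, and treating `append_url` as "prefix the plugin's `url`" is an assumption about the engine's behaviour.

```python
import json
import re

BASE_URL = "https://nyaa.si"  # the plugin's "url" field

# The regex entry as written in the plugin; the raw string keeps the backslashes.
entry = json.loads(r'''
{
  "regex": "(\/download\/\\d+\\.torrent)",
  "group": 1,
  "slug_type": "append_url"
}
''')

# Hypothetical links cell holding both a torrent link and a magnet link.
cell = (
    '<td class="text-center">'
    '<a href="/download/1469687.torrent">torrent</a>'
    '<a href="magnet:?xt=urn:btih:0123456789abcdef">magnet</a>'
    '</td>'
)

match = re.search(entry["regex"], cell)
path = match.group(entry["group"])
# Assumption: "append_url" means the match is a path relative to the site URL.
link = BASE_URL + path if entry["slug_type"] == "append_url" else path

print(entry["regex"])
print(link)
```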
  1. Table direct is the correct choice, but some of your indexes are wrong. Use your browser's developer tools to count the td tags inside a tr tag to find each cell's index (remember that counting starts from zero).
"table_direct": {
      "class": "torrent-list",
      "columns": {
        "name_column": 1,
        "seeders_column": 5,
        "leechers_column": 6,
        "size_column": 3,
        "magnet_column": 2,
        "details_column": 1
      }
    },
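
If counting cells by hand is error-prone, the zero-based indexes can be listed with a short script. This Python sketch uses only the standard library; the row is a hypothetical example shaped like a Nyaa result row (its contents are made up), and `CellIndexer` is an illustrative name rather than anything from the app.

```python
from html.parser import HTMLParser


class CellIndexer(HTMLParser):
    """Collect the text of each <td> in a row, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data.strip()


# Hypothetical row: category, name, links, size, date, seeders, leechers, downloads.
ROW = """
<tr>
  <td>Anime</td>
  <td colspan="2"><a href="/view/1" title="Example">Example</a></td>
  <td><a href="/download/1.torrent">torrent</a><a href="magnet:?xt=urn:btih:abc">magnet</a></td>
  <td>1.4 GiB</td>
  <td>2021-12-11 10:00</td>
  <td>12</td>
  <td>3</td>
  <td>345</td>
</tr>
"""

parser = CellIndexer()
parser.feed(ROW)
cells = parser.cells
for index, text in enumerate(cells):
    print(index, text)
```

In this sample row the size lands at index 3, seeders at 5, and leechers at 6, which matches the corrected columns above.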
  1. The name regex sometimes picks up the comments link instead of the title:
 "name": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "title=\"([^\"]+)",
            "group": 1,
            "slug_type": "complete"
          }
        ]
      },

This is a title cell:

<td colspan="2">
   <a href="/view/1469687#comments" class="comments" title="2 comments">
   <i class="fa fa-comments-o"></i>2</a>
   <a href="/view/1469687" title="[Dodgy] Raiga - God of the Monsters (2009) [Web 720p] [C7F990EF].mp4">[Dodgy] Raiga - God of the Monsters (2009) [Web 720p] [C7F990EF].mp4</a>
</td>

As you can see, the first title attribute belongs to the comments link, if there are any comments. Cells without comments look like this:

<td colspan="2">
   <a href="/view/1466483" title="[Members only] 【NIJISANJI EN - Rosemi Lovelock】【THE GODFATHER】Movie Watchalong! [2021-12-11] - UvdX0tBgkUk">[Members only] 【NIJISANJI EN - Rosemi Lovelock】【THE GODFATHER】Movie Watchalong! [2021-12-11] - UvdX0tBgkUk</a>
</td>

This is a little tricky because the two links are different but similar. My solution was to start the match from the <a href tag, because the comments link has the form <a href class title while the title link has the form <a href title:

"regex": "<a\\s+href=\"[^\"]+\"\\s+title=\"([^\"]+)", will pick up only the title in both cases.
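
The difference between the two name regexes can be checked against the two cells quoted above. This is a small Python sketch; Python's `re` is used here as a stand-in for whatever regex engine the app runs, which is an assumption, although these patterns use only widely supported syntax. `first_group` is a hypothetical helper.

```python
import re

OLD_NAME = r'title="([^"]+)'
NEW_NAME = r'<a\s+href="[^"]+"\s+title="([^"]+)'

# The two title cells quoted in this thread.
WITH_COMMENTS = '''<td colspan="2">
   <a href="/view/1469687#comments" class="comments" title="2 comments">
   <i class="fa fa-comments-o"></i>2</a>
   <a href="/view/1469687" title="[Dodgy] Raiga - God of the Monsters (2009) [Web 720p] [C7F990EF].mp4">[Dodgy] Raiga - God of the Monsters (2009) [Web 720p] [C7F990EF].mp4</a>
</td>'''

WITHOUT_COMMENTS = '''<td colspan="2">
   <a href="/view/1466483" title="[Members only] 【NIJISANJI EN - Rosemi Lovelock】【THE GODFATHER】Movie Watchalong! [2021-12-11] - UvdX0tBgkUk">[Members only] 【NIJISANJI EN - Rosemi Lovelock】【THE GODFATHER】Movie Watchalong! [2021-12-11] - UvdX0tBgkUk</a>
</td>'''


def first_group(pattern, html):
    """Return group 1 of the first match, or None when nothing matches."""
    match = re.search(pattern, html)
    return match.group(1) if match else None


for cell in (WITH_COMMENTS, WITHOUT_COMMENTS):
    print("old:", first_group(OLD_NAME, cell))
    print("new:", first_group(NEW_NAME, cell))
```

The old pattern returns the comments link's title for the first cell, while the new one skips that link because `class` sits between `href` and `title`.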

  1. The details cell has the same issue with comments as the title cell. Change the details regex from

"href=\"(/view/[^\"]+)" to "href=\"(/view/[^\"]+)\"\\s+title"
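
The same check works for the details link. In this Python sketch the cell is a shortened version of the commented title cell quoted earlier in the thread (the title text is abbreviated), and joining the match to the site URL is an assumption about what `append_url` does.

```python
import re

BASE_URL = "https://nyaa.si"  # the plugin's "url" field

OLD_DETAILS = r'href="(/view/[^"]+)'
NEW_DETAILS = r'href="(/view/[^"]+)"\s+title'

# Shortened title cell with a comments link in front of the torrent link.
CELL = '''<td colspan="2">
   <a href="/view/1469687#comments" class="comments" title="2 comments">
   <i class="fa fa-comments-o"></i>2</a>
   <a href="/view/1469687" title="[Dodgy] Raiga">[Dodgy] Raiga</a>
</td>'''

old_path = re.search(OLD_DETAILS, CELL).group(1)
new_path = re.search(NEW_DETAILS, CELL).group(1)

print("old:", BASE_URL + old_path)  # points at the comments anchor
print("new:", BASE_URL + new_path)  # points at the torrent page itself
```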

Final plugin

Text and attachment available.

{
  "engine_version": 2.0,
  "version": 1.0,
  "url": "https://nyaa.si",
  "name": "Nyaa",
  "description": "Parser for Nyaa",
  "author": "morpheasgr",
  "supported_categories": {
    "all": "0_0",
    "anime": "1_0",
    "software": "6_0",
    "games": "6_2",
    "music": "2_0",
    "tv": "4_0",
    "books": "3_0"
  },
  "search": {
    "category": "${url}/?q=${query}&c=${category}&p=${page}",
    "no_category": "${url}/?q=${query}&p=${page}",
    "page_start": 1
  },
  "download": {
    "table_direct": {
      "class": "torrent-list",
      "columns": {
        "name_column": 1,
        "seeders_column": 5,
        "leechers_column": 6,
        "size_column": 3,
        "magnet_column": 2,
        "torrent_column": 2,
        "details_column": 1
      }
    },
    "regexes": {
      "magnet": {
        "regex_use": "all",
        "regexps": [
          {
            "regex": "href=\"(magnet:\\?xt=urn:btih:[^\"]+)",
            "group": 1,
            "slug_type": "complete"
          }
        ]
      },
      "torrents": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "(\/download\/\\d+\\.torrent)",
            "group": 1,
            "slug_type": "append_url"
          }
        ]
      },
      "name": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "<a\\s+href=\"[^\"]+\"\\s+title=\"([^\"]+)",
            "group": 1,
            "slug_type": "complete"
          }
        ]
      },
      "seeders": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "(\\d*)",
            "group": 1,
            "slug_type": "complete"
          }
        ]
      },
      "leechers": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "(\\d*)",
            "group": 1,
            "slug_type": "complete"
          }
        ]
      },
      "size": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "([\\w\\s.]*)",
            "group": 1,
            "slug_type": "complete"
          }
        ]
      },
      "details": {
        "regex_use": "first",
        "regexps": [
          {
            "regex": "href=\"(\/view\/[^\"]+)\"\\s+title",
            "group": 1,
            "slug_type": "append_url"
          }
        ]
      }
    }
  }
}

nyaa.zip

morpheasgr commented 2 years ago

Dude, you once again delivered perfectly. Not only did you fix the plugin I botched together, but you did it in an educational way; this is the perfect answer. Not implementing categories was a choice I made out of haste, to get a basic implementation first. Everything else I could not have fixed without you. Especially when it comes to regex, I cannot read or write it fluently, so I was taking stabs in the dark. The comments appearing first in the column's code is kind of confusing, since visually it is the other way around, and I thought it was a separate column without a separator. I wrote that plugin on my phone, with Acode as an editor and Kiwi Browser (Chromium based) as a browser, which does include the Chrome Dev Tools. Thank you so much for completing it. You're awesome. Have a happy New Year.

PS: Do I need to keep the URL simple? As in no sorting by seeders etc?

morpheasgr commented 2 years ago

Closed with PR #185