fake-name / wlnupdates

It's a WEBSITE! Whooooo!
http://wlnupdates.com

Chapters out of order for royal road #706

Open Mihoshika opened 3 years ago

Mihoshika commented 3 years ago

E.g. https://www.wlnupdates.com/series-id/55717/the-wandering-inn has all of its interludes placed at the start.

https://www.wlnupdates.com/series-id/62765/the-true-endgame Just a mess. I have no clue how these were sorted.

fake-name commented 3 years ago

Yep, that happens when authors number things incoherently.

https://www.wlnupdates.com/series-id/55717/the-wandering-inn

Chapters are sorted numerically based on their title. The author here uses an idiosyncratic decimal chapter numbering, and has no chapter numbers in the interludes.
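To illustrate why title-based numeric sorting misplaces these chapters, here is a minimal sketch. It is not the site's actual code: `extract_number` and `sort_by_title_number` are hypothetical helpers, the titles are made up, and placing unnumbered titles first is an assumption chosen to match the behaviour reported above.

```python
import re

# Hypothetical pattern: the first integer or decimal number found in a title.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def extract_number(title):
    """Return the first number in the title as a float, or None if there is none."""
    match = NUMBER_RE.search(title)
    return float(match.group()) if match else None


def sort_by_title_number(titles):
    """Sort titles by extracted number; unnumbered titles sort first (assumption)."""
    def key(title):
        number = extract_number(title)
        return (0, 0.0) if number is None else (1, number)
    return sorted(titles, key=key)


# Illustrative titles only. "1.10" becomes 1.1 and lands before "1.9",
# and the unnumbered interlude ends up at the start.
print(sort_by_title_number(["1.9", "1.10", "Interlude", "2.0"]))
```

Reading "1.10" as a decimal rather than "book 1, chapter 10" is exactly the kind of ambiguity that a purely numeric sort cannot resolve.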

https://www.wlnupdates.com/series-id/62765/the-true-endgame

WTF is a "patch" chapter? It's a mess because the source chapters are a mess.

[Vol 1. pt. 13] Patch 3.0: To the City of Pirates!

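This gets parsed as

  • <FreeTextToken - contents: '['>
  • <VolumeToken - contents: 'Vol' ' ' '1.' (numeric: True, ascii: None, parsed: No>
  • <FreeTextToken - contents: ' '>
  • <FragmentToken - contents: 'pt' '. ' '13' (numeric: True, ascii: None, parsed: No>
  • <FreeTextToken - contents: '] '>
  • <FreeTextToken - contents: 'Patch'>
  • <FreeTextToken - contents: ' '>
  • <FreeChapterToken - contents: '' '' '3.0' (numeric: True, ascii: None, parsed: No>
  • <FreeTextToken - contents: ': '>
  • <FreeTextToken - contents: 'To'>
  • <FreeTextToken - contents: ' '>
  • <FreeTextToken - contents: 'the'>
  • <FreeTextToken - contents: ' '>
  • <FreeTextToken - contents: 'City'>
  • <FreeTextToken - contents: ' '>
  • <FreeTextToken - contents: 'of'>
  • <FreeTextToken - contents: ' '>
  • <FreeTextToken - contents: 'Pirates'>
  • <FreeTextToken - contents: '!'>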

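Basically, the author is using "pt." (part) where the chapter number would normally go, for some bizarre reason, so the 13 is parsed as a fragment and the 3.0 after "Patch" is picked up as the chapter number.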
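A heavily simplified sketch of how labelled and bare numbers might be pulled out of that title is below. The real parser is token-based and more involved; the regexes, the `parse_title` function, and the rule that the first unlabelled number becomes the chapter are all assumptions made for illustration.

```python
import re

# Hypothetical patterns, not the site's real rules.
NUM = r"(\d+(?:\.\d+)?)"
VOLUME_RE = re.compile(r"\bvol(?:ume)?\.?\s*" + NUM, re.IGNORECASE)
CHAPTER_RE = re.compile(r"\bch(?:apter)?\.?\s*" + NUM, re.IGNORECASE)
FRAGMENT_RE = re.compile(r"\b(?:pt|part)\.?\s*" + NUM, re.IGNORECASE)
BARE_NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_title(title):
    """Return volume/chapter/fragment numbers guessed from a release title."""
    result = {"volume": None, "chapter": None, "fragment": None}
    claimed = []  # character spans of numbers that already carry a label
    labelled = (("volume", VOLUME_RE), ("chapter", CHAPTER_RE), ("fragment", FRAGMENT_RE))
    for name, pattern in labelled:
        match = pattern.search(title)
        if match:
            result[name] = float(match.group(1))
            claimed.append(match.span(1))
    if result["chapter"] is None:
        # Assumption: with no explicit chapter label, the first number that
        # was not claimed by another label is treated as the chapter.
        for match in BARE_NUMBER_RE.finditer(title):
            if not any(start <= match.start() < end for start, end in claimed):
                result["chapter"] = float(match.group())
                break
    return result


print(parse_title("[Vol 1. pt. 13] Patch 3.0: To the City of Pirates!"))
```

Under these assumptions the title yields volume 1, fragment 13, and chapter 3.0, which is how the patch number ends up driving the chapter order.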

See https://www.wlnupdates.com/help#auto-release-requirements. The problem here is that the source chapters are a mess. I also think the OP has possibly re-numbered their series in the past.

Additionally, there's a heuristic system that overrides the numbering of chapters if the parser fails to extract valid-looking chapters for >= 80% of the chapters in a series. I think that's kicked in a few times as well.
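One possible reading of that fallback, as a rough sketch only: the function name, the use of `None` to mark a failed extraction, and the choice to renumber sequentially in release order are assumptions, and the threshold is taken literally as "failed for at least 80% of chapters".

```python
def apply_numbering_fallback(chapter_numbers, failure_threshold=0.8):
    """Renumber a series sequentially when too many titles failed to parse.

    chapter_numbers: extracted chapter numbers in release order, with None
    marking a title the parser could not get a valid-looking chapter from.
    """
    if not chapter_numbers:
        return []
    failures = sum(1 for number in chapter_numbers if number is None)
    if failures / len(chapter_numbers) >= failure_threshold:
        # Assumption: the override assigns 1, 2, 3, ... in release order.
        return [float(index) for index in range(1, len(chapter_numbers) + 1)]
    # Below the threshold, the extracted numbers are left untouched.
    return list(chapter_numbers)


mostly_unparsed = [None] * 9 + [10.0]
print(apply_numbering_fallback(mostly_unparsed))
print(apply_numbering_fallback([1.0, None, 3.0]))
```

A series hovering around the threshold could flip between the two numbering schemes as new chapters arrive, which would be one way for an ordering to end up looking arbitrary.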

Basically, these are broken upstream, and short of the authors fixing their broken numbering, they are what they are.

Effectively, I have a system that is trying to get meaningful volume/chapter/part numbering from what is basically unstructured input, and this is kind of impossible in the general case. The heuristic I have is decent for probably 99.9% of the input cases, but it does break down because different people use the same words with opposite meanings in various places.

Mihoshika commented 3 years ago

Is there a reason you are unable to just get it from the site in the order it is already listed in?


fake-name commented 3 years ago

I don't store that information, and the reading list progress tracking would still be broken anyway (it operates only on extracted chapter numbers).

I can display things in chronological order instead of chapter order (it's a per-series setting), but I generally only use that when a series is being re-translated by a better translator or similar, and I want the newer releases of earlier chapters to bubble up to the top of the list.
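As a sketch of the difference between the two display modes (the record fields, the `order_releases` helper, and the newest-first listing are assumptions, not the site's schema):

```python
from datetime import date

# Made-up releases: chapter 1 is re-translated after chapter 2 was posted.
releases = [
    {"chapter": 1.0, "posted": date(2020, 1, 5), "group": "old translator"},
    {"chapter": 2.0, "posted": date(2020, 2, 5), "group": "old translator"},
    {"chapter": 1.0, "posted": date(2020, 11, 1), "group": "new translator"},
]


def order_releases(releases, chronological=False):
    """Order releases for display, highest chapter or newest post first."""
    if chronological:
        return sorted(releases, key=lambda r: r["posted"], reverse=True)
    return sorted(releases, key=lambda r: r["chapter"], reverse=True)


# Chapter order keeps the re-translated chapter 1 below chapter 2;
# chronological order puts the newest post on top.
print([r["group"] for r in order_releases(releases)])
print([r["group"] for r in order_releases(releases, chronological=True)])
```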

Again, I can generally fix specific instances of things being broken (I can also force sequential numbering on OEL series by editing an override configuration file), but a lot of the issue is that I don't want to have to.

If I had more moderation help (or any moderation help, really), that would be something I'd certainly be open to revisiting, but right now a major design goal for the site is to be as completely automated as possible.

Mihoshika commented 3 years ago

I understand that it'd be unreasonable to ask for you to have to manually add custom filters (or whatever) for every site which doesn't work with your current set-up.

That being said, you could manually add in major sites, such as Royal Road.

To be honest, RR should be relatively easy to get the order of, assuming you use a simple crawler to get the links from the story page.
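Roughly, the idea would look like the sketch below. The sample markup and the "/chapter/" URL filter are assumptions for illustration, not Royal Road's actual page structure, and `ChapterLinkCollector` and `chapter_order` are hypothetical names.

```python
from html.parser import HTMLParser


class ChapterLinkCollector(HTMLParser):
    """Collect chapter links in the order they appear on the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: chapter URLs contain "/chapter/"; skip repeated links.
        if href and "/chapter/" in href and href not in self.links:
            self.links.append(href)


def chapter_order(html):
    """Map each chapter link to its 1-based position on the story page."""
    collector = ChapterLinkCollector()
    collector.feed(html)
    return {href: index for index, href in enumerate(collector.links, start=1)}


# Made-up story page fragment.
SAMPLE = """
<table>
  <tr><td><a href="/fiction/1/example/chapter/10/prologue">Prologue</a></td></tr>
  <tr><td><a href="/fiction/1/example/chapter/11/interlude">Interlude</a></td></tr>
  <tr><td><a href="/fiction/1/example/chapter/12/patch-3-0">Patch 3.0</a></td></tr>
</table>
"""

print(chapter_order(SAMPLE))
```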

fake-name commented 3 years ago

It's not per-site, it's per series.

And fundamentally, all that does is change the display order. Things will still be broken in the reading list system, so I don't really see the point.