Open Mihoshika opened 3 years ago
Yep, that happens when authors number things incoherently.
https://www.wlnupdates.com/series-id/55717/the-wandering-inn
Chapters are sorted numerically based on their title. The author here uses an idiosyncratic decimal chapter numbering, and has no chapter numbers in the interludes.
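As a rough illustration of why unnumbered interludes end up clumped together, here is a minimal sketch of a numeric sort over extracted chapter numbers. It is not the site's actual code: the `Release` type, the sample titles, and the rule that a missing number sorts first are all assumptions made for the example.

```python
from typing import NamedTuple, Optional


class Release(NamedTuple):
    title: str
    chapter: Optional[float]  # None when no number could be extracted


def chapter_sort_key(release: Release):
    # Assumption: releases with no extracted number sort before numbered ones.
    missing = release.chapter is None
    return (0 if missing else 1, release.chapter or 0.0)


# Hypothetical titles in the style of a decimal-numbered series.
releases = [
    Release("1.01", 1.01),
    Release("Interlude", None),
    Release("1.00", 1.0),
    Release("Interlude (second)", None),
]

ordered = sorted(releases, key=chapter_sort_key)
titles = [r.title for r in ordered]
```

With a key like this, every interlude lands in one block regardless of where it belongs in the story, because there is nothing in the title to place it.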
https://www.wlnupdates.com/series-id/62765/the-true-endgame
WTF is a "patch" chapter? It's a mess because the source chapters are a mess.
[Vol 1. pt. 13] Patch 3.0: To the City of Pirates!
This gets parsed as
<FreeTextToken - contents: '['>
<VolumeToken - contents: 'Vol' ' ' '1.' (numeric: True, ascii: None, parsed: No>
<FreeTextToken - contents: ' '>
<FragmentToken - contents: 'pt' '. ' '13' (numeric: True, ascii: None, parsed: No>
<FreeTextToken - contents: '] '>
<FreeTextToken - contents: 'Patch'>
<FreeTextToken - contents: ' '>
<FreeChapterToken - contents: '' '' '3.0' (numeric: True, ascii: None, parsed: No>
<FreeTextToken - contents: ': '>
<FreeTextToken - contents: 'To'>
<FreeTextToken - contents: ' '>
<FreeTextToken - contents: 'the'>
<FreeTextToken - contents: ' '>
<FreeTextToken - contents: 'City'>
<FreeTextToken - contents: ' '>
<FreeTextToken - contents: 'of'>
<FreeTextToken - contents: ' '>
<FreeTextToken - contents: 'Pirates'>
<FreeTextToken - contents: '!'>
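To make the token dump above concrete, here is a small regex-based sketch that arrives at the same volume/fragment/chapter split for this title. It is only an approximation under stated assumptions: the pattern names and `parse_title` are hypothetical, and the real parser's token classes are richer than this.

```python
import re

# Hypothetical patterns, not the site's real grammar.
NUMBER = r"(\d+(?:\.\d+)?)"
VOLUME = re.compile(r"\b(?:vol(?:ume)?)\.?\s*" + NUMBER, re.I)
CHAPTER = re.compile(r"\b(?:ch(?:apter)?)\.?\s*" + NUMBER, re.I)
FRAGMENT = re.compile(r"\b(?:pt|part)\.?\s*" + NUMBER, re.I)
BARE_NUMBER = re.compile(r"(?<![\w.])" + NUMBER + r"(?!\w)")


def parse_title(title):
    result = {"volume": None, "chapter": None, "fragment": None}
    remaining = title
    for key, pattern in (("volume", VOLUME), ("chapter", CHAPTER), ("fragment", FRAGMENT)):
        match = pattern.search(remaining)
        if match:
            result[key] = float(match.group(1))
            # Blank out the consumed text so its digits are not reused below.
            width = match.end() - match.start()
            remaining = remaining[:match.start()] + " " * width + remaining[match.end():]
    if result["chapter"] is None:
        # No explicit chapter keyword: treat the first leftover number as the chapter.
        match = BARE_NUMBER.search(remaining)
        if match:
            result["chapter"] = float(match.group(1))
    return result


parsed = parse_title("[Vol 1. pt. 13] Patch 3.0: To the City of Pirates!")
```

Here `parsed` comes out as volume 1, fragment 13, chapter 3.0, matching the dump: the "3.0" after "Patch" is the only free number left, so it is taken as the chapter.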
Basically, the author is using part instead of chapter, for some bizarre reason.
See https://www.wlnupdates.com/help#auto-release-requirements. The problem here is that the source chapters are a mess. Furthermore, I think the OP has possibly re-numbered their series in the past.
Additionally, there's a heuristic system that overrides the numbering of chapters if the parser fails to extract valid-looking chapters for >= 80% of the chapters in a series. I think that's kicked in a few times as well.
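The override heuristic described above could look roughly like the following. This is a sketch, not the actual implementation: the 80% threshold comes from the comment, but the function name and the choice of plain 1..N numbering in release order as the fallback are assumptions.

```python
from typing import List, Optional, Sequence

FAILURE_THRESHOLD = 0.8  # override once >= 80% of releases fail to parse


def effective_chapter_numbers(parsed: Sequence[Optional[float]]) -> List[Optional[float]]:
    """`parsed` holds one extracted chapter number per release, oldest first.

    None marks a release where no valid-looking chapter number was found.
    """
    if not parsed:
        return []
    failures = sum(1 for number in parsed if number is None)
    if failures / len(parsed) >= FAILURE_THRESHOLD:
        # Assumed fallback: ignore the titles and number releases sequentially.
        return [float(i) for i in range(1, len(parsed) + 1)]
    return list(parsed)


mostly_failed = effective_chapter_numbers([None, None, None, None, 7.0])
mostly_parsed = effective_chapter_numbers([1.0, None, 3.0])
```

A series that hovers around the threshold can flip between the two modes as new releases arrive, which is one way the numbering of existing chapters could appear to change over time.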
Basically, these are broken upstream, and short of the authors fixing their broken numbering, they are what they are.
Effectively, I have a system that is trying to get meaningful volume/chapter/part numbering from what is basically unstructured input, and this is kind of impossible in the general case. The heuristic I have is decent for probably 99.9% of the input cases, but it does break down because different people use the same words with opposite meanings in various places.
Is there a reason you are unable to just get it from the site in the order it is already listed in?
I don't store that information, and the reading list progress tracking would be still broken anyways (it only operates on the basis of extracted chapter numbers).
I can display things in chronological order instead of chapter order (it's a per-series setting), but I generally only use that when a series is getting re-translated by a better translator or similar, and I want the newer, earlier-numbered chapters to bubble up to the top of the list.
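A minimal sketch of what such a per-series display switch amounts to, assuming a hypothetical `Release` record and a boolean flag (neither is taken from the site's code), with the newest or highest-numbered release listed first:

```python
from datetime import date
from typing import NamedTuple, Optional


class Release(NamedTuple):
    title: str
    chapter: Optional[float]
    published: date


def display_order(releases, chronological: bool):
    if chronological:
        # Most recently published first, regardless of chapter number.
        return sorted(releases, key=lambda r: r.published, reverse=True)
    # Highest chapter number first; releases without a number go last.
    return sorted(
        releases,
        key=lambda r: (r.chapter is not None, r.chapter or 0.0),
        reverse=True,
    )


# Hypothetical re-translation case: chapter 1 is re-released after chapter 50.
releases = [
    Release("Chapter 50 (old translation)", 50.0, date(2019, 5, 1)),
    Release("Chapter 1 (new translation)", 1.0, date(2020, 11, 1)),
]
by_date = display_order(releases, chronological=True)
by_chapter = display_order(releases, chronological=False)
```

Either way only the listing changes; the chapter numbers attached to each release stay exactly as extracted.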
Again, I can generally fix specific instances of things being broken (I can also force sequential numbering on OEL series by editing an override configuration file), but a lot of the issue is that I don't want to have to.
If I had more moderation help (or any moderation help, really), that would be something I'd certainly be open to revisiting, but right now a major design goal for the site is to be as completely automated as possible.
I understand that it'd be unreasonable to ask for you to have to manually add custom filters (or whatever) for every site which doesn't work with your current set-up.
That being said, you could manually add in major sites, such as Royal Road.
To be honest, RR should be relatively easy to get the order of, assuming you use a simple crawler to get the links from the story page.
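For what it's worth, the kind of crawler step meant here could be sketched with the standard library alone. The sample HTML, the `/chapter/` substring test, and the function names are assumptions for illustration; the real story page markup may differ, and fetching the page is left out entirely.

```python
from html.parser import HTMLParser


class ChapterLinkCollector(HTMLParser):
    """Collects chapter hrefs in the order they appear in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: chapter URLs contain "/chapter/"; skip repeated links.
        if href and "/chapter/" in href and href not in self.links:
            self.links.append(href)


def chapter_links_in_page_order(html):
    collector = ChapterLinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical story-page fragment.
sample = """
<table>
  <tr><td><a href="/fiction/1/example/chapter/10/prologue">Prologue</a></td></tr>
  <tr><td><a href="/fiction/1/example/chapter/11/interlude">Interlude</a></td></tr>
  <tr><td><a href="/fiction/1/example">Back to story</a></td></tr>
</table>
"""
links = chapter_links_in_page_order(sample)
```

This only recovers the listing order from one page; it does not by itself produce chapter numbers.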
It's not per-site, it's per series.
And fundamentally, all that does is change the display order. Things will still be broken in the reading list system, so I don't really see the point.
E.g. https://www.wlnupdates.com/series-id/55717/the-wandering-inn The interludes are all put at the start.
https://www.wlnupdates.com/series-id/62765/the-true-endgame Just a mess. I have no clue how these were sorted.