JupiterBroadcasting / show-scraper

Scraper written in Python to convert episodes hosted on Fireside or jupiterbroadcasting.com into Hugo Markdown files.

Duplicate Self Hosted Episode #15

Open: elreydetoda opened this issue 2 years ago

elreydetoda commented 2 years ago

So, while trying to understand how things work in this repo, I've been adding more type hints for the different dictionaries, plus checks before adding to those objects. While doing that, I discovered that episode 60 of Self-Hosted on JB's main website is labeled as 59 instead of 60, even though there is already an episode 59 :sweat_smile:
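A minimal sketch of the kind of typed record and pre-insert check described above, assuming episodes are keyed by (show, number). The names `Episode`, `DuplicateEpisodeError`, and `add_episode` are hypothetical and not taken from the repo. The check refuses to let a second page claim an episode number that a different URL already holds, which is how a mislabeled page such as "Someone Else's Computer" would surface.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """Hypothetical typed record for one scraped episode (replaces a plain dict)."""
    show: str
    number: int
    url: str


class DuplicateEpisodeError(ValueError):
    """Raised when two different pages claim the same show/episode number."""


def add_episode(registry: dict[tuple[str, int], Episode], ep: Episode) -> None:
    """Add ep to registry, refusing to silently overwrite a different page."""
    key = (ep.show, ep.number)
    existing = registry.get(key)
    if existing is not None and existing.url != ep.url:
        raise DuplicateEpisodeError(
            f"{ep.show} #{ep.number} is already taken by {existing.url}; "
            f"conflicting page: {ep.url}"
        )
    registry[key] = ep


# Usage: the second page is rejected instead of overwriting the real episode 59.
registry: dict[tuple[str, int], Episode] = {}
add_episode(registry, Episode(
    "self-hosted", 59,
    "https://www.jupiterbroadcasting.com/146887/i-tried-to-love-portainer-self-hosted-59/",
))
try:
    add_episode(registry, Episode(
        "self-hosted", 59,
        "https://www.jupiterbroadcasting.com/147012/someone-elses-computer-self-hosted-59/",
    ))
except DuplicateEpisodeError as err:
    print(err)
```

Raising instead of overwriting is the point of the check: with a plain dict, the later page would quietly replace the earlier one and the duplicate would go unnoticed.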

Mislabeled: https://www.jupiterbroadcasting.com/147012/someone-elses-computer-self-hosted-59/
Actual episode 59: https://www.jupiterbroadcasting.com/146887/i-tried-to-love-portainer-self-hosted-59/

Here's the error I was getting from the check I added: [screenshot of the error]

Screenshot of the mislabeled page: [screenshot]

So, this will take someone like @gerbrent or @ChrisLAS to fix in WordPress; for now, I'll just create a mapping to bypass it.
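A sketch of such a bypass mapping, assuming the scraper parses an episode number from each WordPress page and can override it per URL. `EPISODE_NUMBER_OVERRIDES` and `resolve_episode_number` are hypothetical names, not the repo's actual code.

```python
from __future__ import annotations

# Hypothetical shim: corrected episode numbers for pages mislabeled in WordPress.
# Keys are page URLs without a trailing slash so lookups tolerate either form.
EPISODE_NUMBER_OVERRIDES: dict[str, int] = {
    # Titled "Self-Hosted 59" on the site, but it is actually episode 60.
    "https://www.jupiterbroadcasting.com/147012/someone-elses-computer-self-hosted-59": 60,
}


def resolve_episode_number(url: str, scraped_number: int) -> int:
    """Return the overridden number for a known-bad page, else the scraped one."""
    return EPISODE_NUMBER_OVERRIDES.get(url.rstrip("/"), scraped_number)


print(resolve_episode_number(
    "https://www.jupiterbroadcasting.com/147012/someone-elses-computer-self-hosted-59/", 59
))  # 60
```

Keying on the page URL rather than on the number leaves the real episode 59 untouched, and the whole table can be deleted once the source data is fixed.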

kbondarev commented 2 years ago

I love the use of dataclasses instead of dicts. Good work!!

And yes, there are a few of these weird exceptions/edge cases that I handle in the code.

gerbrent commented 2 years ago

I guess it's proof: Humans made this.

There seems to be little point in fixing these things in WP at this point. It's easier to just earmark them for fixing after the transition to Hugo goes live (if they're relatively trivial), or, as @kbondarev says, handle them with a few code items as temporary shims.

You all decide what's best, but changing WP feels like a lost cause.