JupiterBroadcasting / show-scraper

Scraper written in Python to convert episodes hosted on Fireside or jupiterbroadcasting.com into Hugo Markdown files.

Missing Episode Description - LAN242 #23

Closed: gerbrent closed this issue 2 years ago

gerbrent commented 2 years ago

See https://jupiterbroadcasting.net/show/linux-action-news/242/ versus https://linuxactionnews.com/242

The description is missing, though everything else seems fine at a quick glance.

kbondarev commented 2 years ago

The description is taken from the summary value of each episode in the Fireside JSON API feed. For some reason LAN242's summary is empty:

      "summary": "",

This makes me wonder if other episodes might have a similar problem.
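A quick way to check for other affected episodes might look like the sketch below. It assumes the feed follows the JSON Feed layout, with episodes under a top-level "items" key; the thread only shows a single episode object, so that key is an assumption, and the sample feed here is made up for illustration.

```python
import json
from typing import List


def episodes_missing_summary(feed: dict) -> List[str]:
    """Return the titles of feed items whose "summary" is missing or blank."""
    missing = []
    # Assumption: episodes live under "items" (the JSON Feed convention).
    for item in feed.get("items", []):
        summary = item.get("summary") or ""
        if not summary.strip():
            missing.append(item.get("title", "<untitled>"))
    return missing


# Hypothetical two-episode feed, for illustration only.
sample_feed = json.loads("""
{
  "items": [
    {"title": "Linux Action News 241", "summary": "Placeholder summary."},
    {"title": "Linux Action News 242", "summary": ""}
  ]
}
""")

print(episodes_missing_summary(sample_feed))
```

Running the same function over the real feed (loaded with json.load) would list every episode that needs its summary filled in on the Fireside side.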

The "archived scraper" I'm working on uses the RSS XML feed rather than the JSON for this, and reads the <description> tag, which looks fine for LAN242. Once that "archived scraper" is done, I plan to merge some of its functionality into the original scraper. There are quite a few things that I think are done in a better and safer way there (reading the RSS rather than the JSON from Fireside is one example).
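For anyone following along, reading <description> from an RSS feed could be sketched roughly as below. This is not the archived scraper's actual code; the function name and the sample feed are hypothetical, and only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal RSS document standing in for the real feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Example Episode 1</title>
      <description>Placeholder episode description.</description>
    </item>
    <item>
      <title>Example Episode 2</title>
      <description></description>
    </item>
  </channel>
</rss>
"""


def descriptions_by_title(rss_xml: str) -> dict:
    """Map each <item>'s <title> to the text of its <description>."""
    root = ET.fromstring(rss_xml)
    result = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        # findtext gives "" for an empty element and the default if it is absent.
        description = item.findtext("description", default="") or ""
        result[title] = description.strip()
    return result


print(descriptions_by_title(SAMPLE_RSS))
```

An empty string in the result flags an episode whose RSS description is blank, which makes the same data hiccup easy to spot on the RSS side.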

gerbrent commented 2 years ago

I can easily fix this in Fireside too - sounds simpler there than here in scraperland for the odd data hiccup.

Another approach is to grab the JSON description value instead, if it exists?

In our Fireside workflow, we typically paste the same content into both fields in the UI when creating an episode; however, the summary field has a shorter character limit.

kbondarev commented 2 years ago

> Another approach is to grab the JSON description value instead, if it exists?

There's no description value in the JSON :/ I could do this change very quickly:

For reference, here's the JSON object for LAN 242:

{
    "id": "e210d66a-b0bd-4d24-a32b-402e61d12ce5",
    "title": "Linux Action News 242",
    "url": "https://linuxactionnews.com/242",
    "content_text": "The controversial Intel code now shipping in Linux, why F-Droid is getting more attractive for developers, and the rumor that could change the industry.Sponsored By:Ting: Save $25 off your first device, or $25 in service credit if you bring one!Linode: Sign up using the link on this page and receive a $100 60-day credit towards your new account. Links:Linux 5.18 Released With Intel SDSi, New CPU &amp; GPU Features — Linux 5.18 brings the controversial Intel Software Defined Silicon (SDSi) functionality.Thoughts on software-defined silicon — Its purpose is to disable access to specific processor capabilities in the absence of a certificate from Intel saying otherwise.Statistics from the 5.18 development cycleBtrfs Gets Some Buttery Good Improvements With Linux 5.19 — David Sterba of SUSE has submitted the ~4k lines of code worth of feature changes for the Btrfs file-system driver in the Linux 5.19 kernel.Our build and release infrastructure, and upcoming updates —  This work will be incrementally deployed as each bit is finished. So be patient, and you will notice releases happening faster and faster!Google Summer of Code (GSOC 2022) Highlights of FOSS Projects — Google announced the GSoC 2022 projects, and the list includes some exciting improvements to the mainstream foss projects such as GNOME, Xfce, LibreOffice, etc. Ubuntu 22.10 Makes PipeWire Default for Audio — “That’s right, as of today the Kinetic ISO (pending, not yet current since the changes were just made) has been updated to run only PipeWire and not PulseAudio […] you can look forward to this for Kinetic”Broadcom-VMware Deal Said to Be Ready as Soon as This Week — Broadcom Inc. could announce an agreement to acquire cloud-computing company VMware Inc. 
as soon as this weekBroadcom’s Potential VMware Acquisition: 5 Things About Dell, Stock Prices And Hock Tan To KnowBroadcom in Talks to Pay About $60 Billion for VMwareIntel CEO Pat Gelsinger Has Mixed Feelings on a Broadcom-VMware Deal",
    "content_html": "<p>The controversial Intel code now shipping in Linux, why F-Droid is getting more attractive for developers, and the rumor that could change the industry.</p><p>Sponsored By:</p><ul><li><a rel=\"nofollow\" href=\"https://linux.ting.com\">Ting</a>: <a rel=\"nofollow\" href=\"https://linux.ting.com\">Save $25 off your first device, or $25 in service credit if you bring one!</a></li><li><a rel=\"nofollow\" href=\"http://linode.com/lan\">Linode</a>: <a rel=\"nofollow\" href=\"http://linode.com/lan\">Sign up using the link on this page and receive a $100 60-day credit towards your new account. </a></li></ul><p>Links:</p><ul><li><a title=\"Linux 5.18 Released With Intel SDSi, New CPU &amp; GPU Features\" rel=\"nofollow\" href=\"https://www.phoronix.com/scan.php?page=news_item&amp;px=Linux-5.18-Released\">Linux 5.18 Released With Intel SDSi, New CPU &amp; GPU Features</a> &mdash; Linux 5.18 brings the controversial Intel Software Defined Silicon (SDSi) functionality.</li><li><a title=\"Thoughts on software-defined silicon\" rel=\"nofollow\" href=\"https://lwn.net/Articles/884876/\">Thoughts on software-defined silicon</a> &mdash; Its purpose is to disable access to specific processor capabilities in the absence of a certificate from Intel saying otherwise.</li><li><a title=\"Statistics from the 5.18 development cycle\" rel=\"nofollow\" href=\"https://lwn.net/Articles/895800/\">Statistics from the 5.18 development cycle</a></li><li><a title=\"Btrfs Gets Some Buttery Good Improvements With Linux 5.19\" rel=\"nofollow\" href=\"https://www.phoronix.com/scan.php?page=news_item&amp;px=Btrfs-Linux-5.19-Changes\">Btrfs Gets Some Buttery Good Improvements With Linux 5.19</a> &mdash; David Sterba of SUSE has submitted the ~4k lines of code worth of feature changes for the Btrfs file-system driver in the Linux 5.19 kernel.</li><li><a title=\"Our build and release infrastructure, and upcoming updates\" rel=\"nofollow\" 
href=\"https://f-droid.org/2022/05/24/buildserver-overhaul-sponsored-by-calyx-institute.html\">Our build and release infrastructure, and upcoming updates</a> &mdash;  This work will be incrementally deployed as each bit is finished. So be patient, and you will notice releases happening faster and faster!</li><li><a title=\"Google Summer of Code (GSOC 2022) Highlights of FOSS Projects\" rel=\"nofollow\" href=\"https://debugpointnews.com/gsoc-2022/\">Google Summer of Code (GSOC 2022) Highlights of FOSS Projects</a> &mdash; Google announced the GSoC 2022 projects, and the list includes some exciting improvements to the mainstream foss projects such as GNOME, Xfce, LibreOffice, etc. </li><li><a title=\"Ubuntu 22.10 Makes PipeWire Default for Audio\" rel=\"nofollow\" href=\"https://9to5linux.com/looks-like-ubuntu-22-10-will-finally-switch-to-pipewire-by-default-and-drop-pulseaudio\">Ubuntu 22.10 Makes PipeWire Default for Audio</a> &mdash; “That’s right, as of today the Kinetic ISO (pending, not yet current since the changes were just made) has been updated to run only PipeWire and not PulseAudio […] you can look forward to this for Kinetic”</li><li><a title=\"Broadcom-VMware Deal Said to Be Ready as Soon as This Week\" rel=\"nofollow\" href=\"https://www.bloomberg.com/news/articles/2022-05-22/broadcom-said-to-be-in-talks-to-acquire-vmware\">Broadcom-VMware Deal Said to Be Ready as Soon as This Week</a> &mdash; Broadcom Inc. could announce an agreement to acquire cloud-computing company VMware Inc. 
as soon as this week</li><li><a title=\"Broadcom’s Potential VMware Acquisition: 5 Things About Dell, Stock Prices And Hock Tan To Know\" rel=\"nofollow\" href=\"https://www.crn.com/slide-shows/cloud/broadcom-s-potential-vmware-acquisition-5-things-to-know-about-dell-stock-prices-and-hock-tan\">Broadcom’s Potential VMware Acquisition: 5 Things About Dell, Stock Prices And Hock Tan To Know</a></li><li><a title=\"Broadcom in Talks to Pay About $60 Billion for VMware\" rel=\"nofollow\" href=\"https://www.wsj.com/articles/broadcom-discussing-paying-around-140-a-share-for-vmware-people-say-11653334946\">Broadcom in Talks to Pay About $60 Billion for VMware</a></li><li><a title=\"Intel CEO Pat Gelsinger Has Mixed Feelings on a Broadcom-VMware Deal\" rel=\"nofollow\" href=\"https://www.bloomberg.com/news/articles/2022-05-23/intel-s-gelsinger-has-mixed-feelings-on-broadcom-vmware-deal\">Intel CEO Pat Gelsinger Has Mixed Feelings on a Broadcom-VMware Deal</a></li></ul>",
    "summary": "",
    "date_published": "2022-05-26T05:30:00.000-07:00",
    "attachments": [
        {
            "url": "https://chtbl.com/track/392D9/aphid.fireside.fm/d/1437767933/dec90738-e640-45e5-b375-4573052f4bf4/e210d66a-b0bd-4d24-a32b-402e61d12ce5.mp3",
            "mime_type": "audio/mp3",
            "size_in_bytes": 13581396,
            "duration_in_seconds": 970
        }
    ]
}
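One possible fallback, given that the object above has no description key but its content_html opens with the episode blurb: use summary when it is non-empty, otherwise take the text of the first <p> in content_html. This is only a sketch of that idea, not necessarily the change kbondarev had in mind; the class and function names are hypothetical, and it assumes the first paragraph is always the blurb.

```python
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Collect the text inside the first <p>...</p> of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_p = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            self.done = True  # ignore every later paragraph

    def handle_data(self, data):
        if self.in_p:
            self.parts.append(data)


def episode_description(item: dict) -> str:
    """Prefer "summary"; fall back to the first paragraph of "content_html"."""
    summary = (item.get("summary") or "").strip()
    if summary:
        return summary
    parser = FirstParagraph()
    parser.feed(item.get("content_html") or "")
    parser.close()
    return "".join(parser.parts).strip()


# Trimmed-down copy of the LAN 242 object from this thread.
lan242 = {
    "title": "Linux Action News 242",
    "summary": "",
    "content_html": "<p>The controversial Intel code now shipping in Linux, "
                    "why F-Droid is getting more attractive for developers, "
                    "and the rumor that could change the industry.</p>"
                    "<p>Sponsored By:</p>",
}

print(episode_description(lan242))
```

Episodes that already have a summary are unaffected, so the fallback only kicks in for the odd data hiccup like this one.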