ArtskydJ / comicsrss.com

RSS feeds for comics
https://www.comicsrss.com

Add more comic strips #86

Open ArtskydJ opened 6 years ago

ArtskydJ commented 6 years ago

Before you write a scraper for comicsrss, please know that I don't want comicsrss to have some types of comic strips.

I don't want comicsrss to have sexually-suggestive comics. For example, I've considered killing the RSS feed for 9 Chickweed Lane, and I still might kill it someday. I'm going to avoid adding anything to comicsrss that's more suggestive than that.

I might kill off political comics. I haven't yet, but I've been strongly considering it for a while now. Internet politics discussions tend to be tribal and echo-chambery, but political comics step that up a few notches.


List of comic strips/websites that folks have requested, and who requested them

Planned:

ghost commented 5 years ago

I'd love to see some Comics Kingdom strips added, if possible. (For me, personally, mainly Bizarro, Rhymes with Orange, and Darrin Bell.)

ArtskydJ commented 5 years ago

Arcamax has Bizarro, Dilbert, Rhymes with Orange, and Darrin Bell.

Both Comics Kingdom and Arcamax look like they will be much more difficult to scrape than gocomics.

ArtskydJ commented 5 years ago

Added Dilbert today.

ArtskydJ commented 5 years ago

I don't remember why I thought Arcamax would be particularly difficult. It doesn't look like it will be that hard...

<a class="prev" href="/thefunnies/brilliantmindofedisonlee/s-2160999" title="Brilliant Mind of Edison Lee 1/3/2019"><span class="entypo-left-open"></span></a>
  <span class="cur">January  4</span>
<a class="next-off" href="#"><span class="entypo-right-open"></span></a>

<!-- ... -->

<figure class="comic">
  <img id="comic-zoom" data-zoom-image="/newspics/168/16885/1688589.gif" src="/newspics/168/16885/1688589.gif"  data-width="600" data-height="187" alt="" class="img-responsive the-comic" title="click or tap to zoom" />
  <cite class="comic-copyright">(c) 2019 John Hambrock.  Dist. by King Features Syndicate, Inc.</cite>
</figure>

Hopefully I'll get around to it within a few weeks.
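
For illustration, here is a minimal sketch of pulling the useful fields out of markup shaped like the snippet above. The helper name `parseComicPage` and the `BASE_URL` origin are assumptions, not part of comicsrss, and regular expressions are a brittle way to read HTML; a real scraper would probably use an HTML parser.

```javascript
// Assumption: relative paths in the markup resolve against this origin.
const BASE_URL = 'https://www.arcamax.com'

// Hypothetical helper, not part of comicsrss.
// Returns null for any field the markup does not contain.
function parseComicPage(html) {
  const image = /<img[^>]*\bid="comic-zoom"[^>]*\bsrc="([^"]+)"/.exec(html)
  const previous = /<a[^>]*\bclass="prev"[^>]*\bhref="([^"]+)"/.exec(html)
  const current = /<span class="cur">([^<]+)<\/span>/.exec(html)
  return {
    comicImageUrl: image ? new URL(image[1], BASE_URL).href : null,
    previousPageUrl: previous ? new URL(previous[1], BASE_URL).href : null,
    // Collapse the double space that appears in "January  4"
    displayDate: current ? current[1].replace(/\s+/g, ' ').trim() : null
  }
}

// Trimmed copy of the markup quoted above
const sampleHtml = `
<a class="prev" href="/thefunnies/brilliantmindofedisonlee/s-2160999" title="Brilliant Mind of Edison Lee 1/3/2019"></a>
<span class="cur">January  4</span>
<img id="comic-zoom" data-zoom-image="/newspics/168/16885/1688589.gif" src="/newspics/168/16885/1688589.gif" alt="" />
`

const parsed = parseComicPage(sampleHtml)
console.log(parsed)
```

Following `previousPageUrl` repeatedly would be one way to walk backwards through older strips, since the "next" link is disabled on the latest page.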

infinitytec commented 5 years ago

Could I request Sherman's Lagoon and Freefall (the latter is a webcomic found at freefall.purrsia.com)?

ArtskydJ commented 5 years ago

Sherman's Lagoon is on Comics Kingdom. If/when I add Comics Kingdom, I can @ you in this thread.

I doubt I'll add Freefall unless it is part of a larger site like Comics Kingdom or Arcamax. If there's enough demand for it, I might add it.

Or you could look into adding it the same way Dilbert was added: https://github.com/ArtskydJ/comicsrss.com/blob/gh-pages/_generator/scraper-dilbert/index.js There isn't really an API for making a scraper... ☹️


This is what I did for dilbert (and the process would be similar on freefall):

  1. Grab a page that shows multiple comics, including the latest comic.
     a. For dilbert it was https://dilbert.com
     b. For freefall it might be http://freefall.purrsia.com/lastthree.htm
  2. Parse the HTML to turn it into an array like this:
    [
    {
        "titleAuthorDate": "Freefall by Tugrik for Wednesday 6/12/2019",
        "url": "http://freefall.purrsia.com/ff3300/fc03290.htm",
        "date": "2019-06-12",
        "comicImageUrl": "http://freefall.purrsia.com/ff3300/fc03290.png"
    },
    ...
    ]
  3. Open the cached version of that array, and merge them together. (If I don't have the latest comic in the cached array, then I need to push it onto the array.)
  4. Write the cached file to disk.
  5. Integrate it with the rest of the system. (If you do everything else I would be more than happy to integrate your scraper.)
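
Steps 3 and 4 could be sketched roughly as follows. This is not the code in the Dilbert scraper; the function names `mergeStrips` and `updateCache`, the choice of `date` as the merge key, and the cache file layout (a JSON array on disk) are all assumptions made for the example.

```javascript
const fs = require('fs')

// Hypothetical helper: combine the cached array with freshly scraped strips.
// Assumes each strip has a unique ISO `date` (YYYY-MM-DD), as in the array above.
function mergeStrips(cachedStrips, scrapedStrips) {
  const byDate = new Map()
  for (const strip of cachedStrips) byDate.set(strip.date, strip)
  for (const strip of scrapedStrips) {
    // Only add strips the cache does not already have
    if (!byDate.has(strip.date)) byDate.set(strip.date, strip)
  }
  // ISO dates sort chronologically when compared as plain strings
  return [...byDate.values()].sort((a, b) => a.date.localeCompare(b.date))
}

// Hypothetical helper: read the cache (if any), merge, and write it back.
function updateCache(cacheFilePath, scrapedStrips) {
  const cachedStrips = fs.existsSync(cacheFilePath)
    ? JSON.parse(fs.readFileSync(cacheFilePath, 'utf8'))
    : []
  const merged = mergeStrips(cachedStrips, scrapedStrips)
  fs.writeFileSync(cacheFilePath, JSON.stringify(merged, null, 2))
  return merged
}
```

Keeping the cached entry when a date appears in both arrays means a strip that was already published in the feed is never rewritten by a later scrape.
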
infinitytec commented 5 years ago

Thanks for the information! I'll look into it and see what I can do!

ArtskydJ commented 5 years ago

I made an API and published it in the README.

jgbishop commented 4 years ago

Any progress on this? I've looked into scraping Comics Kingdom in the past year myself, and it's pretty difficult. Lots of the page gets loaded dynamically when first visited in a web browser. The publishers are clearly trying their best to prevent scraping, but my scraping knowledge is fairly limited when it comes to dynamic data. Maybe the arcamax website would be easier?

ArtskydJ commented 4 years ago

@jgbishop Very little progress. You can see in _generator/site-scrapers/ that there are 2 Work In Progress folders. I haven't done anything since then.

Getting a functional scraper is probably around 2-10 hours of work, depending on how smoothly it goes and whether you run into issues like rate-limiting. The reason that I haven't made another site scraper is not a technical issue blocking the way. It's just that I haven't made it a priority.

I personally don't have a ton of incentive to expand comicsrss, since it already does all that I need. That said, I still want to scrape more sites.

If you have a specific comic strip that you're wanting, you could try making a scraper just for it, instead of the entire arcamax/comics kingdom site. And that might be a nice starting point for me to expand it to the whole site.

One more thing to note is that if/when arcamax or comics kingdom is added, the site generator will have to avoid making two entries when a comic is in both gocomics.com and the added site.
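
One possible way to handle that is sketched below. It assumes each scraped series carries a `title` and a `source` name, and that gocomics should win when a strip appears on more than one site; those field names, the priority order, and the title normalization are illustrative guesses, not how the site generator is actually written.

```javascript
// Assumption: lower index means the source is preferred when titles collide.
const SOURCE_PRIORITY = ['gocomics', 'arcamax', 'comicskingdom']

// Compare titles loosely, so "Rhymes With Orange" matches "Rhymes with Orange"
function normalizeTitle(title) {
  return title.toLowerCase().replace(/[^a-z0-9]/g, '')
}

// Unknown sources rank after every known one
function rank(source) {
  const index = SOURCE_PRIORITY.indexOf(source)
  return index === -1 ? SOURCE_PRIORITY.length : index
}

// Hypothetical helper: keep one entry per comic, from the preferred source.
function dedupeSeries(seriesList) {
  const best = new Map()
  for (const series of seriesList) {
    const key = normalizeTitle(series.title)
    const existing = best.get(key)
    if (!existing || rank(series.source) < rank(existing.source)) {
      best.set(key, series)
    }
  }
  return [...best.values()]
}
```

Matching on a normalized title is the weakest part of this sketch: two different strips with the same name would be merged, so a hand-maintained mapping might be needed in practice.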

ArtskydJ commented 4 years ago

@jgbishop I finally added Arcamax comics.

jgbishop commented 4 years ago

Woo-hoo! Thanks! 👏 🍰

ghost commented 4 years ago

Beetle Bailey and Hagar the Horrible, at last!

infinitytec commented 2 years ago

Well, I may have figured out something for Comics Kingdom: https://jsfiddle.net/p0tojns1/1/

Not a full scraper, and only for Sherman's Lagoon, but it may help.

ArtskydJ commented 2 years ago

Interesting...

Earlier, I'd decided not to write a scraper for Comics Kingdom, because I remembered Comics Kingdom being very dynamic. But it looks quite do-able to scrape that site now?

So I'm now planning to write a scraper for Comics Kingdom. I'm not promising anything. 😁 Difficulties might come up that make me change my mind and abandon Comics Kingdom again. But I hope to get it working!

jalberto commented 2 years ago

I would like to suggest https://workchronicles.com

ArtskydJ commented 2 years ago

> I would like to suggest workchronicles.com

They already have an RSS feed: https://workchronicles.com/feed/

jalberto commented 2 years ago

Totally missed it, thanks

On Tue, 16 Nov 2021 at 16:31, Joseph Dykstra wrote:

> I would like to suggest workchronicles.com
>
> They already have an RSS feed: https://workchronicles.com/feed/


twizzayy commented 2 years ago

Can't wait for The Far Side to be added. Thanks for this awesome resource. :)

infinitytec commented 2 years ago

Hey, looks like Sherman's Lagoon is now on GoComics so it's being scraped!

ArtskydJ commented 2 years ago

I added Comics Kingdom strips to https://www.comicsrss.com/

@infinitytec

tylerbenson commented 1 year ago

Would it be difficult to add support for https://tinyview.com/ and https://www.webtoons.com/ hosted comics?

Webtoons has an RSS feed, but it usually only shows the first panel of the comic.

Thanks!

tylerbenson commented 11 months ago

I tried to add additional details for tinyview: #141.

ArtskydJ commented 11 months ago

I just updated the original post.

Webtoons has some "mature"-rated comics, which I don't want on comicsrss. The "young adult"-rated comics varied a lot in their suggestiveness. Webtoons, by nature of its user-generated content, is difficult to categorize. If someone wrote a scraper for webtoons, even with the "mature"-rated comics filtered out, I'm not sure if I'd merge it into comicsrss.

I'd probably merge a scraper for tinyview. Most seemed fine. Maybe I'd filter out "Eggs n' Ben", IDK.

tylerbenson commented 11 months ago

Makes sense... For the record, I was interested in some of the family friendly cartoons for each, and I totally respect your desire to keep things clean. (I've sent my teen son to your site to find comics to read.)