gatsbyjs / gatsby

The best React-based framework with performance, scalability and security built in.
https://www.gatsbyjs.com
MIT License

[gatsby-source-medium] Plugin fails due to new medium cloudflare ddos prevention #17335

Closed omonk closed 4 years ago

omonk commented 5 years ago

Description

Due to Medium integrating Cloudflare's anti-DDoS protection (reCAPTCHA), the gatsby-source-medium plugin now works only sporadically and fails builds.

This is a new feature they've just implemented, so it's nothing that the plugin has done wrong, of course! It's just something that needs to be worked around.

Possible way to fix

Cloudscraper

I've not come across an issue like this before, but a bit of research has led me to cloudscraper, which reports that it can bypass the Cloudflare screen in a Node process.

Happy to work on a PR to get this integrated, but I'm looking for advice/suggestions before starting, as getting around the Cloudflare screen is new to me!

UPDATE: Upon closer inspection I don't think this will work, as the module doesn't bypass reCAPTCHA. In the module's issues the author suggests using a paid service, which isn't something I think Gatsby would want to integrate.

Use RSS feed

Using a different endpoint, https://medium.com/feed/${nameOfBlog}, you're able to get an RSS export of the blog/publication. This is simple enough to parse to JSON and then parse the HTML content, but it would require a bit more thought to match the current implementation of the plugin. Another issue with this potential fix is that the RSS feed only gives you the latest 10 posts from the blog, rather than the 100-post limit the Gatsby plugin requests.
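
As a rough illustration of the RSS route, the sketch below pulls the title, link and `content:encoded` body out of each `<item>` in a feed string. It assumes a standard RSS 2.0 layout with CDATA-wrapped fields; the `parseFeedItems` and `readTag` helpers and the sample feed are hypothetical, and a real implementation would more likely use a proper XML parser than regular expressions.

```javascript
// Minimal sketch: extract posts from an RSS 2.0 string without dependencies.
// Assumption: each post is an <item>, and text fields may be CDATA-wrapped.

// Return the text inside <tag>...</tag>, unwrapping CDATA if present.
function readTag(xml, tag) {
  const match = xml.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`));
  if (!match) return null;
  const cdata = match[1].match(/^\s*<!\[CDATA\[([\s\S]*?)\]\]>\s*$/);
  return (cdata ? cdata[1] : match[1]).trim();
}

// Hypothetical helper: turn a feed string into plain post objects.
function parseFeedItems(xml) {
  const items = xml.match(/<item>[\s\S]*?<\/item>/g) || [];
  return items.map((item) => ({
    title: readTag(item, "title"),
    link: readTag(item, "link"),
    html: readTag(item, "content:encoded"),
  }));
}

// Made-up sample feed, only to show the shape of the result.
const sampleFeed = `
<rss><channel>
  <item>
    <title><![CDATA[First post]]></title>
    <link>https://medium.com/example/first-post</link>
    <content:encoded><![CDATA[<p>Hello</p>]]></content:encoded>
  </item>
</channel></rss>`;

console.log(parseFeedItems(sampleFeed));
```

The resulting objects would still need mapping onto the fields the plugin currently exposes, which is the "bit more thought" mentioned above.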

Steps to reproduce

Expected result

Build should run without errors

Actual result

Build fails with 403 responses because the captcha is not completed

Environment

Builds fail locally and on Netlify environment.

  System:
    OS: macOS 10.14.6
    CPU: (12) x64 Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
    Shell: 3.0.2 - /usr/local/bin/fish
  Binaries:
    Node: 10.15.1 - /usr/local/bin/node
    Yarn: 1.17.3 - ~/.yarn/bin/yarn
    npm: 5.6.0 - /usr/local/bin/npm
  Languages:
    Python: 2.7.10 - /usr/bin/python
  Browsers:
    Chrome: 76.0.3809.132
    Firefox: 68.0.2
    Safari: 12.1.2
  npmPackages:
    gatsby: ^2.8.3 => 2.14.0 
    gatsby-image: 2.0.20 => 2.0.20 
    gatsby-plugin-env-variables: ^1.0.1 => 1.0.1 
    gatsby-plugin-favicon: 3.1.4 => 3.1.4 
    gatsby-plugin-google-analytics: 2.0.7 => 2.0.7 
    gatsby-plugin-google-tagmanager: ^2.0.6 => 2.1.7 
    gatsby-plugin-modal-routing: ^1.0.0 => 1.0.2 
    gatsby-plugin-netlify: ^2.1.3 => 2.1.10 
    gatsby-plugin-prefetch-google-fonts: ^1.4.2 => 1.4.3 
    gatsby-plugin-react-helmet: 3.0.2 => 3.0.2 
    gatsby-plugin-robots-txt: ^1.5.0 => 1.5.0 
    gatsby-plugin-sitemap: 2.0.2 => 2.0.2 
    gatsby-plugin-styled-components: 3.0.3 => 3.0.3 
    gatsby-source-contentful: 2.1.28 => 2.1.28 
    gatsby-source-filesystem: ^2.0.8 => 2.1.18 
    gatsby-source-lever: 2.0.9 => 2.0.9 
    gatsby-source-medium: 2.0.8 => 2.0.8 
    gatsby-transformer-sharp: ^2.1.8 => 2.2.10

omonk commented 5 years ago

This was discussed previously in https://github.com/gatsbyjs/gatsby/issues/16086, but that issue was closed by the bot 👎

omonk commented 5 years ago

I believe this plugin might have been rendered useless by Medium's decision to add this DDoS protection. Any help would be great!

grugnog commented 5 years ago

One option would be to switch to using either RSS (assuming that doesn't hit the captcha) or the Medium API instead (which would require an API key configured).

omonk commented 5 years ago

@grugnog - agreed, the RSS feed is an option, but as detailed in my original post it limits the number of returned posts to a maximum of 10. It also lacks a few fields that the JSON gives us, although it does give you the entire post content.

From my understanding, the Medium API doesn't give you any sort of post content, only the ability to create posts and list a user's publications.

I have reached out to Medium support to ask for some clarification on this. It seems fairly strange to add DDoS protection that requires human interaction to a JSON feed, unless they're trying to lock down access to post data, which would be understandable.

saintasia commented 5 years ago

Same issue here, hope it'll get fixed soon

wardpeet commented 5 years ago

We're happy to accept an update to gatsby-source-medium to use the medium API

omonk commented 5 years ago

@wardpeet it would be great if that would work, but the API doesn't provide the info we need :(

If Medium doesn't remove this Cloudflare protection then the RSS feed is the only option we have, but even that solution will drastically reduce the plugin's features.

wardpeet commented 5 years ago

:o what information are you missing? Sorry I haven't read up on the medium api yet.

omonk commented 5 years ago

So the API functionality they provide is:

Whereas the JSON endpoint that this plugin uses returns all posts of a blog or publication and their data, apart from the actual content. This worked for the most part because I, as I'm sure most users do, just link to the original medium.com post.

The RSS feed gives you the 10 latest posts from a blog/publication. This feed does return all of the post content, but if you've got a few hundred blog posts, you're now a bit stuck.

EmaSuriano commented 5 years ago

Hello guys, does anyone have any update on this issue? This is actually breaking my build in one of my open-source projects ...

@omonk What do you think is the best approach to fix this issue?

I would prefer to go for the first one, but this might introduce breaking changes in the plugin, because one of the options it accepts is the post limit and in this case that would be fixed at only 10 ...

omonk commented 5 years ago

@EmaSuriano - I'm still waiting to hear back from Medium about the Cloudflare protection thingy. I will reach out again on Monday for a status update, but ultimately I think we should prepare to not rely on this plugin, as it uses the JSON feed.

It was breaking our build as well; we weren't able to deploy to production or even create previews. What we opted for in the end was to manually copy and paste the JSON into a local file, pull out the parts of the plugin's code we needed, and then use the new code to upload the relevant data to our CMS. We're in the middle of a blog migration so we had some stuff in place already, but the concept should be fairly straightforward if you have a CMS with API write access or you can write to local files. Obviously this might not be your best solution if you have a few hundred posts you want to collate.

I don't think rewriting the plugin to use the API will work, as it doesn't give us the same data this JSON feed does, only post publishing.

If you're really stuck and need this data, it is possible to get an export of your blog or publication from Medium in HTML or XML. From this export you can get all the data you need; for posts published after the export you can then use the RSS feed to grab the 10 latest.
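
To make the export-plus-RSS idea a little more concrete, here is a minimal sketch of combining the two sources. It assumes both have already been parsed into plain objects that carry a unique `link`; the `mergePosts` helper and the sample data are hypothetical.

```javascript
// Hypothetical helper: combine posts from a one-off export with the latest RSS items.
// Assumption: every post object carries a unique `link` that identifies it.
function mergePosts(exportedPosts, feedPosts) {
  const byLink = new Map();
  for (const post of exportedPosts) byLink.set(post.link, post);
  // Feed entries are newer, so they replace an exported copy of the same post.
  for (const post of feedPosts) byLink.set(post.link, post);
  return [...byLink.values()];
}

// Made-up data: one post exists in both sources, one only in the feed.
const exported = [
  { link: "https://medium.com/example/a", title: "A" },
  { link: "https://medium.com/example/b", title: "B (old title)" },
];
const latest = [
  { link: "https://medium.com/example/b", title: "B" },
  { link: "https://medium.com/example/c", title: "C" },
];

console.log(mergePosts(exported, latest).map((post) => post.title));
```

Keying on the link keeps the merge safe to rerun, since a post that shows up in both the export and the feed is only kept once.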

A bit of a ramble, but if you need any more help or have questions, let me know and I'll try to assist!

omonk commented 5 years ago

Oh, I just saw your reference; we use Contentful as well. I can share the script in a gist when I'm home.

omonk commented 5 years ago

@EmaSuriano - here you go

Looking over the gatsby-starter-mate starter I realise this is not the best solution, we've not been able to find a different solution though...

biernacki commented 5 years ago

I haven't checked whether it's the same payload, and it's probably not the most reliable solution, but if you omit /latest from the endpoint the plugin works again (again, it's only a temporary solution).

So url => https://medium.com/${nameOfBlogHere}?format=json&limit=10

EmaSuriano commented 5 years ago

> I haven't checked if it's the same payload, it's probably not the most reliable solution, but if you try omitting latest from the endpoint the plugin works again (but again, it's a temp solution)
>
> So url => https://medium.com/${nameOfBlogHere}?format=json&limit=10

The problem I see with this solution is that it doesn't return the same number of results as the /latest endpoint. For my user, it only returns 4 posts ...

wardpeet commented 5 years ago

I would like to know what people think of the workaround. I'd rather have a real fix. We're happy to publish a major version of this plugin if we need to make breaking changes to get this working again.

omonk commented 5 years ago

I'm still waiting to hear back from Medium; they said they would speak to their platform team.

The workaround works but majorly limits the plugin's features. We've been able to work around this by using other methods.

EmaSuriano commented 5 years ago

@omonk any news on this issue?

omonk commented 5 years ago

Unfortunately not, I nudged them again last week but still no reply. I reckon maybe if a few more people raise it then we might hear something.

omonk commented 5 years ago

@EmaSuriano a final nail in the coffin from Medium:

Hi Oliver,

Sorry for the delay.

I have spoken with our security team, and unfortunately there is no workaround to our protections. They are crucial to the overall safety and security of Medium.com. 

We apologize for this. 

What to do from here?

The core feature of this plugin is no longer possible due to these changes in the upstream dependency. There are some alternative routes we could take but the end results are majorly limited when compared to the original functionality we used to have available.

I've just found this package that would be a good workaround, although it does require users to export their blog first.

There are a few options but I think they're out of the scope of this issue, maybe we can start a new issue to discuss new tools.

EmaSuriano commented 5 years ago

@omonk so basically there is no way to retrieve all the medium posts without hacking into Medium? 🤦‍♂

I agree on starting a new issue to discuss different alternatives to tackle this 👍

mfeltscher commented 5 years ago

Hi there!

When we released the first version of this plugin a long time ago, we were aware that this could happen one day, since that endpoint isn't officially documented or supported by Medium.

We looked into this, discussed some potential fixes, and came to the conclusion that it is still possible to achieve the same functionality as the plugin used to offer. The solution consists of using the RSS feed and parsing the Medium post's source code itself. I'll work on that particular feature and will provide a merge request as soon as possible. Stay tuned 😉

omonk commented 5 years ago

@mfeltscher Sounds good! Do you know if it is possible to page through the RSS feed, though? Currently the most it returns is the latest 10 posts.

mfeltscher commented 5 years ago

@omonk After looking into this I have to admit that you're right. So at the moment I also have no solution at hand 😞 Did you already discuss some potential workarounds?

omonk commented 5 years ago

Oh I was hoping you might have found another way!

To unblock ourselves, we manually copied the JSON into a local file and then parsed it to write all the info we need to our CMS. It works, but I don't think it's a viable option to release; maybe it could be documented somewhere.
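
For anyone trying the same local-copy route, here is a minimal sketch of the parsing step. It assumes the saved response still starts with the `])}while(1);</x>` guard that Medium's unofficial JSON endpoint prepends, and that posts sit under `payload.references.Post`. Both are assumptions about an undocumented format, and the `parseMediumDump` helper and the sample data are hypothetical.

```javascript
// Assumption: the unofficial JSON endpoint prefixes its responses with this guard.
const GUARD_PREFIX = "])}while(1);</x>";

// Hypothetical helper: turn the raw text of a saved response into a list of posts.
// Assumption: posts are stored as an object keyed by post id under payload.references.Post.
function parseMediumDump(raw) {
  const body = raw.startsWith(GUARD_PREFIX) ? raw.slice(GUARD_PREFIX.length) : raw;
  const json = JSON.parse(body);
  const references = (json.payload && json.payload.references) || {};
  return Object.values(references.Post || {}).map((post) => ({
    id: post.id,
    title: post.title,
    slug: post.uniqueSlug,
  }));
}

// Made-up dump with a single post, only to show the expected shape.
const sampleDump =
  GUARD_PREFIX +
  JSON.stringify({
    payload: {
      references: {
        Post: { abc123: { id: "abc123", title: "Hello", uniqueSlug: "hello-abc123" } },
      },
    },
  });

console.log(parseMediumDump(sampleDump));
```

In practice the raw text would come from the manually saved file (for example via `fs.readFileSync`), and the returned objects would then be written to the CMS or a local file.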

Ultimately I think this plugin might have to be put to bed :(

mfeltscher commented 5 years ago

I came up with a similar solution - fetching the JSON file manually and using a local plugin: https://github.com/smartive/smartive.ch/commit/fcaf8d588d40a967035d3b8fec7e3a25f4a5f916

LekoArts commented 5 years ago

IMO getting 10 posts is better than nothing. People might want to show their latest posts somewhere. Feel free to discuss the future of the plugin here or in a new issue (but please close this one here then) and if e.g. the JSON solution would be feasible. In the meantime the documentation should probably be updated to make people aware of these issues.

cbetta commented 5 years ago

I agree with @LekoArts, I'd love for this plugin to work even if only in a limited capacity. Right now it doesn't work at all.

EmaSuriano commented 5 years ago

Hello guys, as a quick workaround that lets you keep using this package as it is right now, you can add this script to your package.json. It applies the patch that @biernacki proposed.

"fix-medium-plugin": "sed -i' ' 's/\\/latest//g' ./node_modules/gatsby-source-medium/gatsby-node.js",

It works on macOS and Unix 👍
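
For setups where `sed` is not available, the same substitution can be sketched in Node. This is an illustrative alternative rather than part of the plugin: `stripLatestSegment` and `patchPlugin` are hypothetical names, the file path is the one from the `sed` command above, and the example string only assumes the general shape of the plugin's request URL.

```javascript
const fs = require("fs");

// Remove every "/latest" occurrence, mirroring the sed expression s/\/latest//g.
function stripLatestSegment(source) {
  return source.replace(/\/latest/g, "");
}

// Hypothetical helper: rewrite the installed plugin file in place.
// It is not called here, so running this sketch does not touch node_modules.
function patchPlugin(file = "./node_modules/gatsby-source-medium/gatsby-node.js") {
  const original = fs.readFileSync(file, "utf8");
  fs.writeFileSync(file, stripLatestSegment(original));
}

// Example on a string shaped like the request URL (assumed shape).
console.log(
  stripLatestSegment("https://medium.com/${username}/latest?format=json&limit=${limit}")
);
```

Saved as a small script, `patchPlugin()` could be run from an npm `postinstall` entry so the patch is reapplied after every install.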

LekoArts commented 5 years ago

We'd highly appreciate if someone could put up a PR implementing the JSON solution and updating the docs to make users aware of these recent changes (+ linking to this issue). Thanks a lot!

brkn commented 5 years ago

Hello 👋 I would like to work on this issue, but I'm a bit confused about which method has been chosen for implementation. From what I understand, there are 3 options:

  1. As @biernacki suggested, requesting https://medium.com/${nameOfBlogHere}?format=json&limit=10 (the URL without the /latest segment). But as @EmaSuriano said, it returns 4 posts regardless of the limit parameter.
  2. Using the RSS endpoint, https://medium.com/feed/@${nameOfBlog}, which returns the latest 10 posts.
  3. Guiding users to download their posts manually from the browser, then parsing the local JSON.

cbetta commented 5 years ago

I'd prefer option 1, as it has the images and the RSS feed doesn't.

sujinjoshi commented 5 years ago

I have implemented option 1 and created pull request #17547.

omonk commented 4 years ago

@sujinjoshi - awesome, I'd like to reopen and merge this. Can you update the documentation for the plugin detailing the limit of 10 and a link to this issue, maybe with a TLDR?

marcobiedermann commented 4 years ago

In general: why not use their official API? https://developers.medium.com/

cbetta commented 4 years ago

Because their API is not self-service actually.

craftzdog commented 4 years ago

I switched it to use rss2json.com, which returns thumbnail URLs and works like a charm. I manually create nodes based on the JSON data in gatsby-node.js. Here is the diff of my work: https://github.com/inkdropapp/inkdrop-top/commit/49af291b07cb0aab7cd9dc8547ac5d75e8e33b3e
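
A rough sketch of what creating nodes by hand from such a response could look like. The item fields (`guid`, `title`, `link`, `thumbnail`) are assumed from rss2json's response shape, `MediumFeedPost` is a made-up node type, and `toNode` is a hypothetical helper; the linked diff remains the authoritative version.

```javascript
// Hypothetical helper: map one feed item to a Gatsby node object.
// `createNodeId` and `createContentDigest` are the helpers Gatsby passes to sourceNodes.
function toNode(item, { createNodeId, createContentDigest }) {
  const fields = { title: item.title, link: item.link, thumbnail: item.thumbnail };
  return {
    ...fields,
    id: createNodeId(`medium-feed-${item.guid}`),
    parent: null,
    children: [],
    internal: {
      type: "MediumFeedPost", // made-up type name for this sketch
      contentDigest: createContentDigest(fields),
    },
  };
}

// Stand-in helpers so the sketch runs outside Gatsby.
const helpers = {
  createNodeId: (seed) => `id-${seed}`,
  createContentDigest: (value) => JSON.stringify(value),
};

console.log(
  toNode(
    {
      guid: "1",
      title: "Hello",
      link: "https://medium.com/p/1",
      thumbnail: "https://example.com/thumb.png",
    },
    helpers
  )
);
```

Inside `exports.sourceNodes` in gatsby-node.js, each fetched item would be passed through `toNode` with the real helpers and handed to `actions.createNode`.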

sujinjoshi commented 4 years ago

> @sujinjoshi - awesome, I'd like to reopen and merge this. Can you update the documentation for the plugin detailing the limit of 10 and a link to this issue, maybe with a TLDR?

I have updated the README as well and edited the PR description to better describe the limitation.

github-actions[bot] commented 4 years ago

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here. If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open! As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

github-actions[bot] commented 4 years ago

Hey again!

It’s been 30 days since anything happened on this issue, so our friendly neighborhood robot (that’s me!) is going to close it. Please keep in mind that I’m only a robot, so if I’ve closed this issue in error, I’m HUMAN_EMOTION_SORRY. Please feel free to reopen this issue or create a new one if you need anything else. As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks again for being part of the Gatsby community! 💪💜

sujinjoshi commented 4 years ago

> @sujinjoshi - awesome, I'd like to reopen and merge this. Can you update the documentation for the plugin detailing the limit of 10 and a link to this issue, maybe with a TLDR?
>
> I have updated the README as well and edited the PR description to better describe the limitation.

@omonk I have updated the README. Can you please check and merge the PR and close this issue? Thanks!

hackjutsu commented 4 years ago

Hello, will the fix be available in Gatsby 1.x?

spikeburton commented 4 years ago

Any update on this? I see this PR has been closed but it doesn't look like the plugin has gotten an update