apify / apify-docs

This project is the home of Apify's documentation.
https://docs.apify.com
Apache License 2.0

Proxies for learning purposes #959

Open honzajavorek opened 2 months ago

honzajavorek commented 2 months ago

We should figure out how to have a proxy which would work for learning purposes and could be used in hands-on tutorials, but which wouldn't be prone to malfunctioning or abuse. Any ideas? 😄

I think this is quite important, as I don't see much value in tutorials which are broken, where a reader cannot easily follow the steps to learn. I don't think that figuring out how to have a "didactic" proxy will be easy, but I think it's worth figuring out.

honzajavorek commented 2 months ago

I'll mention @mtrunkat at random, as I'm not really sure who to discuss this with.

mtrunkat commented 2 months ago

Yeah, this is a good point. What we did recently with the SDK is skip proxy usage when the user is not on a paid plan and does not have external access. Once you deploy it to Apify, it will use the proxy.
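The fallback described above could look roughly like this. It is only a sketch of the decision logic, not the SDK's actual code: the `Account` fields, the `pick_proxy` function, and the example proxy URL are hypothetical names made up for illustration, and reading the condition as "both must be false" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    # Hypothetical fields, not the SDK's real data model.
    is_paid_plan: bool
    has_external_access: bool


def pick_proxy(account: Account, on_platform: bool, proxy_url: str) -> Optional[str]:
    """Return the proxy URL to use, or None to connect directly."""
    if on_platform:
        # Deployed to Apify: the proxy is used.
        return proxy_url
    if not account.is_paid_plan and not account.has_external_access:
        # Local run without external proxy access: skip the proxy.
        return None
    return proxy_url


free_user = Account(is_paid_plan=False, has_external_access=False)
# Local run on a free account: no proxy.
print(pick_proxy(free_user, on_platform=False, proxy_url="http://proxy.example:8000"))
# Same account, deployed to the platform: the proxy is used.
print(pick_proxy(free_user, on_platform=True, proxy_url="http://proxy.example:8000"))
```

The point for tutorials is that the same code runs in both places, so a learner without proxy access still gets a working (if unproxied) run locally.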

Let's discuss this when we meet on Tuesday. I really don't see a ~simple~ safe solution at this point.

CC @jirimoravcik, as he is our proxy expert, just so he is aware :)

mnmkng commented 2 months ago

For a long time we did not offer residential proxies on the free plan, and proxies in general were limited to a 30-day trial. But now we offer both for free, up to $5/month. So perhaps this is a nice and honest opportunity to turn learners into signups, because we can offer them free proxies and they will be able to complete the tutorials with them.

But it's true what Mara says, that even though they have access to the proxies on Apify, they can't use them locally, which is a bummer. Showing them how to run the scrapers on Apify instead of locally is a bit of a stretch, I guess.

honzajavorek commented 1 month ago

I think that the free tier could be a satisfactory answer, but as you point out, they can't get hold of a specific address easily and can't use them locally, which is a problem. We can turn them into full Apify platform users later in their journey, but I see it the same way: a tutorial explaining proxies and basic proxy manipulation is not the right time and place to do that.

Getting hold of a specific address isn't such a problem, as we can instruct the student to "run this to get the address", but not being able to use the address from their computer is a big limitation for teaching and learning.

What about a scraper which would find some fresh addresses somewhere on the internet? Even low-quality ones, we're not gonna do rocket science with them. I don't plan to send people to scrape LinkedIn (final level boss 👾 🎮 👾) as part of their early learning. I could instruct students to run a certain scraper to get hold of a few fresh and mostly working addresses they could use for their learning. They'd also understand why it's so annoying to do that on your own, and so better understand the value of the platform 😄
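To make the "mostly working addresses" part concrete, here is a minimal sketch of how a student could filter a scraped list down to proxies that respond. It uses only the Python standard library; the function names, the `host:port` input format, and the test URL are assumptions for illustration, not part of any existing tutorial or Actor.

```python
import http.client
import urllib.request
from typing import Callable, Iterable, List


def is_working(address: str, test_url: str = "https://example.com", timeout: float = 5.0) -> bool:
    """Return True if test_url can be fetched through the proxy at host:port."""
    proxy_url = f"http://{address}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection errors and timeouts are OSError subclasses;
        # flaky proxies may also send malformed HTTP responses.
        return False


def keep_working(addresses: Iterable[str], check: Callable[[str], bool] = is_working) -> List[str]:
    """Filter host:port strings down to those that pass the check."""
    cleaned = (line.strip() for line in addresses)
    return [address for address in cleaned if address and check(address)]
```

Calling `keep_working(["203.0.113.10:8080", "198.51.100.7:3128"])` would try each address in turn (these two are placeholder documentation addresses, not real proxies). The check is sequential, so a long list of dead proxies takes a while to get through, which is itself a decent illustration of why doing this by hand is annoying.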

jirimoravcik commented 1 month ago

Maybe try the free proxy scraper Actor? https://apify.com/mstephen190/proxy-scraper https://blog.apify.com/automatically-scrape-free-proxy-lists-to-check-for-working-proxies/