Hitting CAPTCHA preventing paid fanclub discovery

Aaeeschylus commented 1 year ago

I'm opening a new issue because this is a bit different from my previous one (#103), which I can't reopen.

When I run fantiadl_v1.8.3.exe -c cookies.txt -p -t -r -m or fantiadl.py -c cookies.txt -p -t -r -m, I get this output:

Collecting paid fanclubs...
Collected 0 fanclubs.

Printing response_page in models.py shows that a CAPTCHA page is being returned instead of the expected page with fanclub links (see the attached responsePageOutput.txt).

Sadly, I waited a couple of weeks hoping it would resolve itself, but it didn't. Oddly, I can access all of Fantia in both Chrome and Firefox and have never seen the CAPTCHA page; only fantiadl ever hits it. Even when I open the exact URL that triggers the CAPTCHA (https://fantia.jp/mypage/users/plans?type=not_free&page={1}) in a browser, I am not shown the CAPTCHA.

Coffeelatte369 commented 1 year ago

I second this; I have the same issue.

bitbybyte commented 1 year ago

It appears to be reCAPTCHA v3, and we can see what this page actually does on demand from https://fantia.jp/recaptcha. I know yt-dlp and others in the past have found a way to take the source, which allows you to paste the URL into a browser, solve the CAPTCHA, and then copy the response hash back to the command line. That might not be necessary depending on what Fantia has configured their CAPTCHA score threshold to be, since in most cases v3 presents no challenge: <a href="https://developers.google.com/recaptcha/docs/v3">https://developers.google.com/recaptcha/docs/v3</a></p> <blockquote> <p>reCAPTCHA v3 returns a score (1.0 is very likely a good interaction, 0.0 is very likely a bot). Based on the score, you can take variable action in the context of your site.</p> </blockquote> <p>As you can see here this page presents a button that kicks off a <code>set_recaptcha_response()</code> call which retrieves a response token from <code>POST https://www.recaptcha.net/recaptcha/api2/reload?k=6LfMBeEUAAAAAM0aMGySYnrhwQAx0tB-9Y1Tu_R1</code>. Basically, submit form <code>#recaptcha_verify</code> (to <a href="https://fantia.jp/recaptcha/verify">https://fantia.jp/recaptcha/verify</a>) after setting the <code>recaptcha_site_key</code> (as seen in the API call) and <code>recaptcha_response</code> from the API response back:</p> <p><code>function set_recaptcha_response(e){window.event.preventDefault(),grecaptcha.ready(function(){var t=document.getElementById("recaptcha_site_key").value;grecaptcha.execute(t,{action:"contact"}).then(function(t){var i=document.getElementById("recaptchaResponse");i.value=t;var n="#"+e;$(n).unbind("submit").submit()})})}</code></p> <p>Their backend I presume then calls out to <code>https://www.recaptcha.net/recaptcha/api/siteverify</code> and determines how to proceed based on their score threshold. For me a <code>POST https://fantia.jp/recaptcha/verify</code> with the requisite form data returns 302 and takes me to the homepage. 
</p> <p>To be frank, I'm not too keen to handle this at the moment because it only prevents scraping the paid plans page and hasn't affected manual downloading yet, but this is at least probably a path forward. If our verification score isn't up to their standard (too low), we'll probably then have to deal with the &lt;iframe&gt; method I mentioned above. Probably whatever /recaptcha/verify responds with would make that fairly straightforward.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/bitbybyte"><img src="https://avatars.githubusercontent.com/u/538449?v=4" />bitbybyte</a> commented <strong> 1 year ago</strong> </div> <div class="markdown-body"> <p>Another alternative: we can take our followed fanclubs from /api/v1/me/fanclubs, fetch each at /api/v1/fanclubs/&lt;id&gt;, then iterate over all plans and check their status. </p> <p>This should work, but could be very slow with lots of clubs.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/KJHJason"><img src="https://avatars.githubusercontent.com/u/73630142?v=4" />KJHJason</a> commented <strong> 1 year ago</strong> </div> <div class="markdown-body"> <p>Their API has recently been protected by reCAPTCHA as well.</p> <p>API response if flagged as a bot:</p> <pre><code>{ "redirect": "/recaptcha" }</code></pre> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/bitbybyte"><img src="https://avatars.githubusercontent.com/u/538449?v=4" />bitbybyte</a> commented <strong> 1 year ago</strong> </div> <div class="markdown-body"> <p>I was able to reproduce this using the same cookie as my browser session. From the browser, I solved the CAPTCHA at /recaptcha and fantiadl was able to proceed after. 
</p> <p>Rather than dealing with the above, I think what I'll do is implement a prompt to solve the CAPTCHA in a browser using the same session, then prompt the user if it's okay to proceed with something like:</p> <blockquote> <p>You must solve a CAPTCHA to continue. Please solve the CAPTCHA at <a href="https://fantia.jp/recaptcha">https://fantia.jp/recaptcha</a> using the same session you used to retrieve your session cookie value. When done, enter "Y" to continue:</p> </blockquote> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/KJHJason"><img src="https://avatars.githubusercontent.com/u/73630142?v=4" />KJHJason</a> commented <strong> 1 year ago</strong> </div> <div class="markdown-body"> <blockquote> <p>I was able to encounter this using the same cookie as my browser session. From the browser, I solved the CAPTCHA at /recaptcha and fantiadl was able to proceed after.</p> <p>Rather than dealing with the above, I think what I'll do is implement a prompt to solve the CAPTCHA in a browser using the same session, then prompt the user if it's okay to proceed with something like:</p> <blockquote> <p>You must solve a CAPTCHA to continue. Please solve the CAPTCHA at <a href="https://fantia.jp/recaptcha">https://fantia.jp/recaptcha</a> using the same session you used to retrieve your session cookie value. When done, enter "Y" to continue:</p> </blockquote> </blockquote> <p>I believe it's also possible to use selenium to solve the CAPTCHA automatically. 
At least that's what I did for my golang CLI program using chromedp.</p> <p>You could make an option for the user if they would like to opt in for automatic CAPTCHA solver or manual solve in the event the selenium approach no longer works.</p> </div> </div> <div class="comment"> <div class="user"> <a rel="noreferrer nofollow" target="_blank" href="https://github.com/bitbybyte"><img src="https://avatars.githubusercontent.com/u/538449?v=4" />bitbybyte</a> commented <strong> 1 year ago</strong> </div> <div class="markdown-body"> <p>Using a headless browser to click the button probably works in most cases, but I also don't know how resilient that is since I don't know if Fantia ever throws back an actual CAPTCHA for you to solve. It seems not at all, or rare, so I'll keep it in mind.</p> </div> </div> </body> </html>
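
For anyone following along, here is a rough sketch of the manual-solve flow proposed in the thread: detect the { "redirect": "/recaptcha" } API response, ask the user to solve the CAPTCHA in their browser, then retry. This is not fantiadl's actual code. The function names, the fetch callable, and the retry limit are all made up for illustration; only the redirect payload, the https://fantia.jp/recaptcha URL, and the prompt wording come from the comments above.

```python
import json
from urllib.parse import urljoin

BASE_URL = "https://fantia.jp"
CAPTCHA_PATH = "/recaptcha"  # path reported in the thread


def captcha_redirect(body):
    """Return the CAPTCHA URL if `body` is the bot-flag redirect JSON, else None."""
    try:
        data = json.loads(body)
    except ValueError:
        # Not JSON (for example an HTML page); treated as "no redirect" here.
        return None
    if isinstance(data, dict) and data.get("redirect") == CAPTCHA_PATH:
        return urljoin(BASE_URL, CAPTCHA_PATH)
    return None


def wait_for_manual_solve(url, ask=input):
    """Ask the user to solve the CAPTCHA in a browser; True if they answer Y."""
    answer = ask(
        "You must solve a CAPTCHA to continue. Please solve the CAPTCHA at "
        f"{url} using the same session you used to retrieve your session "
        'cookie value. When done, enter "Y" to continue: '
    )
    return answer.strip().lower() == "y"


def fetch_with_captcha_prompt(fetch, ask=input, max_attempts=3):
    """Call `fetch` (a hypothetical zero-argument callable returning the
    response body as text) and retry after each confirmed manual solve."""
    for _ in range(max_attempts):
        body = fetch()
        url = captcha_redirect(body)
        if url is None:
            return body
        if not wait_for_manual_solve(url, ask):
            break
    raise RuntimeError("CAPTCHA was not solved; aborting.")
```

In a real client, fetch would wrap the session request that carries the user's cookie. Passing ask as a parameter keeps the prompt easy to replace with an automated solver later, if the headless-browser route turns out to be reliable.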

bitbybyte / fantiadl

Hitting CAPTCHA preventing paid fanclub discovery #107