Aaeeschylus opened this issue 1 year ago (status: open).
I second this; I have the same issue.
It appears to be reCAPTCHA v3, and we can see what the page actually does by loading https://fantia.jp/recaptcha directly. I know yt-dlp and others have found a way in the past to take the
reCAPTCHA v3 returns a score (1.0 is very likely a good interaction, 0.0 is very likely a bot). Based on the score, you can take variable action in the context of your site.
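To make the score-based decision concrete, here is a minimal Python sketch of how a site might turn a v3 score into an action. The 0.5 threshold and the "allow"/"challenge" labels are assumptions for illustration; Fantia's real threshold and logic are not known.

```python
# Hypothetical sketch: how a site *might* act on a reCAPTCHA v3 score.
# The 0.5 threshold and the action names are assumptions, not Fantia's values.


def action_for_score(score: float, threshold: float = 0.5) -> str:
    """Map a reCAPTCHA v3 score (0.0 = likely bot, 1.0 = likely human) to an action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    # At or above the threshold: let the request through.
    if score >= threshold:
        return "allow"
    # Below the threshold: send the client to a challenge page instead.
    return "challenge"


print(action_for_score(0.9))  # prints "allow"
print(action_for_score(0.1))  # prints "challenge"
```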
As you can see, this page presents a button that kicks off a set_recaptcha_response() call, which retrieves a response token from POST https://www.recaptcha.net/recaptcha/api2/reload?k=6LfMBeEUAAAAAM0aMGySYnrhwQAx0tB-9Y1Tu_R1. Basically, the form #recaptcha_verify is submitted (to https://fantia.jp/recaptcha/verify) after recaptcha_site_key (as seen in the API call) and recaptcha_response (from the API response) are set:
function set_recaptcha_response(e) {
  window.event.preventDefault();
  grecaptcha.ready(function () {
    var t = document.getElementById("recaptcha_site_key").value;
    grecaptcha.execute(t, { action: "contact" }).then(function (t) {
      var i = document.getElementById("recaptchaResponse");
      i.value = t;
      var n = "#" + e;
      $(n).unbind("submit").submit();
    });
  });
}
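If a client wanted to reproduce this flow outside the browser, its first step would be reading the site key from the page. A minimal sketch using Python's standard html.parser, assuming the key lives in an <input> element with id recaptcha_site_key (the JavaScript reads it via .value); the sample markup below is made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class _InputValueFinder(HTMLParser):
    """Collects the value attribute of the <input> whose id matches target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags via the default handle_startendtag.
        attributes = dict(attrs)
        if tag == "input" and attributes.get("id") == self.target_id:
            self.value = attributes.get("value")


def extract_site_key(html: str) -> Optional[str]:
    """Return the value of #recaptcha_site_key from the /recaptcha page, or None."""
    finder = _InputValueFinder("recaptcha_site_key")
    finder.feed(html)
    return finder.value


# Hypothetical markup, not copied from Fantia.
sample = '<form id="recaptcha_verify"><input type="hidden" id="recaptcha_site_key" value="SITE_KEY"></form>'
print(extract_site_key(sample))  # prints "SITE_KEY"
```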
I presume their backend then calls out to https://www.recaptcha.net/recaptcha/api/siteverify and decides how to proceed based on their score threshold. For me, a POST to https://fantia.jp/recaptcha/verify with the requisite form data returns a 302 and takes me to the homepage.
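A minimal sketch of building that POST with the standard library, without sending it. The form field names recaptcha_site_key and recaptcha_response follow the comment above but are assumptions about the form's actual name attributes; extra_fields is a hypothetical hook for any other hidden inputs the form carries (for example a CSRF token), which this sketch makes no attempt to discover. The cookie header is passed through unchanged.

```python
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request

VERIFY_URL = "https://fantia.jp/recaptcha/verify"


def build_verify_request(
    site_key: str,
    token: str,
    cookie_header: str,
    extra_fields: Optional[Dict[str, str]] = None,
) -> Request:
    """Build (but do not send) the form POST to /recaptcha/verify."""
    # Field names are assumed to match the element names mentioned in the thread.
    fields = {"recaptcha_site_key": site_key, "recaptcha_response": token}
    if extra_fields:
        fields.update(extra_fields)
    body = urlencode(fields).encode("ascii")
    return Request(
        VERIFY_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Same browser session the user logged in with.
            "Cookie": cookie_header,
        },
        method="POST",
    )
```

Note that urllib.request.urlopen follows redirects by default, so checking for the 302 itself would require an opener whose redirect handler does not follow it.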
To be frank, I'm not too keen to handle this at the moment, because it only prevents scraping the paid plans page and hasn't affected manual downloading yet, but this is probably at least a path forward. If our verification score doesn't meet their standard (too low), we'll probably then have to deal with the
Another alternative: we can take our followed fanclubs from /api/v1/me/fanclubs, fetch each at /api/v1/fanclubs/
This should work, but it could be very slow for users who follow many clubs.
Their API has recently been protected by reCAPTCHA as well.
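The slowness comes from making one request per followed club. A minimal sketch of that loop, paced with a fixed delay; fetch_club is a hypothetical callable standing in for the per-club request (its URL is not spelled out here), and the ids are assumed to come from /api/v1/me/fanclubs. The sleep parameter exists only so the pacing can be tested without actually waiting.

```python
import time
from typing import Callable, Dict, Iterable, List


def fetch_followed_fanclubs(
    fanclub_ids: Iterable[int],
    fetch_club: Callable[[int], dict],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, dict]:
    """Fetch each followed fanclub one request at a time.

    Total time grows linearly with the number of clubs: roughly
    len(ids) * (request latency) + (len(ids) - 1) * delay_seconds.
    """
    results: Dict[int, dict] = {}
    ids: List[int] = list(fanclub_ids)
    for index, fanclub_id in enumerate(ids):
        # fetch_club is hypothetical; it would perform the per-club API call.
        results[fanclub_id] = fetch_club(fanclub_id)
        # Pause between requests (but not after the last one).
        if index < len(ids) - 1:
            sleep(delay_seconds)
    return results
```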
API response if flagged as a bot:
{
"redirect": "/recaptcha"
}
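A client could detect this case explicitly rather than failing later on a missing key. A sketch, assuming the redirect body is exactly the JSON above; CaptchaRequired is a hypothetical exception name, not something fantiadl defines.

```python
import json


class CaptchaRequired(Exception):
    """Raised when Fantia answers an API call with a redirect to /recaptcha."""


def check_api_response(body: str) -> dict:
    """Parse an API response body, raising CaptchaRequired if the request was flagged."""
    data = json.loads(body)
    # A flagged request gets {"redirect": "/recaptcha"} instead of the usual payload.
    if isinstance(data, dict) and data.get("redirect") == "/recaptcha":
        raise CaptchaRequired(
            "Fantia requires a CAPTCHA to be solved at https://fantia.jp/recaptcha"
        )
    return data
```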
I was able to encounter this using the same cookie as my browser session. From the browser, I solved the CAPTCHA at /recaptcha and fantiadl was able to proceed after.
Rather than dealing with the above, I think what I'll do is implement a prompt to solve the CAPTCHA in a browser using the same session, then prompt the user if it's okay to proceed with something like:
You must solve a CAPTCHA to continue. Please solve the CAPTCHA at https://fantia.jp/recaptcha using the same session you used to retrieve your session cookie value. When done, enter "Y" to continue:
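A minimal sketch of that prompt loop. The message is the one proposed above; accepting "N" to abort is an assumption beyond the proposal, and the prompt callable defaults to input() but can be swapped out for testing.

```python
from typing import Callable

# Message taken from the proposal above.
CAPTCHA_PROMPT = (
    "You must solve a CAPTCHA to continue. Please solve the CAPTCHA at "
    "https://fantia.jp/recaptcha using the same session you used to retrieve "
    'your session cookie value. When done, enter "Y" to continue: '
)


def wait_for_manual_captcha(prompt: Callable[[str], str] = input) -> bool:
    """Re-prompt until the user answers Y (continue) or N (abort; an assumed extra)."""
    while True:
        answer = prompt(CAPTCHA_PROMPT).strip().lower()
        if answer == "y":
            return True
        if answer == "n":
            return False
        # Any other answer: show the message again.
```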
I believe it's also possible to use Selenium to solve the CAPTCHA automatically. At least, that's what I did for my Go CLI program using chromedp.
You could add an option letting the user opt in to an automatic CAPTCHA solver, or solve it manually in case the Selenium approach stops working.
Using a headless browser to click the button probably works in most cases, but I don't know how resilient that is, since I don't know whether Fantia ever throws back an actual CAPTCHA for you to solve. It seems it never does, or only rarely, so I'll keep it in mind.
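The opt-in with a manual fallback could be wired up roughly as follows. auto_solver and manual_solver are hypothetical callables (a browser-automation routine and something like the manual prompt described earlier, respectively); the sketch only shows the control flow, not any Selenium code.

```python
from typing import Callable


def solve_captcha(
    auto_solver: Callable[[], bool],
    manual_solver: Callable[[], bool],
    use_auto: bool,
) -> bool:
    """Try the automatic solver if the user opted in; otherwise, or on failure, solve manually."""
    if use_auto:
        try:
            if auto_solver():
                return True
        except Exception as exc:  # e.g. the browser automation broke after a site change
            print(f"Automatic CAPTCHA solving failed ({exc}); falling back to manual.")
    # Not opted in, or the automatic attempt failed: ask the user.
    return manual_solver()
```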
I thought I should open a new issue for this, as it is a bit different from the previous issue I made (#103), which I can't reopen.
When running
fantiadl_v1.8.3.exe -c cookies.txt -p -t -r -m
or
fantiadl.py -c cookies.txt -p -t -r -m
I get the output of:
When printing out response_page in models.py, I can see that a CAPTCHA page is being returned instead of the expected page with fanclub links (attached: responsePageOutput.txt).
Sadly, after waiting a couple of weeks in the hope that it would resolve itself, it hasn't. Weirdly enough, I can access the entirety of Fantia in both Chrome and Firefox and have never even seen the CAPTCHA page; for some reason only fantiadl ever hits it. Even if I open the exact URL that returns the CAPTCHA (https://fantia.jp/mypage/users/plans?type=not_free&page={1}) in a browser, I still don't get the CAPTCHA.
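One way for the client to notice this situation, instead of parsing a CAPTCHA page as if it were the plans page, is a quick heuristic check on the response. A sketch, assuming the CAPTCHA page is served at /recaptcha or contains the #recaptcha_verify form mentioned in this thread; is_captcha_page is a hypothetical helper, not part of models.py.

```python
from urllib.parse import urlparse


def is_captcha_page(final_url: str, html: str) -> bool:
    """Heuristically decide whether a fetched page is Fantia's CAPTCHA page."""
    # Case 1: the request was redirected to the /recaptcha page itself.
    if urlparse(final_url).path.rstrip("/") == "/recaptcha":
        return True
    # Case 2: the page embeds the verification form (id taken from the thread).
    return 'id="recaptcha_verify"' in html
```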