FreeLanguageTools / vocabsieve

Simple sentence mining tool for language learning
GNU General Public License v3.0
374 stars, 29 forks

Forvo cloud stopped working 10 hours ago #146

Open voothi opened 6 months ago

voothi commented 6 months ago

Describe the bug
Hello! Forvo cloud stopped working 10 hours ago. I tried versions 0.11.1 and 0.12.0, and I also tried different internet connections.

To Reproduce
Steps to reproduce the behavior, using the VocabSieve 0.11.1 / 0.12.0 GUI under Windows 11:
1. In General, set Target language = German.
2. Enable Forvo under Configure / Sources / Pronunciation sources / Enabled pronunciation sources.
3. Check that Lemmatization policy for pronunciation = Try lemma first, otherwise original.
4. Look up words from the captured sentence.

The word is not played back. The forvo/... file is not displayed in the list below. Only local audio files from local audio libraries are displayed.

Expected behavior
Audio file for the selected German word:

Screenshots

image image

Additional context Telegram thread: Hello! Forvo stopped working 10 hours ago. I tried version 0.11.1 and 0.12.0.

1over137 commented 6 months ago

I assume they stepped up their anti-scraping measures with Cloudflare, as reported elsewhere with similar projects:

https://github.com/Rascalov/Anki-Simple-Forvo-Audio/issues/31

https://github.com/Rascalov/Anki-Simple-Forvo-Audio/issues/29

In that case there is nothing I can do. Stick to downloaded audio, probably. If this persists I'll remove the implementation. Maybe I'll look into fetching audio from Wiktionary. I'm personally unwilling to run one, but if someone is willing to set up a proxy service for Forvo along the lines of Lingva, please let me know.
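For anyone who wants to tell a Cloudflare block apart from an ordinary failure before falling back to local audio, here is a minimal sketch. It is not VocabSieve's code: the function names and the source labels are hypothetical, and the check is only a heuristic based on the `Server: cloudflare` and `cf-mitigated: challenge` response headers, which Cloudflare may change.

```python
from typing import Mapping


def looks_like_cloudflare_block(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: does this HTTP response look like a Cloudflare challenge/block?"""
    # Header names are case-insensitive, so normalise before comparing.
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    # Cloudflare marks challenge pages with this header.
    if lowered.get("cf-mitigated") == "challenge":
        return True
    # Otherwise treat 403/503 served by Cloudflare as a probable block.
    return "cloudflare" in lowered.get("server", "") and status in (403, 503)


def usable_sources(enabled: list[str], forvo_blocked: bool) -> list[str]:
    """Drop the (hypothetical) "Forvo" source label when Forvo is blocked."""
    if not forvo_blocked:
        return list(enabled)
    return [name for name in enabled if name != "Forvo"]


blocked = looks_like_cloudflare_block(403, {"Server": "cloudflare"})
print(usable_sources(["Forvo", "Local audio"], blocked))
```

With a check like this, a blocked request can be skipped quietly so that only the downloaded audio libraries are offered.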

Rascalov commented 6 months ago

I assume they stepped up their anti-scraping measures with Cloudflare, as reported elsewhere with similar projects:

It's an oddball. My extension seemed to work again after I visited the site once in my actual browser to complete the Cloudflare captcha. My guess is that it exempts an IP from the check once that IP has passed it at least once. But then again, I have yet to hear from people whether that actually worked for them as well.

otomiruu commented 6 months ago

I assume they stepped up their anti-scraping measures with Cloudflare, as reported elsewhere with similar projects:

It's an oddball. My extension seemed to work again after I visited the site once in my actual browser to complete the Cloudflare captcha. My guess is that it exempts an IP from the check once that IP has passed it at least once. But then again, I have yet to hear from people whether that actually worked for them as well.

I just opened Anki and everything worked. I did not visit the Forvo site, but if this problem shows up again, I will do as you said. Thanks again for this wonderful add-on.

Rascalov commented 6 months ago

I just opened Anki and everything worked. I did not visit the Forvo site, but if this problem shows up again, I will do as you said. Thanks again for this wonderful add-on.

No problem! But I assume you wanted to send this reply in the issue you created yourself 😅

1over137 commented 6 months ago

From what I see, they are still behind a Cloudflare reverse proxy. The DNS points to a Cloudflare IP: http://104.20.253.20/. I assume they are testing this. Most likely they'll just block scraping again permanently, and then there won't be any good solution other than a scraping script that drives a full browser.
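The "DNS points to a Cloudflare IP" observation can be checked offline with a short sketch like the one below. The CIDR list is only a small sample of the IPv4 ranges Cloudflare publishes and may be out of date, and the helper name is made up for illustration.

```python
import ipaddress

# Assumption: a partial sample of Cloudflare's published IPv4 ranges.
CLOUDFLARE_IPV4_SAMPLE = ("104.16.0.0/13", "104.24.0.0/14", "172.64.0.0/13")


def in_cloudflare_sample(ip: str) -> bool:
    """Return True if the address falls inside one of the sample ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in CLOUDFLARE_IPV4_SAMPLE)


# The address quoted in the comment above.
print(in_cloudflare_sample("104.20.253.20"))
```

In practice the address would come from a live lookup such as `socket.gethostbyname`, which is left out here so that the example needs no network access.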

Rascalov commented 6 months ago

Yeah, there are people (freemdict) who were already posting full-site scrapes of the audio. ~Can't say how recent these are though~ It's still being updated.