Rascalov / Anki-Simple-Forvo-Audio

Simple Anki 2.1+ add-on to get Forvo audio into your cards for free. No Forvo account needed.
https://ankiweb.net/shared/info/560814150
GNU General Public License v3.0

A method to scrape all Forvo pronunciations to use the add-on offline #5

Closed by ghost 2 years ago

ghost commented 2 years ago

A method to scrape all Forvo pronunciations is now available:

https://ankiweb.net/shared/info/560814150#:~:text=11%2F21%2F2021-,a%20method%20to%20scrape,-all%20Forvo%20pronunciations

The scraping method works perfectly. It can scrape absolutely all the audio files for each language.

For example, there are more than 500,000 Russian audio files that can be scraped easily.

Would it be possible to download the audio files for one's target language and then bulk-add them into Anki with the add-on?
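
To make the bulk-add idea concrete, here is a rough Python sketch of the lookup step only. It assumes the downloaded audio sits in a local folder and that each file is named after its word (for example `Haus.mp3`); that naming scheme, the folder layout, and the function names `build_audio_index` and `sound_tags` are assumptions for illustration, not something the scraping script or the add-on is known to use. Copying the files into Anki's media folder and writing to notes is left out.

```python
from collections import defaultdict
from pathlib import Path

# Assumed set of audio extensions; adjust to whatever the download produced.
AUDIO_EXTENSIONS = {".mp3", ".ogg"}


def build_audio_index(folder):
    """Map each case-folded file stem (assumed to be the word) to its audio files."""
    index = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            index[path.stem.casefold()].append(path)
    return dict(index)


def sound_tags(word, index):
    """Return Anki-style [sound:...] tags for every file indexed under the word."""
    return ["[sound:%s]" % p.name for p in index.get(word.casefold(), [])]
```

With an index built once per language folder, `sound_tags("haus", index)` would return one `[sound:...]` tag per matching file, or an empty list when no file matches the word.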

Rascalov commented 2 years ago

Good day,

I have not tried the script myself yet. I'm curious how it prevents Forvo from blocking an IP that bulk-downloads all the audio for a language; I'll soon have some fresh throwaway IPs to test it on. I'm also curious how the creator got a txt dump of all the words recorded on Forvo, which is very useful.

For now, I will mark this as an enhancement. This proposal could well solve the issues my original bulk scraper had, by having users give the add-on a dictionary file to work with. I just need some time to figure out how best to do this. Taking a .mdx dictionary file as input seems like my best bet.

ghost commented 2 years ago

The author of the script is from China. He is a member of the "FreeMDict" Telegram group:
https://t.me/freemdict

He obtained a list of 5.7 million URLs from Forvo using Python, which took him several weeks! He finished the work in August 2021 and shared the script with me.

The original author tried to scrape all the sounds from Forvo too quickly, and after querying 1 or 2 million URLs his IP was blocked in China. He then asked me to scrape from my IP. I did it slowly (at a speed of 400 Kb/s) and successfully queried all 5.7 million URLs :1st_place_medal:

Forvo never blocked me. :dancers: :smiley:
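
For anyone wondering what "slowly" can look like in code, below is a minimal Python sketch of a throughput cap. It is not the original script: the `RateLimiter` class is a hypothetical helper, the download step itself is left out, and the cap is given in bytes per second (whether "400 Kb/s" above means kilobits or kilobytes is not stated, so the number you pass in is your own choice).

```python
import time


class RateLimiter:
    """Keep the average throughput since creation at or below a cap."""

    def __init__(self, max_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.max_bytes_per_sec = max_bytes_per_sec
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._start = clock()
        self._total = 0

    def record(self, nbytes):
        """Register nbytes just downloaded and pause if we are ahead of budget.

        Returns the delay (in seconds) that was applied.
        """
        self._total += nbytes
        # Time that this much data should have taken at the capped rate.
        earliest = self._total / self.max_bytes_per_sec
        elapsed = self._clock() - self._start
        delay = max(0.0, earliest - elapsed)
        if delay > 0:
            self._sleep(delay)
        return delay
```

Calling `limiter.record(len(data))` after each finished file keeps the long-run average under the cap while still letting individual small files download at full speed.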

In September I obtained 620,000 German pronunciations from Forvo and made an .mdx dictionary (on FreeMDict, private post).

Yesterday I ran the Python script and it is still working perfectly! I tried Russian, French and English, and all of those languages work OK.

Just follow the instructions on FreeMDict, where the script was posted: https://forum.freemdict.com/t/topic/8100 (private post, registration required)

Please contact me on the FreeMDict forum. My nickname there is "tovaremeterio": https://forum.freemdict.com/u/tovaremeterio/

I want to scrape several languages (including Russian). We could split the work to avoid duplication of effort :D

ghost commented 2 years ago

@Rascalov

Someone from https://forum.ru-board.com/ (aleven) is downloading all the Russian pronunciations from Forvo.com. He might finish within 3-4 days.

Please let me know if you are interested in the sounds.

ghost commented 2 years ago

@Rascalov All Forvo audio files are now available for download:

https://forum.freemdict.com/t/topic/11947

You can use the Russian audios for your language learning :D