Good day,
I haven't tried the script myself yet. I'm curious how it keeps Forvo from blocking an IP that bulk-downloads all the audio for a language. I'll soon have some fresh throwaway IPs to test it on. I'm also curious how the creator got a txt dump of all the words recorded on Forvo; that's very useful.
For now, I will mark this as an enhancement. This proposal could well solve the issues my original bulk scraper had, by having users give the add-on a dictionary file to work with. I just need some time to figure out how best to do this. Taking a .mdx dictionary file as input seems like my best bet.
The author of the script is from China. He is a member of the "FreeMDict" Telegram group:
https://t.me/freemdict
He obtained a list of 5.7 million URLs from Forvo using Python, which took him several weeks! He finished the work in August 2021 and shared the script with me.
The original author tried to scrape all the sounds from Forvo too quickly, and after he had queried 1 or 2 million URLs his IP was blocked in China. He then asked me to scrape from my IP. I did it slowly (at a speed of 400 Kb/s) and successfully queried all 5.7 million URLs :1st_place_medal:
Forvo never blocked me. :dancers: :smiley:
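To illustrate the "scrape slowly" idea, here is a minimal sketch of a bandwidth cap in Python. It is not the original author's script: the names `min_duration` and `throttled_fetch` are hypothetical, and the `fetch` callable is something you would supply yourself (for example a wrapper around your HTTP client of choice). It only shows one way to hold an average download rate under a chosen limit.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def min_duration(num_bytes: int, max_bytes_per_sec: float) -> float:
    """Seconds a transfer of num_bytes must take to stay at or below the cap."""
    if max_bytes_per_sec <= 0:
        raise ValueError("max_bytes_per_sec must be positive")
    return num_bytes / max_bytes_per_sec


def throttled_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],          # hypothetical: returns the body for a URL
    max_bytes_per_sec: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, data) pairs, pausing so the average rate stays under the cap."""
    for url in urls:
        start = clock()
        data = fetch(url)
        elapsed = clock() - start
        # If the download finished faster than the cap allows, wait out the rest.
        wait = min_duration(len(data), max_bytes_per_sec) - elapsed
        if wait > 0:
            sleep(wait)
        yield url, data
```

Note that "400 Kb/s" could mean kilobits or kilobytes per second; the sketch takes the cap in bytes per second, so convert accordingly. Whether any particular rate avoids a block is not something this code can guarantee.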
In September I obtained 620,000 German pronunciations from Forvo and made an .mdx dictionary (on FreeMDict - Private post).
Yesterday I ran the Python script and it is still working perfectly! I tried Russian, French and English, and those languages work OK.
Please contact me on the FreeMDict Forum. My nickname there is "tovaremeterio": https://forum.freemdict.com/u/tovaremeterio/
I want to scrape several languages (including Russian). We could split the work to avoid duplication of effort :D
@Rascalov
Someone from https://forum.ru-board.com/ (aleven) is downloading all the Russian pronunciations from Forvo.com. He might finish within 3-4 days.
Please let me know if you are interested in the sounds.
@Rascalov All Forvo Audios are now available to download:
https://forum.freemdict.com/t/topic/11947
You can use the Russian audios for your language learning :D
A method to scrape all Forvo pronunciations is now available:
https://ankiweb.net/shared/info/560814150#:~:text=11%2F21%2F2021-,a%20method%20to%20scrape,-all%20Forvo%20pronunciations
The scraping method works perfectly. It can scrape absolutely all the audio files for each language.
For example, there are more than 500,000 Russian audio files that can be scraped easily.
Would it be possible to download the audio for one's target language and then bulk-add it to Anki with the add-on?