acoli-repo / acoli-corpora

open source corpora created, annotated or maintained by the ACoLi group at University of Augsburg, Germany.

more parallel corpus data #9

Open chiarcos opened 1 month ago

chiarcos commented 1 month ago

Not in the repo yet: the bilinguis crawling script

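    # Recursive crawl of bilinguis.com: -np never ascends to a parent directory,
    # -nc skips files that already exist locally, -k converts links for local
    # viewing, -l 0 removes the recursion depth limit.
    wget -r -np -nc -k -l 0 -F \
        -U "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" \
        --reject '*.js,*.css,*.ico,*.txt,*.gif,*.jpg,*.jpeg,*.png,*.mp3,*.pdf,*.tgz,*.flv,*.avi,*.mpeg,*.iso' \
        http://bilinguis.com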
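
If this ends up in the repo as a script, one possible shape is sketched below. This is only a sketch: the `REJECT_EXTS` variable, the `build_reject_list` helper and the `--run` switch are illustrative assumptions, not existing code. The wget options and the extension list are the ones from the command above, with the reject list built from a plain list of suffixes so it is easier to maintain.

```shell
#!/bin/sh
# Hypothetical wrapper around the crawl command; all names here are assumptions.

# File suffixes to skip, same list as in the original one-liner.
REJECT_EXTS="js css ico txt gif jpg jpeg png mp3 pdf tgz flv avi mpeg iso"

# Turn "js css ..." into the comma-separated pattern list "*.js,*.css,..."
# that wget's --reject option takes.
build_reject_list() {
    list=""
    for ext in $REJECT_EXTS; do
        # Prepend a comma only when the list is already non-empty.
        list="${list:+$list,}*.$ext"
    done
    printf '%s\n' "$list"
}

# Only crawl when explicitly asked, so sourcing this file has no side effects.
if [ "${1:-}" = "--run" ]; then
    wget -r -np -nc -k -l 0 -F \
        -U "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" \
        --reject "$(build_reject_list)" \
        http://bilinguis.com
fi
```

Saved under a name of your choice, it would be started with `sh <script> --run`; without `--run` nothing is downloaded.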