TsingJyujing / sg2fcitx

Automatically exported from code.google.com/p/sg2fcitx

A batch update script #6

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I'm not used to updating by hand, so I wrote a script to do it automatically:
(Two files: fcitxupdate.sh and /etc/fcitxupdate.conf)
#!/bin/bash                                                    
# FILE: fcitxupdate.sh

#WorkingDir="/tmp/fcitxupdate"
WorkingDir="."                

. /etc/fcitxupdate.conf

# Prepare 
test -d "$WorkingDir" || mkdir -p "$WorkingDir"
cd "$WorkingDir" || exit 1
touch fcitx.phrase
rm -f gbkpy.org
ln -s /usr/share/fcitx/data/gbkpy.org .

# Get the Sogou phrases
# SougouPhrases holds alternating entries: a description, then its URL.
flag=0
touch sg.phrase sg.all
for sg in "${SougouPhrases[@]}" ; do
    ((flag++))
    # Odd entries are descriptions: remember the label and move on to the URL.
    test $flag = 1 && info="$sg" && continue
    echo "Downloading $info ..."
    wget -O sg.phrase "$sg" &> /dev/null
    # Drop lines containing Latin letters, which make sg2fcitx segfault
    sed -e '/[a-zA-Z]/d' sg.phrase >> sg.all
    flag=0
done
echo "Converting Sogou phrases ..."
sg2fcitx sg.all | sort -u >> fcitx.phrase
sg2fcitx fcitx.phrase >> fcitx.phrase          
rm sg.phrase sg.all                            

# Get the Open-Phrase
flag=0               
touch op.phrase.bz2  
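for op in "${OpenPhrases[@]}" ; do
    ((flag++))
    test $flag = 1 && info="$op" && continue
    echo -en "Downloading $info ...\r"
    wget -O op.phrase.bz2 "$op" &> /dev/null
    echo     "Converting  $info ...        "
    # Sort on the third field, numerically, largest first (-k3,3 replaces the
    # obsolete "+2 -3" key syntax), then swap the first two fields.
    bzcat op.phrase.bz2 | sort -k3,3 -g -r | awk '{print $2 " " $1}' \
        | iconv -f utf8 -t gbk >> fcitx.phrase
    # Reset so the next entry is read as a description again
    flag=0
done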
rm op.phrase.bz2

# Remove the duplicate words

#echo "Stripping duplicate phrases"
#sort -u fcitx.phrase > pyPhrase.org
cp fcitx.phrase pyPhrase.org
rm fcitx.phrase

# Build the new 'mb' file
createPYMB gbkpy.org pyPhrase.org

# Install the mb file
cp *.mb /usr/share/fcitx/data/

# Clean up
rm pyPhrase.org gbkpy.org py*
echo "done."

# end
#------------------------------------------------------------------
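
The Open-Phrase step orders its input by the third column before swapping the first two. Here is a minimal sketch of that key selection using `sort -k`. As far as I know, the older zero-based `sort +2 -3` spelling is no longer accepted by recent GNU sort by default. The three-column sample rows are made up for illustration, and the real Open-Phrase layout is an assumption here.

```shell
#!/bin/bash
# Hypothetical sample: <field1> <field2> <weight>; not real Open-Phrase data.
sample=$(printf '%s\n' 'a x 3' 'b y 10' 'c z 1')

# -k3,3 restricts the sort key to the third field; -g compares numerically,
# -r reverses the order so the largest weight comes first.
# awk then prints the second field followed by the first.
sorted=$(printf '%s\n' "$sample" | sort -k3,3 -g -r | awk '{print $2 " " $1}')
printf '%s\n' "$sorted"
```

With plain lexical sorting the row with weight 10 would land between 1 and 3, which is why the numeric flag matters here.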

# FILE: /etc/fcitxupdate.conf
# Configuration for the fcitx phrase update    

SougouPhrases=(
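    # First line: description
    # Second line: URL
    # Some dictionaries hit a segmentation fault during conversion because their phrases contain Latin letters.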
    '唐诗300首'                                                 
    "http://pinyin.sogou.com/dict/download_txt.php?id=1"        
    '古诗词名句'                                                
    'http://pinyin.sogou.com/dict/download_txt.php?id=2'        
    #'宋词精选' # contains Latin letters, so it triggers a segmentation fault
    #'http://pinyin.sogou.com/dict/download_txt.php?id=3'
    '网络流行新词'
    'http://pinyin.sogou.com/dict/download_txt.php?id=4'
    '流行新歌top180'
    'http://pinyin.sogou.com/dict/download_txt.php?id=5'
    '历史名人大全'
    'http://pinyin.sogou.com/dict/download_txt.php?id=154'
    '计算机名词'
    'http://pinyin.sogou.com/dict/download_txt.php?id=151'
    '成语俗语大全'
    'http://pinyin.sogou.com/dict/download_txt.php?id=332'
    '台湾ptt bbs 常用语'
    'http://pinyin.sogou.com/dict/download_txt.php?id=9182'
)

OpenPhrases=(
    # May 2008 edition
    #'http://open-
    'Open Phrase, April 2009 edition'
    'http://open-
)

# end
#------------------------------------------------
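
The config above stores each dictionary as two consecutive array entries, a description followed by a URL. The script pairs them up with a toggle flag. An alternative, shown here only as a sketch, is to step through the array by index two at a time. The array name `Phrases`, the labels, and the `example.invalid` URLs are placeholders, not part of the original config.

```shell
#!/bin/bash
# Hypothetical two-entry list in the same description/URL layout; URLs are placeholders.
Phrases=(
    'first label'  'http://example.invalid/a'
    'second label' 'http://example.invalid/b'
)

# Walk the array two elements at a time: even index = description, odd index = URL.
# The "i + 1 <" bound skips a trailing description that has no URL.
pairs=()
for ((i = 0; i + 1 < ${#Phrases[@]}; i += 2)); do
    label=${Phrases[i]}
    url=${Phrases[i + 1]}
    pairs+=("$label -> $url")
    echo "Would download $label from $url"
done
```

Indexing this way avoids carrying state between iterations, so a missing reset cannot cause a description to be treated as a URL.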

Original issue reported on code.google.com by TaleB...@gmail.com on 12 Jul 2009 at 1:00

GoogleCodeExporter commented 8 years ago
sg2fcitx fcitx.phrase >> fcitx.phrase    seems to serve no purpose?

Also, before createPYMB gbkpy.org pyPhrase.org you should add

sed -i "/^[a-z.']*\$/d"  pyPhrase.org    otherwise it is bound to fail.

Original comment by nankai.w...@gmail.com on 30 Aug 2010 at 6:21
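
To see what the sed filter suggested in the comment above actually removes, here is a small sketch that runs it on a throwaway file. The sample lines are made up and the real pyPhrase.org layout is an assumption. The pattern is quoted so the shell passes it to sed unchanged.

```shell
#!/bin/bash
# Made-up sample lines; the actual pyPhrase.org layout is assumed, not verified.
tmp=$(mktemp)
printf '%s\n' "ni'hao 你好" "abc" "" "shi'jie 世界" "a.b'c" > "$tmp"

# Delete lines made up solely of lowercase letters, dots and apostrophes.
# Because * also matches zero characters, empty lines are deleted too.
filtered=$(sed "/^[a-z.']*\$/d" "$tmp")
printf '%s\n' "$filtered"
rm -f "$tmp"
```

Only the lines that carry a phrase after the code survive. Lines holding a bare code with nothing after it are dropped, along with blank lines.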