PayasR / paralite


Default value of "output_record_delimiter" #35

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
● Status: Fixed

● Symptom
 ・In the manual, section "2.3 Performing collective query", the definition of "User-Defined Executable (UDX)" states "output_record_delimiter : EMPTY_LINE by default or any string", so the documented default value of "output_record_delimiter" is EMPTY_LINE ('\n\n').
 ・However, a query that specifies 'output_record_delimiter EMPTY_LINE' (Ex.1 below) and a query that omits it (Ex.2 below) produce different output.

Ex.1) With 'output_record_delimiter' specified
$ paralite test.db "select document_id, F(text) from document with F='perl ss' output_col_delimiter '==' output_record_delimiter EMPTY_LINE"

Ex.2) Without 'output_record_delimiter'
$ paralite test.db "select document_id, F(text) from document with F='perl ss' output_col_delimiter '=='"

● Steps to reproduce
Ex.3) First, set up the environment
$ paralite test.db < a.sql

$ cat a.sql
↓
create table document(document_id, text);
.import doc.dat document;

$ cat doc.dat
↓
32819|It is sunny today. I have a good mood.
82718|I am studying in the lab. I want to go outside and play pingpong.

$ cat ss
↓
#!/usr/bin/perl
while (my $l = <STDIN>) {
  chomp($l);
  my @s = split(/\.\s*/, $l);
  my $i = 1;
  foreach my $item (@s) {
    print $i, '==', $item, ".\n";
    ++$i;
  }
  print "\n";
}

Ex.4) With 'output_record_delimiter' specified
$ paralite test.db "select document_id, F(text) from document with F='perl ss' output_col_delimiter '==' output_record_delimiter EMPTY_LINE"
↓
82718|1|I am studying in the lab.
82718|2|I want to go outside and play pingpong.
32819|1|It is sunny today.
32819|2|I have a good mood.

Ex.5) Without 'output_record_delimiter'
$ paralite test.db "select document_id, F(text) from document with F='perl ss' output_col_delimiter '=='"
↓
82718|1|I am studying in the lab.
32819|1|It is sunny today.

What is the expected output? What do you see instead?
Ex.6) With 'output_record_delimiter' specified
$ paralite test.db "select document_id, F(text) from document with F='perl ss' output_col_delimiter '==' output_record_delimiter EMPTY_LINE"
↓
82718|1|I am studying in the lab.
82718|2|I want to go outside and play pingpong.
32819|1|It is sunny today.
32819|2|I have a good mood.

Ex.7) Without 'output_record_delimiter'
$ paralite test.db "select document_id, F(text) from document with F='perl ss' output_col_delimiter '=='"
↓
82718|1|I am studying in the lab.
82718|2|I want to go outside and play pingpong.
32819|1|It is sunny today.
32819|2|I have a good mood.

Ex.8) With 'output_record_delimiter NULL' specified
$ paralite test.db "select document_id, F(text) from document with F='perl ss' output_col_delimiter '==' output_record_delimiter NULL"
↓
82718|1|I am studying in the lab.
32819|1|It is sunny today.
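
The expected rows in Ex.6 and Ex.7 follow from splitting the UDX output on the documented delimiters. The sketch below is not ParaLite code: `ss` is a Python rendering of the Perl script from Ex.3, `split_udx_output` is a hypothetical helper, and a newline row delimiter is an assumption inferred from the expected output. The behaviour of a NULL record delimiter is not modelled.

```python
import re

def ss(text):
    """Python rendering of the Perl `ss` script, for one input line."""
    # Perl's split drops trailing empty fields; dropping all empty fields
    # is equivalent for the sample data used here.
    sentences = [s for s in re.split(r"\.\s*", text.rstrip("\n")) if s]
    rows = [f"{i}=={s}.\n" for i, s in enumerate(sentences, start=1)]
    return "".join(rows) + "\n"  # the blank line closes the record

def split_udx_output(raw, col_delim="==", row_delim="\n", record_delim="\n\n"):
    """Hypothetical helper: split UDX stdout into records of rows of columns."""
    records = []
    for record in raw.split(record_delim):
        if record:
            records.append([row.split(col_delim)
                            for row in record.split(row_delim) if row])
    return records

# Sample data from doc.dat.
docs = [
    ("32819", "It is sunny today. I have a good mood."),
    ("82718", "I am studying in the lab. I want to go outside and play pingpong."),
]
raw = "".join(ss(text) for _, text in docs)
result = [
    "|".join([doc_id] + row)
    for (doc_id, _), rows in zip(docs, split_udx_output(raw))
    for row in rows
]
print("\n".join(result))
```

The sketch keeps the order of doc.dat, so the rows appear in a different order than in Ex.6 and Ex.7; the set of rows is the same.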

Please use labels and text to provide additional information.
● Cause
 ・At line 308 of m_paraLite.py, "self.output_record_delimiter = conf.NULL" sets the default value of "output_record_delimiter" to NULL (the empty string ""). The default values of the other delimiters in the same class (class UDXDef:) match the manual.
 
 ・Fix: change line 308, "self.output_record_delimiter = conf.NULL", to use EMPTY_LINE ('\n\n').
 
 ・To get the result shown in Ex.5 above, specify 'output_record_delimiter NULL', which produces the result shown in Ex.8 above. To make "NULL" acceptable here, "NULL" was added to OUTPUT_RECORD_DELIMITER in the udx definition in newparser.py.

● Changes
 ◆ bin/m_paraLite.py / class UDXDef: / def __init__(self):
< before >
        self.input_record_delimiter = conf.NULL
        self.output_record_delimiter = conf.NULL

< after >
        self.input_record_delimiter = conf.NULL
        # takizawa:Issue35: Changed the line below from NULL (empty string) to EMPTY_LINE (two newlines).
        # self.output_record_delimiter = conf.NULL
        self.output_record_delimiter = conf.EMPTY_LINE
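
The effect of this one-line change can be checked in isolation. This is a minimal sketch, not the real class: `conf` here is a stand-in that holds only the two constants whose values this report states (NULL as "" and EMPTY_LINE as "\n\n"), and `UDXDef` is reduced to the two attributes shown above.

```python
from types import SimpleNamespace

# Stand-in for ParaLite's conf module (assumption): only the two values
# named in this report are defined.
conf = SimpleNamespace(NULL="", EMPTY_LINE="\n\n")

class UDXDef:
    """Reduced version of the class, with the fixed default."""
    def __init__(self):
        self.input_record_delimiter = conf.NULL
        self.output_record_delimiter = conf.EMPTY_LINE

udx = UDXDef()
print(repr(udx.output_record_delimiter))
```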

 ◆ lib/newparser.py
< before >
# takizawa:Issue26: Modified the line below so that command_line can be single-quoted and the delimiter clauses can appear in any order.
# udx = exe_name("exe_name") + EQUAL + DQUOTE + command_line + DQUOTE +
#     Optional(INPUT + (STDIN | (SQUOTE + file_name + SQUOTE)))("input") +
#     Optional(INPUT_ROW_DELIMITER + ((SQUOTE + delimiter + SQUOTE) | EMPTY_LINE | NEW_LINE)) +
#     Optional(INPUT_COL_DELIMITER + (NULL | SPACE | (SQUOTE + delimiter + SQUOTE))) +
#     Optional(INPUT_RECORD_DELIMITER + (NEW_LINE | EMPTY_LINE | (SQUOTE + delimiter + SQUOTE))) +
#     Optional(OUTPUT + (STDOUT | (SQUOTE + file_name + SQUOTE))) +
#     Optional(OUTPUT_ROW_DELIMITER + (NEW_LINE | EMPTY_LINE | (SQUOTE + delimiter + SQUOTE))) +
#     Optional(OUTPUT_COL_DELIMITER + (NULL | SPACE | (SQUOTE + delimiter + SQUOTE))) +
#     Optional(OUTPUT_RECORD_DELIMITER + (EMPTY_LINE | (SQUOTE + delimiter + SQUOTE)))
udx = (
    exe_name("exe_name") + EQUAL
    + ((DQUOTE + command_line + DQUOTE) | (SQUOTE + command_line + SQUOTE))
    + ZeroOrMore(
        (INPUT + (STDIN | SQUOTE + file_name + SQUOTE))
        | (INPUT_ROW_DELIMITER + (SQUOTE + delimiter + SQUOTE | EMPTY_LINE | NEW_LINE))
        | (INPUT_COL_DELIMITER + (NULL | SPACE | SQUOTE + delimiter + SQUOTE))
        | (INPUT_RECORD_DELIMITER + (NEW_LINE | EMPTY_LINE | SQUOTE + delimiter + SQUOTE))
        | (OUTPUT + (STDOUT | SQUOTE + file_name + SQUOTE))
        | (OUTPUT_ROW_DELIMITER + (NEW_LINE | EMPTY_LINE | SQUOTE + delimiter + SQUOTE))
        | (OUTPUT_COL_DELIMITER + (NULL | SPACE | SQUOTE + delimiter + SQUOTE))
        | (OUTPUT_RECORD_DELIMITER + (EMPTY_LINE | (SQUOTE + delimiter + SQUOTE)))
    )
)

< after >
# takizawa:Issue26&35: Modified the line below so that command_line can be single-quoted and the delimiter clauses can appear in any order.
# udx = exe_name("exe_name") + EQUAL + DQUOTE + command_line + DQUOTE +
#     Optional(INPUT + (STDIN | (SQUOTE + file_name + SQUOTE)))("input") +
#     Optional(INPUT_ROW_DELIMITER + ((SQUOTE + delimiter + SQUOTE) | EMPTY_LINE | NEW_LINE)) +
#     Optional(INPUT_COL_DELIMITER + (NULL | SPACE | (SQUOTE + delimiter + SQUOTE))) +
#     Optional(INPUT_RECORD_DELIMITER + (NEW_LINE | EMPTY_LINE | (SQUOTE + delimiter + SQUOTE))) +
#     Optional(OUTPUT + (STDOUT | (SQUOTE + file_name + SQUOTE))) +
#     Optional(OUTPUT_ROW_DELIMITER + (NEW_LINE | EMPTY_LINE | (SQUOTE + delimiter + SQUOTE))) +
#     Optional(OUTPUT_COL_DELIMITER + (NULL | SPACE | (SQUOTE + delimiter + SQUOTE))) +
#     Optional(OUTPUT_RECORD_DELIMITER + (EMPTY_LINE | (SQUOTE + delimiter + SQUOTE)))
udx = (
    exe_name("exe_name") + EQUAL
    + ((DQUOTE + command_line + DQUOTE) | (SQUOTE + command_line + SQUOTE))
    + ZeroOrMore(
        (INPUT + (STDIN | SQUOTE + file_name + SQUOTE))
        | (INPUT_ROW_DELIMITER + (SQUOTE + delimiter + SQUOTE | EMPTY_LINE | NEW_LINE))
        | (INPUT_COL_DELIMITER + (NULL | SPACE | SQUOTE + delimiter + SQUOTE))
        | (INPUT_RECORD_DELIMITER + (NEW_LINE | EMPTY_LINE | SQUOTE + delimiter + SQUOTE))
        | (OUTPUT + (STDOUT | SQUOTE + file_name + SQUOTE))
        | (OUTPUT_ROW_DELIMITER + (NEW_LINE | EMPTY_LINE | SQUOTE + delimiter + SQUOTE))
        | (OUTPUT_COL_DELIMITER + (NULL | SPACE | SQUOTE + delimiter + SQUOTE))
        | (OUTPUT_RECORD_DELIMITER + (NULL | EMPTY_LINE | SQUOTE + delimiter + SQUOTE))
    )
)
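
The only functional difference between the two versions is the set of keywords accepted after OUTPUT_RECORD_DELIMITER. The sketch below illustrates that difference with plain string checks; it is a stand-in, not the grammar in newparser.py. The function `accepts`, the case-insensitive match on the clause name, and the treatment of any single-quoted text as a valid delimiter are all assumptions made for illustration.

```python
# Keywords accepted after OUTPUT_RECORD_DELIMITER, before and after the change.
BEFORE = {"EMPTY_LINE"}
AFTER = {"NULL", "EMPTY_LINE"}

def accepts(clause, keywords):
    """Hypothetical check: is `clause` a valid output_record_delimiter clause?"""
    parts = clause.split(None, 1)
    if len(parts) != 2 or parts[0].lower() != "output_record_delimiter":
        return False
    value = parts[1].strip()
    # What `delimiter` may contain is not shown in this report; any
    # single-quoted text is accepted here.
    quoted = len(value) >= 2 and value[0] == "'" and value[-1] == "'"
    return quoted or value in keywords

for clause in ("output_record_delimiter EMPTY_LINE",
               "output_record_delimiter '--'",
               "output_record_delimiter NULL"):
    print(clause, accepts(clause, BEFORE), accepts(clause, AFTER))
```

Only the NULL clause changes status: it is rejected under the old alternatives and accepted under the new ones, which is what makes the query in Ex.8 possible.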

Original issue reported on code.google.com by wdb.taki...@gmail.com on 24 Dec 2014 at 2:56