elcamlost / perl-DBD-ClickHouse

Other
13 stars 7 forks source link

Problem when selecting columns from a database #3

Closed msestak closed 8 years ago

msestak commented 8 years ago

I select from a small table (less than 10_000) rows.

my $rows = $ch->select("SELECT ps, ti, species_name, gene_hits_per_species FROM   $param_href->{database}.$report_gene_hits_per_species_tbl");
foreach my $row (@$rows) {
    p $row;
}

some rows are correct (4 columns) but some are broken into two parts (at random) so instead of 4 columns I get 3 + 2 (and so on).

GOOD
[
    [0] 1,
    [1] 1961,
    [2] "Streptomyces_virginiae",
    [3] 1
]

BAD part1
\ [
    [0] 1,
    [1] 1968,
    [2] "Str"
]
BAD part2
\ [
    [0] "eptomyces_cellulosae",
    [1] 1
]
elcamlost commented 8 years ago

Looks like you have unquoted tab symbol in your species_name. I can't reproduce error with this piece of code

use ClickHouse;

my $create_table_st = q{
    CREATE TABLE species_tbl (
        ps Int8,
        ti Int16,
        species_name String,
        gene_hits_per_species UInt32
    ) ENGINE = Memory
};

my $insert_st = q{
    INSERT INTO species_tbl (ps,ti,species_name,gene_hits_per_species) VALUES
};

my @values = (
    [1, 1961, "Streptomyces_virginiae",    1],
    [1, 1961, "Streptomyces_cellulosae",    1],
    [1, 1968, "Str\teptomyces_cellulosae", 1],
);

my $ch = ClickHouse->new(
    'host' => HOST(),
);

is ($ch->ping(), 1, "ping is ok");
$ch->do($create_table_st);
$ch->do($insert_st, @values);

my $rows = $ch->select("SELECT * FROM species_tbl");
use Data::Dumper; warn Dumper $rows;

$ch->do("DROP TABLE species_tbl");

Result is

$VAR1 = [
          [
            '1',
            '1961',
            'Streptomyces_virginiae',
            '1'
          ],
          [
            '1',
            '1961',
            'Streptomyces_cellulosae',
            '1'
          ],
          [
            '1',
            '1968',
            'Str\\teptomyces_cellulosae',
            '1'
          ]
        ];

So i do not see bugs here and close issue. But if You can provide table dump, which I can load to ClickHouse and reproduce bug - do it and i'll investigate further.

msestak commented 8 years ago

I can't find unquoted tabs in my file so I'm sending you the table I'm having problems with. SELECT inside clickhouse-client works fine:

SELECT * FROM hs_1mil_report_per_species WHERE species_name LIKE 'Streptomyces_cellulosae%' ORDER BY species_name;

┌─ps─┬───ti─┬─species_name────────────┬─gene_hits_per_species─┬───────date─┐
│  1 │ 1968 │ Streptomyces_cellulosae │                     1 │ 2016-08-31 │
└────┴──────┴─────────────────────────┴───────────────────────┴────────────┘

1 rows in set

species_table_sql.txt species_table_tsv.txt

P.S. You forgot to release version 0.002 to CPAN :).

msestak commented 8 years ago

Check my pull request to resolve this #4 Test script is here (it doesn't drop table). It is quite large because problem doesn't manifest itself on small selects.

test_clickhouse_select_pl.txt