Kaperstone / mogilefs

Automatically exported from code.google.com/p/mogilefs

Enhancement: delete multiple keys with one call #54

Open · GoogleCodeExporter opened this issue 8 years ago

GoogleCodeExporter commented 8 years ago
This is a lot more efficient than making a separate call to the tracker for each
file you want to delete.
I've attached a patch that does this; it has been working well for me.
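
For comparison, deleting N keys through the standard client API costs one
tracker round-trip per key (a minimal sketch; host and key names are
hypothetical):

    use MogileFS::Client;
    my $mogc = MogileFS::Client->new(domain => 'testdomain',
                                     hosts  => ['127.0.0.1:7001']);
    # one tracker call per key:
    $mogc->delete($_) for @keys;

The patch instead lets the tracker handle the whole batch in one command.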

Original issue reported on code.google.com by ksa...@gmail.com on 16 Jan 2012 at 5:56

GoogleCodeExporter commented 8 years ago
Just a heads-up that the patch you attached is susceptible to an SQL injection
attack here:

    my $in = join("','", @$keys);
    my $sth = $self->dbh->prepare("SELECT fid FROM file WHERE dkey IN ('$in')");
    $sth->execute;

You should never interpolate input directly into SQL queries; use bound
parameters instead, like this:

    my $sth = $self->dbh->prepare("SELECT fid FROM file WHERE dkey IN (" .
        join(',', (('?') x scalar @$keys)) . ")");
    $sth->execute(@$keys);

The expression (('?') x scalar @$keys) generates a list of '?' placeholder
strings with the same number of elements as @$keys.
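
For example, with three keys the generated statement looks like this (a minimal
illustration; the key values are hypothetical):

    my $keys = ['a', 'b', 'c'];
    my $sql  = "SELECT fid FROM file WHERE dkey IN (" .
               join(',', (('?') x scalar @$keys)) . ")";
    # $sql is now: SELECT fid FROM file WHERE dkey IN (?,?,?)
    # $sth->execute(@$keys) then binds 'a', 'b', 'c' safely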

Original comment by daniel.frett on 17 Feb 2012 at 4:05

GoogleCodeExporter commented 8 years ago
Good point. I've attached a newer version of the patch with that suggestion
implemented; it also uses the already existing enqueue_fids_to_delete2 instead
of the enqueue_for_delete2_multiple that the patch originally supplied.

Original comment by ksa...@gmail.com on 17 Feb 2012 at 9:17

GoogleCodeExporter commented 8 years ago
Looks better. At a glance, I think it could use a hardcoded (or configurable,
fetched via server_setting_cached) limit on how many keys you can request for
deletion at once, just so people can't shoot themselves in the foot.

Original comment by dorma...@rydia.net on 23 Feb 2012 at 8:42

GoogleCodeExporter commented 8 years ago
Good call. Here is a new patch against 2.59 that uses server_setting_cached
with a default of 1000 when not set.
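
The lookup in the patch boils down to this (delete_multiple_limit is the new
setting name; the full worker code appears later in this thread):

    my $max_limit = MogileFS::Config->server_setting_cached('delete_multiple_limit') || 1000;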

Original comment by ksa...@gmail.com on 9 Mar 2012 at 12:15


GoogleCodeExporter commented 8 years ago
Moving this to "accepted" - hope to pull it in either this cycle or next.

Original comment by dorma...@rydia.net on 20 Jun 2012 at 12:56

GoogleCodeExporter commented 8 years ago
I found a bug in my code where I'm passing the file ids to
enqueue_fids_to_delete2: an array reference is passed where an array should be.

In Store.pm, delete_keys_multiple:

    $self->enqueue_fids_to_delete2(\@fidids);

should be:

    $self->enqueue_fids_to_delete2(@fidids);
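
A minimal illustration of the difference (hypothetical sub): Perl flattens
argument lists, so the callee sees either one arrayref or N fids:

    sub count_args { return scalar @_ }
    my @fidids = (101, 102, 103);
    count_args(\@fidids);  # 1: a single ARRAY reference
    count_args(@fidids);   # 3: the fids 101, 102, 103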

Original comment by ksa...@gmail.com on 24 Jul 2012 at 6:16

GoogleCodeExporter commented 8 years ago
I found a bug in the original implementation that caused orphaned entries in
the file table: on a deadlock on file_to_delete2, only that insert was retried,
while the delete from file had already been rolled back. I've changed
delete_keys_multiple to retry all of the commands on deadlock instead of just
the insert into file_to_delete2, and I've verified that the code below fixes
the issue when a deadlock occurs.

sub delete_keys_multiple {
    my ($self, $keys, $limit) = @_;
    my $sth = $self->dbh->prepare("SELECT fid,dkey FROM file WHERE dkey IN (" .
        join(',', (('?') x scalar @$keys)) . ") LIMIT ?");
    $sth->execute(@$keys, $limit);
    my (@fidids, @deleted);
    while (my @row = $sth->fetchrow_array()) {
        push @fidids, $row[0];
        push @deleted, $row[1];
    }
    # nothing matched; bail out early so the "IN ()" and empty VALUES below can't break
    return \@deleted unless @fidids;
    my $in = join(",", @fidids);
    $self->dbh->begin_work();
    my $nexttry = $self->unix_timestamp;
    # retry the whole unit on deadlock so a rollback can't orphan rows in file
    $self->retry_on_deadlock(sub {
        $self->dbh->do("DELETE FROM file WHERE fid IN ($in)");
        $self->dbh->do("DELETE FROM tempfile WHERE fid IN ($in)");
        $self->dbh->do($self->ignore_replace . " INTO file_to_delete2 (fid, nexttry) VALUES " .
                       join(",", map { "(" . int($_) . ", $nexttry)" } @fidids));
    });
    $self->condthrow;
    $self->dbh->commit();
    return \@deleted;
}
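
For reference, a caller sketch (hypothetical keys; the limit is whatever the
server setting discussed above allows):

    my $deleted = Mgd::get_store()->delete_keys_multiple(['photo1', 'photo2'], 1000);
    # $deleted is an arrayref of the dkeys that were actually removed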

Original comment by ksa...@gmail.com on 4 Jan 2013 at 7:06

GoogleCodeExporter commented 8 years ago
Hi

Comment from Eric on the ML:

"This seems to ignore dmid when deleting keys, so it can inadvertantly
delete keys in a different domain than the one specified."

Can this be fixed?

Thanks!
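
For illustration (hypothetical key name): dkey is only unique within a domain,
so without a dmid predicate the lookup can match another domain's file:

    # two domains can each store a file under dkey 'photo1'; this matches both:
    my $sth = $self->dbh->prepare("SELECT fid FROM file WHERE dkey IN ('photo1')");
    # restricting by domain, as the fixed function below does, avoids that:
    my $sth2 = $self->dbh->prepare("SELECT fid FROM file WHERE dmid=? AND dkey IN ('photo1')");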

Original comment by dorma...@rydia.net on 3 Feb 2013 at 12:19

GoogleCodeExporter commented 8 years ago
Good point.

Here is a new function for Store.pm

sub delete_keys_multiple {
    my ($self, $dmid, $keys, $limit) = @_;
    my $sth = $self->dbh->prepare("SELECT fid,dkey FROM file WHERE dmid=? AND dkey IN (" .
        join(',', (('?') x scalar @$keys)) . ") LIMIT ?");
    $sth->execute($dmid, @$keys, $limit);
    my (@fidids, @deleted);
    while (my @row = $sth->fetchrow_array()) {
        push @fidids, $row[0];
        push @deleted, $row[1];
    }
    # nothing matched in this domain; avoid "IN ()" and empty VALUES below
    return \@deleted unless @fidids;
    my $in = join(",", @fidids);
    $self->dbh->trace(2, '/tmp/dbi_trace.out'); # debugging: log DBI activity to a trace file
    $self->dbh->begin_work();
    my $nexttry = $self->unix_timestamp;
    # retry the whole unit on deadlock so a rollback can't orphan rows in file
    $self->retry_on_deadlock(sub {
        $self->dbh->do("DELETE FROM file WHERE fid IN ($in)");
        $self->dbh->do("DELETE FROM tempfile WHERE fid IN ($in)");
        $self->dbh->do($self->ignore_replace . " INTO file_to_delete2 (fid, nexttry) VALUES " .
                       join(",", map { "(" . int($_) . ", $nexttry)" } @fidids));
    });
    $self->condthrow;
    $self->dbh->commit();
    $self->dbh->trace(0); # debugging: turn tracing back off
    return \@deleted;
}

Here is a new function for Worker/Query.pm

sub cmd_delete_multiple {
    my MogileFS::Worker::Query $self = shift;
    my $args = shift;

    # validate domain for plugins
    $args->{dmid} = $self->check_domain($args)
        or return $self->err_line('domain_not_found');

    # now invoke the plugin, abort if it tells us to
    my $rv = MogileFS::run_global_hook('cmd_delete_multiple', $args);
    return $self->err_line('plugin_aborted')
        if defined $rv && ! $rv;

    # clamp the requested limit to the configured maximum (default 1000)
    my $limit = $args->{limit};
    my $max_limit = MogileFS::Config->server_setting_cached('delete_multiple_limit') || 1000;
    $limit ||= $max_limit;
    $limit += 0;
    $limit = $max_limit if $limit > $max_limit;

    # collect the key1, key2, ... arguments
    my $dmid = $args->{dmid};
    my @keys;
    for my $arg (keys %$args) {
        push @keys, $args->{$arg} if $arg =~ /^key\d/;
    }
    debug("delete_multiple: $limit: @keys");
    my $deleted = Mgd::get_store()->delete_keys_multiple($dmid, \@keys, $limit);

    # report back the keys that were actually deleted
    my $ret = { key_count => 0 };
    foreach my $del_key (@$deleted) {
        $ret->{key_count}++;
        $ret->{"key$ret->{key_count}"} = $del_key;
    }
    return $self->ok_line($ret);
}
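
For reference, a request/response sketch in the tracker's line protocol (domain
and key names hypothetical; the arguments follow the key1, key2, ... convention
the handler parses above):

    delete_multiple domain=testdomain&key1=photo1&key2=photo2
    OK key_count=2&key1=photo1&key2=photo2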

Original comment by ksa...@gmail.com on 14 May 2013 at 8:28