BrianGallew / cassandra_range_repair

python script to repair the primary range of a node in N discrete steps
MIT License
109 stars 47 forks source link

Multiprocess = processing same thing x times ? #14

Closed arodrime closed 10 years ago

arodrime commented 10 years ago

Hi,

First of all this project looks interesting. Are you keeping it up to date / still using it ? Maybe can I help somehow ?

I am using a 3 nodes cluster with C*1.2.18, vnodes activated, and RandomPartitioner.

I have this at debug level:

2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete 2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete 2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete 2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete 2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete 2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete 2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete 2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete 2014-10-27 15:42:45,754 - DEBUG - 2/256 step 0028 complete 2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks. 2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks. 2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks. 2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks. 2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks. 2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks. 2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks. 2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks. 2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks. 2014-10-27 15:42:45,756 - DEBUG - 2/256 step 0044 repairing range (001022121847518509771107014664873822543, 001024418238393207180729723314571303210) for keyspace myks. 2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210 2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210 2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210 2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210 2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210 2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210 2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210 2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210 2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210 2014-10-27 15:42:45,757 - DEBUG - run_command: nodetool -h 10.55.83.44 repair myks -pr -st 001022121847518509771107014664873822543 -et 001024418238393207180729723314571303210

Looks like it process some parts up to 10 times (maybe more). I ran with this command:

./cassandra_range_repair/range_repair.py -k myks -H 10.55.xxx.xxx -s 100 -w 16 -D eu-west-xl -v -d --logfile=/var/log/cassandra/result.log

Also, while running a ps aux it seems things are run 2 times.

$ ps aux | grep nodetool root 28906 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003527227502730104786072662883066462783 -et 003527873595605486575538223266831799943 root 28907 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003527227502730104786072662883066462783 -et 003527873595605486575538223266831799943 root 28952 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003527873595605486575538223266831799943 -et 003528519688480868365003783650597137103 root 28953 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003527873595605486575538223266831799943 -et 003528519688480868365003783650597137103 root 29004 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003528519688480868365003783650597137103 -et 003529165781356250154469344034362474263 root 29005 0.0 0.0 4400 740 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003528519688480868365003783650597137103 -et 003529165781356250154469344034362474263 root 29037 0.0 0.0 4400 612 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003529165781356250154469344034362474263 -et 003529811874231631943934904418127811423 root 29038 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003529165781356250154469344034362474263 -et 003529811874231631943934904418127811423 root 29185 0.0 0.0 4400 612 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003529811874231631943934904418127811423 -et 003530457967107013733400464801893148583 root 29186 0.0 0.0 4400 748 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003529811874231631943934904418127811423 -et 003530457967107013733400464801893148583 root 29248 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003530457967107013733400464801893148583 -et 003531104059982395522866025185658485743 root 29249 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003530457967107013733400464801893148583 -et 003531104059982395522866025185658485743 root 29262 0.0 0.0 4400 612 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003531104059982395522866025185658485743 -et 003531750152857777312331585569423822903 root 29263 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003531104059982395522866025185658485743 -et 003531750152857777312331585569423822903 root 29295 0.0 0.0 4400 616 pts/2 S+ 15:47 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003531750152857777312331585569423822903 -et 003532396245733159101797145953189160063 root 29311 0.0 0.0 4400 744 pts/2 S+ 15:47 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003531750152857777312331585569423822903 -et 003532396245733159101797145953189160063 root 29732 0.0 0.0 4400 616 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003532396245733159101797145953189160063 -et 003533042338608540891262706336954497223 root 29733 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003532396245733159101797145953189160063 -et 003533042338608540891262706336954497223 root 29736 0.0 0.0 4400 616 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003533042338608540891262706336954497223 -et 003533688431483922680728266720719834383 root 29737 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003533042338608540891262706336954497223 -et 003533688431483922680728266720719834383 root 29763 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003533688431483922680728266720719834383 -et 003534334524359304470193827104485171543 root 29767 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003533688431483922680728266720719834383 -et 003534334524359304470193827104485171543 root 29840 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003534334524359304470193827104485171543 -et 003534980617234686259659387488250508703 root 29847 0.0 0.0 4400 740 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003534334524359304470193827104485171543 -et 003534980617234686259659387488250508703 root 30263 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003534980617234686259659387488250508703 -et 003535626710110068049124947872015845863 root 30264 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003534980617234686259659387488250508703 -et 003535626710110068049124947872015845863 root 30329 0.0 0.0 4400 616 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003535626710110068049124947872015845863 -et 003536272802985449838590508255781183023 root 30330 0.0 0.0 4400 748 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003535626710110068049124947872015845863 -et 003536272802985449838590508255781183023 root 30378 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003536272802985449838590508255781183023 -et 003536918895860831628056068639546520183 root 30379 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003536272802985449838590508255781183023 -et 003536918895860831628056068639546520183 root 30392 0.0 0.0 4400 612 pts/2 S+ 15:48 0:00 /bin/sh -c nodetool -h 10.55.83.44 repair myks -pr -st 003536918895860831628056068639546520183 -et 003537564988736213417521629023311857343 root 30393 0.0 0.0 4400 744 pts/2 S+ 15:48 0:00 /bin/sh /usr/bin/nodetool -h 10.55.83.44 repair myks -pr -st 003536918895860831628056068639546520183 -et 003537564988736213417521629023311857343 root 30629 0.0 0.0 8108 924 pts/0 S+ 15:48 0:00 grep --color=auto nodetool

AND

$ ps aux | grep nodetool | wc -l 33

--> 16 * 2 + 1 (last line, root 30629 0.0 0.0 8108 924 pts/0 S+ 15:48 0:00 grep --color=auto nodetool)

BrianGallew commented 10 years ago

Clearly it appears to be running the same step too many times. Yes, I still use it, but hadn't noticed this particular issue before. I will have to look into this.

pilate commented 10 years ago

It looks like half of this bug is caused by each multiprocess calling setup_logging. Python is actually registering multiple loggers, and each line is getting printed out by all of them.

With setup_logging called each time, you can watch the number of lines grow with every iteration:

INFO       2014-10-27 14:26:04,353 repair               line: 292 : [1/128] repairing range (-09072044732077677212, -09071731333627223589) in 2 steps for keyspace Files
DEBUG      2014-10-27 14:26:04,355 repair_range         line: 221 : 1/128 step 0001 repairing range (-09072044732077677212, -09071888032852450401) for keyspace Files
DEBUG      2014-10-27 14:26:04,355 repair_range         line: 221 : 1/128 step 0002 repairing range (-09071888032852450401, -09071731333627223590) for keyspace Files
DEBUG      2014-10-27 14:26:04,355 repair_range         line: 221 : 1/128 step 0001 repairing range (-09072044732077677212, -09071888032852450401) for keyspace Files
DEBUG      2014-10-27 14:26:04,356 repair_range         line: 221 : 1/128 step 0003 repairing range (-09071731333627223590, -09071731333627223589) for keyspace Files
DEBUG      2014-10-27 14:26:04,356 repair_range         line: 221 : 1/128 step 0003 repairing range (-09071731333627223590, -09071731333627223589) for keyspace Files
DEBUG      2014-10-27 14:26:04,356 repair_range         line: 221 : 1/128 step 0003 repairing range (-09071731333627223590, -09071731333627223589) for keyspace Files
INFO       2014-10-27 14:26:04,356 repair               line: 292 : [2/128] repairing range (-09070697222880512607, -09069705908938623738) in 2 steps for keyspace Files
DEBUG      2014-10-27 14:26:04,357 repair_range         line: 221 : 2/128 step 0001 repairing range (-09070697222880512607, -09070201565909568173) for keyspace Files
DEBUG      2014-10-27 14:26:04,357 repair_range         line: 221 : 2/128 step 0001 repairing range (-09070697222880512607, -09070201565909568173) for keyspace Files
DEBUG      2014-10-27 14:26:04,357 repair_range         line: 221 : 2/128 step 0001 repairing range (-09070697222880512607, -09070201565909568173) for keyspace Files
DEBUG      2014-10-27 14:26:04,357 repair_range         line: 221 : 2/128 step 0002 repairing range (-09070201565909568173, -09069705908938623739) for keyspace Files
DEBUG      2014-10-27 14:26:04,357 repair_range         line: 221 : 2/128 step 0003 repairing range (-09069705908938623739, -09069705908938623738) for keyspace Files
DEBUG      2014-10-27 14:26:04,357 repair_range         line: 221 : 2/128 step 0003 repairing range (-09069705908938623739, -09069705908938623738) for keyspace Files
DEBUG      2014-10-27 14:26:04,357 repair_range         line: 221 : 2/128 step 0003 repairing range (-09069705908938623739, -09069705908938623738) for keyspace Files
DEBUG      2014-10-27 14:26:04,357 repair_range         line: 221 : 2/128 step 0003 repairing range (-09069705908938623739, -09069705908938623738) for keyspace Files
INFO       2014-10-27 14:26:04,358 repair               line: 292 : [3/128] repairing range (-09048917357316976360, -09046764547381704699) in 2 steps for keyspace Files
DEBUG      2014-10-27 14:26:04,358 repair_range         line: 221 : 3/128 step 0002 repairing range (-09047840952349340530, -09046764547381704700) for keyspace Files
DEBUG      2014-10-27 14:26:04,358 repair_range         line: 221 : 3/128 step 0002 repairing range (-09047840952349340530, -09046764547381704700) for keyspace Files
DEBUG      2014-10-27 14:26:04,358 repair_range         line: 221 : 3/128 step 0002 repairing range (-09047840952349340530, -09046764547381704700) for keyspace Files
DEBUG      2014-10-27 14:26:04,358 repair_range         line: 221 : 3/128 step 0002 repairing range (-09047840952349340530, -09046764547381704700) for keyspace Files
DEBUG      2014-10-27 14:26:04,358 repair_range         line: 221 : 3/128 step 0002 repairing range (-09047840952349340530, -09046764547381704700) for keyspace Files
DEBUG      2014-10-27 14:26:04,359 repair_range         line: 221 : 3/128 step 0003 repairing range (-09046764547381704700, -09046764547381704699) for keyspace Files
DEBUG      2014-10-27 14:26:04,359 repair_range         line: 221 : 3/128 step 0003 repairing range (-09046764547381704700, -09046764547381704699) for keyspace Files
DEBUG      2014-10-27 14:26:04,359 repair_range         line: 221 : 3/128 step 0003 repairing range (-09046764547381704700, -09046764547381704699) for keyspace Files
DEBUG      2014-10-27 14:26:04,359 repair_range         line: 221 : 3/128 step 0003 repairing range (-09046764547381704700, -09046764547381704699) for keyspace Files
DEBUG      2014-10-27 14:26:04,359 repair_range         line: 221 : 3/128 step 0003 repairing range (-09046764547381704700, -09046764547381704699) for keyspace Files
DEBUG      2014-10-27 14:26:04,359 repair_range         line: 221 : 3/128 step 0003 repairing range (-09046764547381704700, -09046764547381704699) for keyspace Files
DEBUG      2014-10-27 14:26:04,360 repair_range         line: 221 : 3/128 step 0001 repairing range (-09048917357316976360, -09047840952349340530) for keyspace Files
DEBUG      2014-10-27 14:26:04,360 repair_range         line: 221 : 3/128 step 0001 repairing range (-09048917357316976360, -09047840952349340530) for keyspace Files
DEBUG      2014-10-27 14:26:04,360 repair_range         line: 221 : 3/128 step 0001 repairing range (-09048917357316976360, -09047840952349340530) for keyspace Files
DEBUG      2014-10-27 14:26:04,360 repair_range         line: 221 : 3/128 step 0001 repairing range (-09048917357316976360, -09047840952349340530) for keyspace Files
DEBUG      2014-10-27 14:26:04,360 repair_range         line: 221 : 3/128 step 0001 repairing range (-09048917357316976360, -09047840952349340530) for keyspace Files

But when you remove setup_logging from repair_range:

INFO       2014-10-27 14:30:12,719 repair               line: 289 : [1/128] repairing range (-09072044732077677212, -09071731333627223589) in 2 steps for keyspace Files
DEBUG      2014-10-27 14:30:12,721 repair_range         line: 219 : 1/128 step 0001 repairing range (-09072044732077677212, -09071888032852450401) for keyspace Files
DEBUG      2014-10-27 14:30:12,722 repair_range         line: 219 : 1/128 step 0003 repairing range (-09071731333627223590, -09071731333627223589) for keyspace Files
INFO       2014-10-27 14:30:12,722 repair               line: 289 : [2/128] repairing range (-09070697222880512607, -09069705908938623738) in 2 steps for keyspace Files
DEBUG      2014-10-27 14:30:12,722 repair_range         line: 219 : 2/128 step 0001 repairing range (-09070697222880512607, -09070201565909568173) for keyspace Files
DEBUG      2014-10-27 14:30:12,723 repair_range         line: 219 : 2/128 step 0002 repairing range (-09070201565909568173, -09069705908938623739) for keyspace Files
DEBUG      2014-10-27 14:30:12,723 repair_range         line: 219 : 2/128 step 0003 repairing range (-09069705908938623739, -09069705908938623738) for keyspace Files
INFO       2014-10-27 14:30:12,723 repair               line: 289 : [3/128] repairing range (-09048917357316976360, -09046764547381704699) in 2 steps for keyspace Files
DEBUG      2014-10-27 14:30:12,723 repair_range         line: 219 : 3/128 step 0001 repairing range (-09048917357316976360, -09047840952349340530) for keyspace Files
DEBUG      2014-10-27 14:30:12,724 repair_range         line: 219 : 3/128 step 0002 repairing range (-09047840952349340530, -09046764547381704700) for keyspace Files
DEBUG      2014-10-27 14:30:12,724 repair_range         line: 219 : 3/128 step 0003 repairing range (-09046764547381704700, -09046764547381704699) for keyspace Files
INFO       2014-10-27 14:30:12,724 repair               line: 289 : [4/128] repairing range (-08884953941901189586, -08884813907480338339) in 2 steps for keyspace Files

It's much more consistent. (although it seems like it's alternating between two and three steps for repair)

This can also be shown by adding a random integer to the beginning of stderr_log_format for each setup_logging call:

5238 INFO       2014-10-27 14:43:07,176 repair               line: 290 : [1/128] repairing range (-09072044732077677212, -09071731333627223589) in 2 steps for keyspace Files
5238 DEBUG      2014-10-27 14:43:07,178 repair_range         line: 220 : 1/128 step 0002 repairing range (-09071888032852450401, -09071731333627223590) for keyspace Files
7977 DEBUG      2014-10-27 14:43:07,178 repair_range         line: 220 : 1/128 step 0001 repairing range (-09072044732077677212, -09071888032852450401) for keyspace Files
7013 DEBUG      2014-10-27 14:43:07,178 repair_range         line: 220 : 1/128 step 0002 repairing range (-09071888032852450401, -09071731333627223590) for keyspace Files
5238 DEBUG      2014-10-27 14:43:07,179 repair_range         line: 220 : 1/128 step 0003 repairing range (-09071731333627223590, -09071731333627223589) for keyspace Files
7977 DEBUG      2014-10-27 14:43:07,179 repair_range         line: 220 : 1/128 step 0003 repairing range (-09071731333627223590, -09071731333627223589) for keyspace Files
7706 DEBUG      2014-10-27 14:43:07,179 repair_range         line: 220 : 1/128 step 0003 repairing range (-09071731333627223590, -09071731333627223589) for keyspace Files
5238 INFO       2014-10-27 14:43:07,179 repair               line: 290 : [2/128] repairing range (-09070697222880512607, -09069705908938623738) in 2 steps for keyspace Files
5238 DEBUG      2014-10-27 14:43:07,180 repair_range         line: 220 : 2/128 step 0002 repairing range (-09070201565909568173, -09069705908938623739) for keyspace Files
7977 DEBUG      2014-10-27 14:43:07,180 repair_range         line: 220 : 2/128 step 0002 repairing range (-09070201565909568173, -09069705908938623739) for keyspace Files
7706 DEBUG      2014-10-27 14:43:07,180 repair_range         line: 220 : 2/128 step 0002 repairing range (-09070201565909568173, -09069705908938623739) for keyspace Files
6062 DEBUG      2014-10-27 14:43:07,180 repair_range         line: 220 : 2/128 step 0002 repairing range (-09070201565909568173, -09069705908938623739) for keyspace Files
5238 DEBUG      2014-10-27 14:43:07,181 repair_range         line: 220 : 2/128 step 0003 repairing range (-09069705908938623739, -09069705908938623738) for keyspace Files
7977 DEBUG      2014-10-27 14:43:07,181 repair_range         line: 220 : 2/128 step 0003 repairing range (-09069705908938623739, -09069705908938623738) for keyspace Files
7706 DEBUG      2014-10-27 14:43:07,181 repair_range         line: 220 : 2/128 step 0003 repairing range (-09069705908938623739, -09069705908938623738) for keyspace Files
6062 DEBUG      2014-10-27 14:43:07,181 repair_range         line: 220 : 2/128 step 0003 repairing range (-09069705908938623739, -09069705908938623738) for keyspace Files
6398 DEBUG      2014-10-27 14:43:07,181 repair_range         line: 220 : 2/128 step 0003 repairing range (-09069705908938623739, -09069705908938623738) for keyspace Files
5238 DEBUG      2014-10-27 14:43:07,181 repair_range         line: 220 : 2/128 step 0001 repairing range (-09070697222880512607, -09070201565909568173) for keyspace Files
7013 DEBUG      2014-10-27 14:43:07,181 repair_range         line: 220 : 2/128 step 0001 repairing range (-09070697222880512607, -09070201565909568173) for keyspace Files
4262 DEBUG      2014-10-27 14:43:07,181 repair_range         line: 220 : 2/128 step 0001 repairing range (-09070697222880512607, -09070201565909568173) for keyspace Files
5238 INFO       2014-10-27 14:43:07,182 repair               line: 290 : [3/128] repairing range (-09048917357316976360, -09046764547381704699) in 2 steps for keyspace Files
5238 DEBUG      2014-10-27 14:43:07,182 repair_range         line: 220 : 3/128 step 0001 repairing range (-09048917357316976360, -09047840952349340530) for keyspace Files
7977 DEBUG      2014-10-27 14:43:07,182 repair_range         line: 220 : 3/128 step 0001 repairing range (-09048917357316976360, -09047840952349340530) for keyspace Files
7706 DEBUG      2014-10-27 14:43:07,182 repair_range         line: 220 : 3/128 step 0001 repairing range (-09048917357316976360, -09047840952349340530) for keyspace Files
6062 DEBUG      2014-10-27 14:43:07,182 repair_range         line: 220 : 3/128 step 0001 repairing range (-09048917357316976360, -09047840952349340530) for keyspace Files
6398 DEBUG      2014-10-27 14:43:07,182 repair_range         line: 220 : 3/128 step 0001 repairing range (-09048917357316976360, -09047840952349340530) for keyspace Files
8873 DEBUG      2014-10-27 14:43:07,182 repair_range         line: 220 : 3/128 step 0001 repairing range (-09048917357316976360, -09047840952349340530) for keyspace Files
5238 DEBUG      2014-10-27 14:43:07,183 repair_range         line: 220 : 3/128 step 0002 repairing range (-09047840952349340530, -09046764547381704700) for keyspace Files
5238 DEBUG      2014-10-27 14:43:07,183 repair_range         line: 220 : 3/128 step 0003 repairing range (-09046764547381704700, -09046764547381704699) for keyspace Files
7013 DEBUG      2014-10-27 14:43:07,183 repair_range         line: 220 : 3/128 step 0003 repairing range (-09046764547381704700, -09046764547381704699) for keyspace Files
4262 DEBUG      2014-10-27 14:43:07,183 repair_range         line: 220 : 3/128 step 0003 repairing range (-09046764547381704700, -09046764547381704699) for keyspace Files
9841 DEBUG      2014-10-27 14:43:07,183 repair_range         line: 220 : 3/128 step 0003 repairing range (-09046764547381704700, -09046764547381704699) for keyspace Files
6062 DEBUG      2014-10-27 14:43:07,183 repair_range         line: 220 : 3/128 step 0002 repairing range (-09047840952349340530, -09046764547381704700) for keyspace Files
6398 DEBUG      2014-10-27 14:43:07,183 repair_range         line: 220 : 3/128 step 0002 repairing range (-09047840952349340530, -09046764547381704700) for keyspace Files
8873 DEBUG      2014-10-27 14:43:07,183 repair_range         line: 220 : 3/128 step 0002 repairing range (-09047840952349340530, -09046764547381704700) for keyspace Files
1661 DEBUG      2014-10-27 14:43:07,183 repair_range         line: 220 : 3/128 step 0002 repairing range (-09047840952349340530, -09046764547381704700) for keyspace Files
arodrime commented 10 years ago

@BrianGallew I have a few more questions, not sure where I should post them, so here they are:

Is this peace of software compatible with any 1.2, 2.0 and 2.1 C* version ? Does C* 2.1 still need those range repair, given the fact that they are now incremental ? Also, trying it with 1 worker I need 1 day to repair 130 GB of data, this looks very poor to me (test machines are m1.xlarge from AWS). Is it running faster on your side ? Also, do you run this on one admin server looping on all the instances or do you run it locally on each machine ? How do you use this ?

Did you had time to check what is going on, @pilate are you still working on this ?

BrianGallew commented 10 years ago

15 fixes the duplicate log messages (thanks @pilate !). I'm going to test to see why things are duplicated and how to resolve that.

This script is known to work with 2.0 (though it doesn't directly support the -par flag). I have no idea if it works with 2.1.

The way I run it is via https://github.com/llnw/cassandra_tools/blob/master/cassandra_repair_scheduler.py which gets put into a cron entry on every host. There is no requirement to run it that way as you can certainly run it by hand as necessary.

pilate commented 10 years ago

I don't think duplicate work is actually happening, it's just a result of how python launches subprocesses with shell=True.

Example of what happens when launching a command with shell=True:

>>> proc = subprocess.Popen("sleep 10", shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> stdout, stderr = proc.communicate()
$ ps aux | grep sleep
user   1892  0.0  0.0   4404   612 pts/30   S+   08:07   0:00 /bin/sh -c sleep 10
user   1893  0.0  0.0   7164   608 pts/30   S+   08:07   0:00 sleep 10

This matches @arodrime's output with one of the duplicate processes being prefixed by '/bin/sh -c'

arodrime commented 10 years ago

Oh, so this work as expected so... I get trapped by a badly logged info + python behaviour I guess. I let you close the case Brian if you agree to say everything is fine.

BrianGallew commented 10 years ago

Yes, I'll close this as @pilate 's patch solves the logging issue. The "good" news is that this led me to discover a bug in sub-range generation. I just updated the script to account for that, as well as adding a new unit test and fixing the one that the change "broke".