Closed: glycerine closed this issue 8 years ago
To try and reproduce this, I started over from scratch, renaming away the cockroach-data directory. Still only running one database node. This time after I kill -9 the server and bring it back up again, the client connects to the server, but the server has forgotten the database "bank" that I had created.
Severe data loss detected; the entire database is gone this time.
root@cr1:26257> set database = bank; select rowid,* from z;
+--------------------+----+--------+-----+
| rowid | a | b | d |
+--------------------+----+--------+-----+
| 153258748090122241 | 34 | yeehaw | 123 |
| 153258822017286145 | 34 | yeehaw | 123 |
| 153258873012715521 | 34 | yeehaw | 123 |
| 153258901593817089 | 34 | yeehaw | 123 |
| 153258901597716481 | 34 | yeehaw | 123 |
+--------------------+----+--------+-----+
root@cr1:26257> ^D
<here I kill -9 and restart the server>
[jaten@cr1 ~]$ cockroach sql --insecure --host=cr1
# Welcome to the cockroach SQL interface.
# All statements must be terminated by a semicolon.
# To exit: CTRL + D.
root@cr1:26257> set database = bank; select rowid,* from z;
pq: database "bank" does not exist
root@cr1:26257> set database = bank;
pq: database "bank" does not exist
root@cr1:26257>
full transcript:
[jaten@cr1 ~]$ mv cockroach-data cockroach-data.hangs
[jaten@cr1 ~]$ cockroach start --insecure --host=cr1 --background
build: beta-20160616 @ 2016/06/16 16:51:45 (go1.6.2)
admin: http://cr1:8080
sql: postgresql://root@cr1:26257?sslmode=disable
logs: cockroach-data/logs
store[0]: path=cockroach-data
[jaten@cr1 ~]$ cockroach sql --insecure --host=cr1
# Welcome to the cockroach SQL interface.
# All statements must be terminated by a semicolon.
# To exit: CTRL + D.
root@cr1:26257> show tables;
pq: no database specified
root@cr1:26257> create database bank;
CREATE DATABASE
root@cr1:26257> set database=bank;
SET
root@cr1:26257> show tables;
+-------+
| Table |
+-------+
+-------+
root@cr1:26257> create table z (rowid int, a int, b string, d bytes);
pq: duplicate column name: "rowid"
root@cr1:26257> create table z (a int, b string, d bytes);
CREATE TABLE
root@cr1:26257> z
-> ;
pq: syntax error at or near "z"
z
^
root@cr1:26257> show columns from z;
+-------+--------+-------+----------------+
| Field | Type | Null | Default |
+-------+--------+-------+----------------+
| a | INT | true | NULL |
| b | STRING | true | NULL |
| d | BYTES | true | NULL |
| rowid | INT | false | unique_rowid() |
+-------+--------+-------+----------------+
root@cr1:26257> insert into z (a,b,d) values (34, 'yeehaw', b'123');
INSERT 1
root@cr1:26257> select * from z;
+----+--------+-----+
| a | b | d |
+----+--------+-----+
| 34 | yeehaw | 123 |
+----+--------+-----+
root@cr1:26257> select rowid,* from z;
+--------------------+----+--------+-----+
| rowid | a | b | d |
+--------------------+----+--------+-----+
| 153258748090122241 | 34 | yeehaw | 123 |
+--------------------+----+--------+-----+
root@cr1:26257> insert into z (a,b,d) values (34, 'yeehaw', b'123');
INSERT 1
root@cr1:26257> select rowid,* from z;
+--------------------+----+--------+-----+
| rowid | a | b | d |
+--------------------+----+--------+-----+
| 153258748090122241 | 34 | yeehaw | 123 |
| 153258822017286145 | 34 | yeehaw | 123 |
+--------------------+----+--------+-----+
root@cr1:26257> insert into z (a,b,d) values (34, 'yeehaw', b'123');
INSERT 1
root@cr1:26257> select rowid,* from z;
+--------------------+----+--------+-----+
| rowid | a | b | d |
+--------------------+----+--------+-----+
| 153258748090122241 | 34 | yeehaw | 123 |
| 153258822017286145 | 34 | yeehaw | 123 |
| 153258873012715521 | 34 | yeehaw | 123 |
+--------------------+----+--------+-----+
root@cr1:26257> insert into z (a,b,d) values (34, 'yeehaw', b'123');insert into z (a,b,d) values (34, 'yeehaw', b'123');
INSERT 1
root@cr1:26257> select rowid,* from z;
+--------------------+----+--------+-----+
| rowid | a | b | d |
+--------------------+----+--------+-----+
| 153258748090122241 | 34 | yeehaw | 123 |
| 153258822017286145 | 34 | yeehaw | 123 |
| 153258873012715521 | 34 | yeehaw | 123 |
| 153258901593817089 | 34 | yeehaw | 123 |
| 153258901597716481 | 34 | yeehaw | 123 |
+--------------------+----+--------+-----+
root@cr1:26257> ^D
[jaten@cr1 ~]$ cockroach sql --insecure --host=cr1
# Welcome to the cockroach SQL interface.
# All statements must be terminated by a semicolon.
# To exit: CTRL + D.
root@cr1:26257> set database = bank; select rowid,* from z;
+--------------------+----+--------+-----+
| rowid | a | b | d |
+--------------------+----+--------+-----+
| 153258748090122241 | 34 | yeehaw | 123 |
| 153258822017286145 | 34 | yeehaw | 123 |
| 153258873012715521 | 34 | yeehaw | 123 |
| 153258901593817089 | 34 | yeehaw | 123 |
| 153258901597716481 | 34 | yeehaw | 123 |
+--------------------+----+--------+-----+
root@cr1:26257> ^D
[jaten@cr1 ~]$ cockroach sql --insecure --host=cr1
# Welcome to the cockroach SQL interface.
# All statements must be terminated by a semicolon.
# To exit: CTRL + D.
root@cr1:26257> set database = bank; select rowid,* from z;
pq: database "bank" does not exist
root@cr1:26257> set database = bank;
pq: database "bank" does not exist
root@cr1:26257> ^D
Error: pq: database "bank" does not exist
Failed running "sql"
[jaten@cr1 ~]$ ls
cockroach-data cockroach-data.hangs cockroach-data.tar.gz cr.hang.txt go pkg
[jaten@cr1 ~]$
Attaching the cockroach-data directory from the second round discussed in the comment above: cockroach-data.database-gone.tar.gz
I suspected DigitalOcean's hardware, so I checked it. Running Brad Fitzpatrick's diskchecker ( http://brad.livejournal.com/2116715.html ) seems to indicate that fsync is working as advertised, so this does indeed appear to be a real CockroachDB bug.
[jaten@cr2 ~]$ ./diskchecker.pl -s cr1 verify test_file
verifying: 0.00%
verifying: 41.25%
verifying: 92.57%
verifying: 100.00%
Total errors: 0
[jaten@cr2 ~]$
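For readers unfamiliar with what "fsync is working as advertised" means, here is a minimal sketch of a durable write on a POSIX system. It is not diskchecker's code and it is not how CockroachDB persists data; the function name durable_write and the file name test_file are only illustrative. The point is that data counts as durable only after fsync returns, and a disk or hypervisor that acknowledges fsync without actually persisting can lose such writes on a hard power-off.

```python
import os
import tempfile


def durable_write(path: str, payload: bytes) -> None:
    """Write payload to path and ask the OS to push it to stable storage.

    Illustrative helper (hypothetical name), assuming a POSIX filesystem.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Block until the kernel reports the file data has reached the device.
        os.fsync(fd)
    finally:
        os.close(fd)

    if os.name == "posix":
        # Also fsync the parent directory so the new directory entry survives.
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "test_file")
    durable_write(target, b"acknowledged only after fsync returned\n")
    print("wrote", target)
```

A tool like diskchecker goes one step further than this sketch: it needs a second machine to record which writes were acknowledged, because the machine under test cannot be trusted to remember them after a crash.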
It seems that you are reporting two issues here.
1) cockroach sql --host=cr1 just sits there.
If cr1 is not an alias for localhost, you need to specify --insecure on the cockroach sql command as well, or the URL listed in the stdout output of the start command (e.g. cockroach sql --url='postgresql://root@cr1:26257?sslmode=disable', which is the equivalent of running cockroach sql --host=cr1 --insecure).
It looks like this is what you did in your comment to the initial bug. Please confirm if the behavior without --insecure is what you originally reported.
2) the bank database disappears.
I presume your kill -9 was between the two rounds of select (the first one showing the correct rows, the second not even knowing the database).
In the tarball you included, I'm only seeing one run (the INFO log has lines at the beginning of the process such as cli/start.go:228 CockroachDB beta-20160616 ... , there's only one of them). Could you make sure you run your cockroach start && create tables/data && kill -9 && cockroach start && select on the same data directory?
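The sequence requested above can be written down as a small dry-run checklist. This is only a sketch: it prints the commands rather than running them, the helper names repro_steps and insecure_url are hypothetical, and the server PID has to be looked up by hand. The flags and the URL form are the ones that appear in this thread; the key point is that --insecure and --host are passed to both cockroach start and cockroach sql, and that every step is run from the same working directory so the same cockroach-data store is reused.

```python
import shlex
from typing import List, Tuple


def insecure_url(host: str, port: int = 26257, user: str = "root") -> str:
    """Build the connection URL that `cockroach start --insecure` prints."""
    return f"postgresql://{user}@{host}:{port}?sslmode=disable"


def repro_steps(host: str, server_pid: int) -> List[Tuple[str, List[str]]]:
    """Return (description, argv) pairs for the kill -9 durability check.

    server_pid is the PID of the first server process, found manually.
    """
    start = ["cockroach", "start", "--insecure", f"--host={host}", "--background"]
    sql = ["cockroach", "sql", "--insecure", f"--host={host}"]
    return [
        ("start the server", start),
        ("create the database, table and rows", sql),
        ("hard-kill the server", ["kill", "-9", str(server_pid)]),
        ("restart on the same data directory", start),
        ("select the rows again", sql),
    ]


if __name__ == "__main__":
    for description, argv in repro_steps("cr1", server_pid=12345):
        print(f"# {description}")
        print(shlex.join(argv))
    print("# equivalent client invocation via URL")
    print(shlex.join(["cockroach", "sql", f"--url={insecure_url('cr1')}"]))
```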
Hi @mberhault, thanks for clarifying that --insecure is needed symmetrically; that does indeed let me connect, fixing the first issue. On the second: with --insecure and --host=cr1 on both the server and the client, multiple rounds of kill -9 followed by a restart of the database no longer make anything disappear. It looks clean. So I think we can chalk both of these up to user error on my part while learning my way around.
On linux/amd64 CentOS 7.2 [DigitalOcean, 512MB droplet] with the cockroach version fetched with wget https://binaries.cockroachdb.com/cockroach-beta-20160616.linux-amd64.tgz
When log files are not available, supply the output of cockroach version and all flags/environment variables passed to cockroach start instead.
This is how cockroach was started:
a) I started the server: $ cockroach start --insecure --host=cr1
b) I killed the cockroach server with "kill -9 pid". Then I brought it back up again with $ cockroach start --insecure --host=cr1 --background
c) then I tried to connect to it with $ cockroach sql which didn't work at all, but then I tried $ cockroach sql --host=cr1 which seemed to work but then hung forever when I issued "show databases;"
What I expected: no hang on trying to do "show databases".
What I observed: hang forever, waiting for some response from the server.
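When reporting a hang like this, it can help to run the client under a timeout so the hang is recorded as a result instead of blocking the terminal. The following is a minimal sketch using only the Python standard library; run_with_timeout is a hypothetical helper, and feeding the statement on standard input assumes the client reads statements from stdin when it is not attached to a terminal, which should be checked against your build.

```python
import subprocess
import sys
from typing import List


def run_with_timeout(argv: List[str], seconds: float, stdin_text: str = "") -> str:
    """Run argv and classify the outcome as 'ok', 'error', or 'hang'.

    stdin_text is sent to the child and then stdin is closed.
    """
    try:
        done = subprocess.run(
            argv,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        return "hang"
    return "ok" if done.returncode == 0 else "error"


if __name__ == "__main__":
    # Stand-in for a hung client: a child that sleeps longer than the timeout.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
    # Assumed usage against the server from this report (not executed here):
    # run_with_timeout(["cockroach", "sql", "--insecure", "--host=cr1"],
    #                  10, stdin_text="show databases;\n")
```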
Attaching log files. Stack traces of server and client I'll paste below. cockroach-data.tar.gz
stack traces: