GELOG / docker-ubuntu-hbase

Dockerfile for running HBase on Ubuntu
Apache License 2.0

mapreduce.DefaultVisibilityExpressionResolver: Error scanning 'labels' table #4

Open davidonlaptop opened 9 years ago

davidonlaptop commented 9 years ago

HBase complains about a missing table when importing data using ImportTsv.

2015-08-22 08:46:35,470 ERROR [LocalJobRunner Map Task Executor #0] mapreduce.DefaultVisibilityExpressionResolver: Error scanning 'labels' table
org.apache.hadoop.hbase.TableNotFoundException: Table 'hbase:labels' was not found, got: abcd2.
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1274)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1155)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
    at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
    at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:811)
    at org.apache.hadoop.hbase.mapreduce.DefaultVisibilityExpressionResolver.init(DefaultVisibilityExpressionResolver.java:90)
    at org.apache.hadoop.hbase.mapreduce.CellCreator.<init>(CellCreator.java:48)
    at org.apache.hadoop.hbase.mapreduce.TsvImporterMapper.setup(TsvImporterMapper.java:107)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Mahesha999 commented 8 years ago

Did you find a workaround? I was getting the same error.

startprogress commented 7 years ago

How to fix this?

davidonlaptop commented 7 years ago

Damn, can't remember right now... !

Apparently, you're not the only one: https://issues.apache.org/jira/browse/HBASE-14365

@Jean-Marc, have you seen this before?


davidonlaptop commented 7 years ago

Do you have the full stack trace?


startprogress commented 7 years ago

ya, as follows:

2016-10-07 11:34:53,296 INFO [main-SendThread(aiszk1.boloomo.com:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server aiszk1.boloomo.com/192.168.30.113:2181. Will not attempt to authenticate using SASL (unknown error)
2016-10-07 11:34:53,497 INFO [main-SendThread(aiszk1.boloomo.com:2181)] org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /192.168.30.121:50440, server: aiszk1.boloomo.com/192.168.30.113:2181
2016-10-07 11:34:53,708 INFO [main-SendThread(aiszk1.boloomo.com:2181)] org.apache.zookeeper.ClientCnxn: Session establishment complete on server aiszk1.boloomo.com/192.168.30.113:2181, sessionid = 0x25789e819482541, negotiated timeout = 60000
2016-10-07 11:35:05,294 ERROR [main] org.apache.hadoop.hbase.mapreduce.DefaultVisibilityExpressionResolver: Error scanning 'labels' table
org.apache.hadoop.hbase.TableNotFoundException: hbase:labels
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1404)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1199)
        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:305)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
        at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
        at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
        at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
        at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:867)
        at org.apache.hadoop.hbase.mapreduce.DefaultVisibilityExpressionResolver.init(DefaultVisibilityExpressionResolver.java:91)
        at org.apache.hadoop.hbase.mapreduce.CellCreator.<init>(CellCreator.java:48)
        at org.apache.hadoop.hbase.mapreduce.TsvImporterMapper.setup(TsvImporterMapper.java:108)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
davidonlaptop commented 7 years ago

Trying again to tag Jean-Marc into discussion: @jmspaggi

davidonlaptop commented 7 years ago

Hi @startprogress, does it work in spite of the error message?

If not, could you post a step-by-step to reproduce your problem?

jean-marc commented 7 years ago

I am afraid you have the wrong Jean-Marc

jm


jmspaggi commented 7 years ago

Worked ;)

Is this stacktrace actually causing any issue? Which HBase version are you using? As David pointed out above via the JIRA link, if cell-level security is not used, this "ERROR" can be treated as an INFO or a WARNING...

Is HBase still working well after that?

JMS
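
For context on why the table is missing at all: hbase:labels only exists once the visibility-labels feature is enabled, which per the HBase security documentation means registering the VisibilityController coprocessor in hbase-site.xml. A sketch, only needed if cell-level visibility is actually wanted; otherwise the ERROR above is just noise:

```xml
<!-- hbase-site.xml: enables visibility labels, which creates hbase:labels -->
<property>
  <name>hbase.security.authorization</name>
  <value>true</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.visibility.VisibilityController</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.security.visibility.VisibilityController</value>
</property>
```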

davidonlaptop commented 7 years ago

I've run a simple test which throws the error, but works anyhow:

Importing the CSV:

cat <<EOF > /tmp/simple.csv
a,b,c
1,2,4
5,6,8
EOF

echo "create 'simpletable', 'cf'" | hbase shell

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:b,cf:c '-Dimporttsv.separator=,' simpletable /tmp/simple.csv

Validating that the data is there:

root@hbase-shell:/# hbase shell
2016-10-08 18:41:34,140 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.1.1, rd0a115a7267f54e01c72c603ec53e91ec418292f, Tue Jun 23 14:44:07 PDT 2015

hbase(main):001:0> scan 'simpletable'
ROW                                                   COLUMN+CELL
 1                                                    column=cf:b, timestamp=1475951843015, value=2
 1                                                    column=cf:c, timestamp=1475951843015, value=4
 5                                                    column=cf:b, timestamp=1475951843015, value=6
 5                                                    column=cf:c, timestamp=1475951843015, value=8
 a                                                    column=cf:b, timestamp=1475951843015, value=b
 a                                                    column=cf:c, timestamp=1475951843015, value=c
3 row(s) in 0.1780 seconds

@startprogress: could you confirm on your side?

startprogress commented 7 years ago

@davidonlaptop I've confirmed. This test did throw the error, but worked anyhow. Here is the output of scan:

hbase(main):001:0> scan 'simpletable'
ROW                                      COLUMN+CELL                                                                                                        
 1                                       column=cf:b, timestamp=1475978024254, value=2                                                                      
 1                                       column=cf:c, timestamp=1475978024254, value=3                                                                      
 4                                       column=cf:b, timestamp=1475978024254, value=5                                                                      
 4                                       column=cf:c, timestamp=1475978024254, value=6                                                                      
 a                                       column=cf:b, timestamp=1475978024254, value=b                                                                      
 a                                       column=cf:c, timestamp=1475978024254, value=c                                                                      
3 row(s) in 0.5620 seconds

But with my own data (about 140 GB), the HBase table is still empty after ImportTsv finishes. So I think some other error might exist that isn't logged, which makes it hard to track down.

jmspaggi commented 7 years ago

Nothing else in the logs? What does your CSV file look like?


startprogress commented 7 years ago

@jmspaggi Grepping for 'Exception', I got three kinds of log entries:

org.apache.hadoop.hbase.TableNotFoundException: hbase:labels
2016-10-07 11:39:30,234 WARN [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Error reading the stream java.io.IOException: No such process
java.net.SocketTimeoutException: Read timed out

My file is a TSV file; each field is separated by a tab.

startprogress commented 7 years ago

@jmspaggi I got a WARN as follows:

2016-10-07 13:06:34,383 WARN [fetcher#8] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to 192.168.30.122:13562 with 6 map outputs
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:170)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:323)
        at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)

Maybe this is the real problem

jmspaggi commented 7 years ago

This is "just" a timeout, so MR should retry. Is your table completely empty? Nothing at all? And no other issues? Can you change the logging to DEBUG mode and retry to see what it says?

JMS
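
To turn on DEBUG as suggested, one option (assuming the stock Hadoop client scripts, which honor the HADOOP_ROOT_LOGGER environment variable) is to raise the client-side logging level before resubmitting the job:

```shell
# Raise Hadoop client-side logging to DEBUG for commands run from this
# shell session (the stock Hadoop launcher scripts read this variable).
export HADOOP_ROOT_LOGGER=DEBUG,console

# ...then re-run the ImportTsv job from the same shell.
```

Task-side (YARN container) log levels are controlled separately via the cluster's log4j configuration, so this mainly helps diagnose the submission and client RPC side.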


davidonlaptop commented 7 years ago

Also, you could try with a subset of your data first, like the first 10 lines.

startprogress commented 7 years ago

@davidonlaptop I've tried. A subset of my data worked fine.

startprogress commented 7 years ago

@jmspaggi At the beginning, I used a table in a namespace, named 'n1:t1'. After I changed the table back to the default namespace, the import worked well and the table has data. I don't know why, but it's a workaround of sorts.

jmspaggi commented 7 years ago

Oh, interesting! So you are saying that ImportTsv into a namespaced table does not seem to work? Do you have the exact command line you used? Have you tried putting the namespace:table within quotes?

JMS


startprogress commented 7 years ago

@jmspaggi Yes, it's really strange. The exact command line:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,f:v \
    -Dimporttsv.bulk.output=$tempdir \
    -Dmapred.min.split.size=$minSplit \
    -Ddfs.umaskmode=000 \
    -Dmapreduce.map.memory.mb=$trueMapMemmb \
    -Dmapreduce.map.java.opts.max.heap=$truemapjava \
    -Dmapreduce.reduce.memory.mb=$reduceMem \
    -Dmapreduce.reduce.java.opts.max.heap=$reduceJava \
    -Dmapreduce.reduce.cpu.vcores=$reduceVcores \
    tableName hdfspath

Those parameters are calculated from the size of the file and the configuration of my cluster.
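
One detail worth flagging in the command above: with -Dimporttsv.bulk.output set, ImportTsv writes HFiles into $tempdir rather than into the table directly, and the table stays empty until those files are bulk-loaded in a second step. A sketch of that second step (paths and table name here are placeholders matching the thread, not verified against this cluster):

```shell
# After ImportTsv with -Dimporttsv.bulk.output, the generated HFiles must
# be loaded into the table explicitly; until then the table looks empty.
TEMPDIR=/tmp/hfiles   # placeholder: use the same $tempdir as the job above
TABLE=tableName       # placeholder: the target table

# Guarded so this is a no-op on machines without the hbase CLI.
if command -v hbase >/dev/null 2>&1; then
  hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles "$TEMPDIR" "$TABLE"
fi
```

This could explain an empty table with no logged error: the MapReduce job succeeds, but the load step never ran.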

davidonlaptop commented 7 years ago

@startprogress: FYI, I just pushed a new Dockerfile with all the latest versions (HBase 1.2.3, OpenJDK 8, Ubuntu 16.04) if you want to give it a try. Maybe it can solve your issue.

jmspaggi commented 7 years ago

@startprogress what is the value of tableName? Any chance to try with "tableName" instead?


startprogress commented 7 years ago

@jmspaggi After changing tableName to 't1' instead of 'n1:t1', the command works well now.

startprogress commented 7 years ago

@davidonlaptop OK, thanks for your help anyway. I will give it a try sometime later.

jmspaggi commented 7 years ago

@startprogress What I'm wondering is: can you give "n1:t1" instead of n1:t1 (note the double quotes)? What I'm suspecting is that the shell interprets the ":". @davidonlaptop does it work for you with another namespace? Make sure it is created before using it.
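
Two quick checks on that theory (plain bash for the quoting part; the hbase commands assume shell access to the cluster and use the example names from this thread):

```shell
# 1) Quoting check: ':' is not special to bash in argument position, so
#    all three spellings below reach the program as the identical argv entry.
printf '%s\n' n1:t1 'n1:t1' "n1:t1"
# prints n1:t1 three times

# 2) The namespace has to exist before the qualified table can be created;
#    guarded so this is a no-op on machines without the hbase CLI.
if command -v hbase >/dev/null 2>&1; then
  hbase shell <<'EOF'
create_namespace 'n1'
create 'n1:t1', 'f'
EOF
fi
```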


startprogress commented 7 years ago

@jmspaggi Oh, I see. I did try with the double quotes; that didn't work either.