ParallelAI / SpyGlass

Cascading and Scalding wrapper for HBase with advanced read features
Apache License 2.0
NullPointerException using CDH5 Branch #18

Open galarragas opened 10 years ago

galarragas commented 10 years ago

Opening from the follow up on issue #17

    Error: java.lang.NullPointerException
        at parallelai.spyglass.hbase.HBaseRecordReaderBase.setHTable(HBaseRecordReaderBase.java:64)
        at parallelai.spyglass.hbase.HBaseInputFormatGranular.getRecordReader(HBaseInputFormatGranular.java:373)
        at cascading.tap.hadoop.io.MultiInputFormat$1.operate(MultiInputFormat.java:253)
        at cascading.tap.hadoop.io.MultiInputFormat$1.operate(MultiInputFormat.java:248)
        at cascading.util.Util.retry(Util.java:762)
        at cascading.tap.hadoop.io.MultiInputFormat.getRecordReader(MultiInputFormat.java:247)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:172)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:414)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1469)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

galarragas commented 10 years ago

Can you share the code causing the exception? I will have a look at it...

Thanks

du291 commented 10 years ago

@galarragas I can't share the code in full, but this is the core of what it does with HBase. Also note that we don't run against genuine HBase, but against a MapR M7 table. (It worked before the upgrade, with spyglass 2.10_0.10_4.3 and hbase 0.94.17-mapr-1403-m7-3.1.0.)

    HBaseSource(
      tableName = args("input"),
      sourceMode = SourceMode.SCAN_RANGE,
      startKey = 'foo\0',
      stopKey = 'foo\377',
      keyFields = 'key,
      familyNames = List("a"),
      valueFields = List('flow)
    )
      .read
      .fromBytesWritable('key, 'flow)
      .toTypedPipeString
      ... etc

du291 commented 10 years ago

Here's a minimized case

    $ rpm -qi mapr-hbase
    Name        : mapr-hbase          Relocations: /
    Version     : 0.98.4.27323.GA     Vendor: MapR Technologies, Inc., support@maprtech.com
    $ rpm -qi mapr-core
    Name        : mapr-core           Relocations: /
    Version     : 4.0.1.27334.GA

    $ hbase shell
    hbase(main):003:0> create '/user/mukl/spyglasscrash', {NAME => 'a' }
    0 row(s) in 0.2210 seconds

    => Hbase::Table - /user/mukl/spyglasscrash
    $ ls -l mapr/spyglasscrash
    lr--------. 1 mukl mapr 2 Oct 6 12:29 mapr/spyglasscrash -> mapr::table::2059.7313.15495392

// note table is empty

Then run this Scalding job:

    class SpyglassCrash(args: Args) extends Job(args) {
      HBaseSource(
        tableName = "/user/mukl/spyglasscrash",
        sourceMode = SourceMode.SCAN_ALL,
        keyFields = 'key,
        familyNames = List("a"),
        valueFields = List()
      )
        .read
        .write(NullSource)
    }

    ...
    Caused by: cascading.flow.FlowException: step failed: (1/1) nullTap, with job id: job_1412344092031_0053, please see cluster logs for failure messages

    Error: java.lang.NullPointerException
        at parallelai.spyglass.hbase.HBaseRecordReaderBase.setHTable(HBaseRecordReaderBase.java:64)
        at parallelai.spyglass.hbase.HBaseInputFormatGranular.getRecordReader(HBaseInputFormatGranular.java:373)
        at cascading.tap.hadoop.io.MultiInputFormat$1.operate(MultiInputFormat.java:253)
        at cascading.tap.hadoop.io.MultiInputFormat$1.operate(MultiInputFormat.java:248)
        at cascading.util.Util.retry(Util.java:762)
        at cascading.tap.hadoop.io.MultiInputFormat.getRecordReader(MultiInputFormat.java:247)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:172)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:414)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1469)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Here are the POM dependencies:

    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>

    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-compiler</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>com.twitter</groupId>
        <artifactId>scalding-core_2.10</artifactId>
        <version>0.11.2</version>
    </dependency>
    <dependency>
        <groupId>parallelai</groupId>
        <artifactId>parallelai.spyglass</artifactId>
        <version>2.10_0.10_CDH5_4.4</version>
        <exclusions>
            <exclusion>
                <groupId>org.apache.hbase</groupId>
                <artifactId>hbase</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>2.4.1-mapr-1408</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>0.98.4-mapr-1408-m7-4.0.1</version>
    </dependency>

(scala.version is 2.10.4)

It just seems it does not like the table at all.
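
For anyone reading the trace: it only says that something dereferenced inside setHTable was null, not which reference or why. As a rough illustration of that failure mode (not SpyGlass source), here is a minimal, self-contained Java sketch. Every name in it (TableHandle, RecordReaderSketch, setTableUnguarded, setTableGuarded) is hypothetical and stands in for code that is not shown in this thread. It contrasts a setter that fails with a bare NullPointerException against one that uses the standard java.util.Objects.requireNonNull to say what was missing.

```java
import java.util.Objects;

/** Hypothetical stand-in for a table handle; this is NOT the real HBase HTable API. */
interface TableHandle {
    String name();
}

/** Hypothetical reader, loosely modelled on the setHTable frame in the trace above. */
final class RecordReaderSketch {
    private TableHandle table;
    private String tableName;

    /** Unguarded: dereferencing a null handle throws a bare NullPointerException. */
    void setTableUnguarded(TableHandle handle) {
        this.table = handle;
        this.tableName = handle.name(); // NPE is thrown here when handle is null
    }

    /** Guarded: still fails on null, but the exception message names the missing piece. */
    void setTableGuarded(TableHandle handle) {
        this.table = Objects.requireNonNull(handle,
            "table handle was null when the record reader was configured");
        this.tableName = handle.name();
    }

    String tableName() {
        return tableName;
    }

    /** Runs an action; returns "ok", or the NullPointerException message if one is thrown. */
    static String messageOf(Runnable action) {
        try {
            action.run();
            return "ok";
        } catch (NullPointerException e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        RecordReaderSketch reader = new RecordReaderSketch();
        // The unguarded message depends on the JVM version, so it is only printed, not asserted.
        System.out.println("unguarded: " + messageOf(() -> reader.setTableUnguarded(null)));
        System.out.println("guarded:   " + messageOf(() -> reader.setTableGuarded(null)));
        System.out.println("valid:     " + messageOf(() -> reader.setTableGuarded(() -> "/user/mukl/spyglasscrash")));
    }
}
```

A guard like this would not fix the underlying problem, but it would turn the bare NPE into a message that points at the missing object, which might help narrow down whether the table lookup or the split setup is at fault.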