Closed — zqhxuyuan closed this issue 8 years ago
After using the yarn logs to see what happened inside, it turned out to be a classpath problem between Cassandra and Guava. Once that was fixed, I found that our CQL has a double-quoted field, "timestamp":
CREATE TABLE velocity (attribute text,partner_code text,app_name text,type text,"timestamp" bigint,event text,sequence_id text,PRIMARY KEY ((attribute), partner_code, app_name, type, "timestamp")) WITH compression={'sstable_compression': 'LZ4Compressor'}
I tried using backslashes, \"timestamp\", but an exception happened:
15/11/08 19:13:50 INFO mapreduce.Job: Task Id : attempt_1446657831952_0037_m_000005_0, Status : FAILED
Error: java.lang.NullPointerException
at com.fullcontact.sstable.example.JsonColumnParser.getColumnValueConvertor(JsonColumnParser.java:55)
at com.fullcontact.sstable.example.JsonColumnParser.serializeColumns(JsonColumnParser.java:87)
at com.fullcontact.sstable.example.JsonColumnParser.getJson(JsonColumnParser.java:37)
at com.fullcontact.sstable.example.SimpleExampleMapper.map(SimpleExampleMapper.java:57)
at com.fullcontact.sstable.example.SimpleExampleMapper.map(SimpleExampleMapper.java:21)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
I found the null object is the CFDefinition, so I added a rebuild() call before getCfDef():
public JsonColumnParser(final CFMetaData cfMetaData) {
    cfMetaData.rebuild();  // rebuild metadata so getCfDef() is not null
    this.cfd = cfMetaData.getCfDef();
    System.out.println("CFDefinition:" + cfd);
    this.columnNameConverter = cfMetaData.comparator;
}
Now the CFDefinition is not null:
2015-11-09 10:43:56,968 INFO [main] com.fullcontact.sstable.example.JsonColumnParser: CFDefinition:attribute, partner_code, app_name, type, timestamp => {event, sequence_id}
and reading is OK:
2015-11-09 10:43:56,998 INFO [main] com.fullcontact.sstable.example.JsonColumnParser:
...
2015-11-09 10:43:56,998 INFO [main] com.fullcontact.sstable.example.JsonColumnParser: columnName:event,colId:event,cfd:attribute, partner_code, app_name, type, timestamp => {event, sequence_id}
2015-11-09 10:43:56,999 INFO [main] com.fullcontact.sstable.example.JsonColumnParser: columnName:sequence_id,colId:sequence_id,cfd:attribute, partner_code, app_name, type, timestamp => {event, sequence_id}
but some rows have a problem:
2015-11-09 10:43:57,880 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.cassandra.serializers.MarshalException: String didn't validate.
at org.apache.cassandra.serializers.UTF8Serializer.validate(UTF8Serializer.java:35)
at org.apache.cassandra.db.marshal.AbstractType.getString(AbstractType.java:154)
at com.fullcontact.sstable.example.JsonColumnParser.serializeColumns(JsonColumnParser.java:93)
@zqhxuyuan Which branch of hadoop-sstable are you working from?
@zqhxuyuan In the CREATE TABLE statement in your hadoop.sstable.cql, I believe you want just timestamp, with no quotes at all.
I'm using the cassandra-2.0.x branch. After making a few small changes, I solved the problem:
1. Double-quoted keyword: timestamp
Since timestamp is a keyword in C*, creating the table without double quotes fails. So in hadoop.sstable.cql I used another special character, $, as a placeholder ($timestamp$), and in the mapper setup restored the quotes: cql = cql.replace("$", "\"");
This works.
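The placeholder round trip above can be sketched as follows (class and method names are mine for illustration, not from hadoop-sstable):

```java
// Hypothetical sketch of the $-placeholder workaround: the CQL stored in
// hadoop.sstable.cql wraps reserved-word columns in $ instead of double
// quotes, and the mapper setup swaps them back before parsing.
public class CqlQuoteFixup {

    // Restore double quotes around reserved-word columns stored with $
    // placeholders, e.g. $timestamp$ -> "timestamp".
    public static String restoreQuotes(String cql) {
        return cql.replace("$", "\"");
    }

    public static void main(String[] args) {
        String stored = "CREATE TABLE velocity (attribute text, $timestamp$ bigint, "
                + "PRIMARY KEY ((attribute), $timestamp$))";
        System.out.println(restoreQuotes(stored));
    }
}
```

Note this assumes $ never appears elsewhere in the statement; any other character unused in the CQL would work the same way.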
2. UTF8 validation
In JsonColumnParser.serializeColumns, getting the column value through C*'s type conversion causes the validation exception, so I just changed the ByteBuffer to a String directly, which works:
ByteBuffer buffer = column.value();
String content = byteBufferToString(buffer);
String json = JSONObject.escape(content);
LOG.info("JSON: {}", json);
sb.append(json);

public static String byteBufferToString(ByteBuffer buffer) {
    try {
        // Decode as UTF-8, ignoring malformed input instead of failing validation.
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        decoder.onMalformedInput(CodingErrorAction.IGNORE);
        CharBuffer charBuffer = decoder.decode(buffer);
        buffer.flip();  // rewind the buffer so it can be read again
        return charBuffer.toString();
    } catch (Exception ex) {
        ex.printStackTrace();
        return null;
    }
}
Of course, this assumes the charset in our system is UTF-8.
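A self-contained variation of the lenient decode above (my sketch, not the original code): decoding buffer.duplicate() leaves the caller's buffer position untouched, so no flip()/rewind() bookkeeping is needed afterwards.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SafeUtf8 {

    // Lenient UTF-8 decode: malformed byte sequences are dropped instead of
    // raising a MarshalException-style validation error. Decoding a
    // duplicate() of the buffer keeps the original buffer readable.
    public static String byteBufferToString(ByteBuffer buffer) {
        try {
            CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
            decoder.onMalformedInput(CodingErrorAction.IGNORE);
            decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
            return decoder.decode(buffer.duplicate()).toString();
        } catch (Exception ex) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] bad = {'h', 'i', (byte) 0xC3, 0x28};  // 0xC3 0x28 is invalid UTF-8
        System.out.println(byteBufferToString(ByteBuffer.wrap(bad)));
    }
}
```

Unlike Charset.decode (which replaces bad bytes with U+FFFD), a CharsetDecoder configured with CodingErrorAction.IGNORE silently drops them.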
3. CFDefinition NPE
As I posted in a previous comment, I added cfMetaData.rebuild() in the JsonColumnParser constructor, before cfMetaData.getCfDef().
Hi, I use hadoop 2.4.1 and Cassandra 2.0.15. Running SSTableIndexIndexer is OK, but running SimpleExample has a problem.
My hadoop env is fine, because running examples like wordcount works. The log shows map 0% and reduce 0%, meaning the mapper is not being called at all. But why? I really don't know.
When I run the command without -D hadoop.sstable.cql, no "Failed CQL create statement empty" exception happens, though it should, since SimpleExampleMapper catches this exception. And debugging the code shows that reading the input via SSTableRowInputFormat is normal.
PS: for the classpath problem of running together with Cassandra, I export the classpath and then run hadoop jar.