Open cburch opened 8 years ago
The passthrough module does eventually fail too, but I'm not sure whether that is a side effect of the job falling behind. I have a cron job dropping new data in every 3 minutes, and passthrough just emits everything, so the index gets 500k records larger every 3 minutes until the job gets behind.
It took approximately 10 runs before I saw it output fewer records than it should. For any other type of job (specifically one that replaces previous data) it takes about 3 runs on average.
Need to investigate why, when running a dedupe job, the input set is cut off early and only the first x files are processed.
Cluster: CE cluster. We did some minor testing on nst and it was not occurring there.
Source: XXTestStaging. Just an analytics job that has a dedupe step -> JS that just spits out the first entry every time; no special processing, nothing time consuming.
Input: I cp a ~500k record json file into the source's ready folder (via /temp). Nothing special, about 6 fields/values. There is a script on host 21, /tmp/test1.sh, that will do this for you, and a cron job at /etc/cron.d/test that will do it every 3 min for extended testing.
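For anyone reproducing this without access to host 21, here is a minimal sketch of the input step described above, plus a count of how many records the job should have seen. It is not the contents of /tmp/test1.sh. The function names, the directory arguments, and the one-JSON-object-per-line layout are all assumptions made for illustration.

```python
import json
import os
import shutil
from pathlib import Path


def write_sample(path: Path, n_records: int, n_fields: int = 6) -> None:
    """Write n_records JSON objects, one per line (assumed layout)."""
    with path.open("w", encoding="utf-8") as fh:
        for i in range(n_records):
            record = {f"field{j}": f"value{i}_{j}" for j in range(n_fields)}
            fh.write(json.dumps(record) + "\n")


def drop_into_ready(sample: Path, temp_dir: Path, ready_dir: Path, name: str) -> Path:
    """Copy the sample into temp_dir first, then rename it into ready_dir.

    Both directories are hypothetical arguments; the rename is only atomic
    when they live on the same filesystem.
    """
    temp_dir.mkdir(parents=True, exist_ok=True)
    ready_dir.mkdir(parents=True, exist_ok=True)
    staged = temp_dir / name
    shutil.copyfile(sample, staged)
    final = ready_dir / name
    os.replace(staged, final)
    return final


def count_records(ready_dir: Path) -> int:
    """Count non-empty lines across all files currently in ready_dir."""
    total = 0
    for f in sorted(ready_dir.iterdir()):
        if f.is_file():
            with f.open(encoding="utf-8") as fh:
                total += sum(1 for line in fh if line.strip())
    return total
```

Comparing `count_records` on the ready folder just before a run against the number of records the job emits would show how many records were lost when the input set is cut off.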
Affects:
Error:
java.lang.NullPointerException
    at com.ikanow.aleph2.shared.crud.elasticsearch.services.ElasticsearchCrudService$ElasticsearchBatchSubsystem.getPossibleDeletionRequest(ElasticsearchCrudService.java:1272)
    at com.ikanow.aleph2.shared.crud.elasticsearch.services.ElasticsearchCrudService$ElasticsearchBatchSubsystem.storeObject(ElasticsearchCrudService.java:1313)
    at com.ikanow.aleph2.core.shared.services.MultiDataService.batchWrite(MultiDataService.java:268)
    at com.ikanow.aleph2.analytics.services.AnalyticsContext.emitObject(AnalyticsContext.java:1224)
    at com.ikanow.aleph2.analytics.hadoop.assets.BatchEnrichmentJob$BatchEnrichmentBaseMapper.lambda$null$8(BatchEnrichmentJob.java:530)
    at java.util.ArrayList.forEach(ArrayList.java:1249)
    at com.ikanow.aleph2.analytics.hadoop.assets.BatchEnrichmentJob$BatchEnrichmentBaseMapper.lambda$completeBatchFinalStage$9(BatchEnrichmentJob.java:529)
    at java.util.Optional.orElseGet(Optional.java:267)
    at com.ikanow.aleph2.analytics.hadoop.assets.BatchEnrichmentJob$BatchEnrichmentBaseMapper.completeBatchFinalStage(BatchEnrichmentJob.java:527)
    at com.ikanow.aleph2.analytics.hadoop.assets.BatchEnrichmentJob$BatchEnrichmentBase.checkBatch(BatchEnrichmentJob.java:282)
    at com.ikanow.aleph2.analytics.hadoop.assets.BatchEnrichmentJob$BatchEnrichmentBase.cleanup(BatchEnrichmentJob.java:297)
    at com.ikanow.aleph2.analytics.hadoop.assets.BatchEnrichmentJob$BatchEnrichmentMapper.cleanup(BatchEnrichmentJob.java:581)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:149)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)