achintya-kumar closed this issue 7 years ago
Hi, @HorizonNet and @tantalus1984, I have a question. I am using the QuickStart VM. My MapReduce job, which imports Locations.csv into an HBase table, runs perfectly fine on my VM. However, when I packaged it and tried running it on a real cluster of 2 nodes, the job gets 'submitted' but remains 'unassigned'. Google and Stack Overflow did not help either. Do you know what the problem could be? Thanks! :)
As mentioned yesterday in the workshop, this problem can have multiple causes. The best starting point would be to go through the logs of the job or of the role; normally there should be an error or a warning there.
Hi! Thanks for the response. You were right. When we checked free memory, the node had less than 1 GB free (because we had installed all the Hadoop services). That was not enough to meet even the minimum container requirements. We stopped every service we didn't need for this particular task, including the Cloudera Management Service. Then we reduced the following parameter, and the job eventually ran:
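(The exact parameter is not preserved in this thread. Purely for illustration: the YARN setting that controls the minimum memory granted per container is yarn.scheduler.minimum-allocation-mb, which defaults to 1024 MB; lowering it in yarn-site.xml would let containers be scheduled on a node with less than 1 GB free. Whether this was the parameter changed here is an assumption.)

```xml
<!-- yarn-site.xml: minimum memory (MB) YARN allocates per container.
     Illustrative value only; the thread does not name the actual parameter. -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>256</value>
</property>
```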
Thanks!
Below is a short review.
Tasks
Summary:
You're done with this one. Good work.
Greetings, @HorizonNet, @tantalus1984 :) I have implemented a BloomFilter with a bitArraySize=10000, as required.
However, upon checking with the size() method of the BitSet field, I get an output of 10048. Is it due to some kind of overhead in the BitSet data structure? I have spent some time looking through the code for any value that might lead to setting a bit position higher than 10k, but did not find anything. The code also mods the hashes by 10k before adding them to the bit set.
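For context, the 10048 can be reproduced with a plain BitSet, independent of the filter code: java.util.BitSet stores its bits in 64-bit long words, so a requested capacity of 10000 rounds up to ceil(10000/64) = 157 words, i.e. 157 × 64 = 10048 bits. A minimal sketch:

```java
import java.util.BitSet;

public class BitSetSizeDemo {
    public static void main(String[] args) {
        BitSet bits = new BitSet(10000);
        // size() reports the allocated capacity in bits. BitSet backs its
        // storage with 64-bit long words, so 10000 requested bits become
        // ceil(10000 / 64) = 157 words = 10048 bits.
        System.out.println(bits.size());    // 10048

        // length() is the logical size: highest set bit index + 1.
        bits.set(9999);
        System.out.println(bits.length());  // 10000
    }
}
```

In other words, size() reflects storage overhead, not set bits; as long as the code mods every hash by 10000 before calling set(), no bit above index 9999 can be set.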
What do you think could be the problem?
Here's the implementation for your reference: https://github.com/achintya-kumar/BD2017/blob/master/labs/3-hbase-spark/2-locations/src/main/java/bloomFilter/BloomFilter.java