This repository contains my solution to the project "Machine learning algorithms with global state" from the BDAPRO class at TU Berlin. (The repo is based on BDAPRO.WS1617.)
I'm doing some benchmarking on the cluster and I'm confused about the information I get in the Spark UI. The setup is as follows:
1) I'm using a textFileStream as the input source,
2) I'm copying one 1.6GB file to HDFS, and Spark recognizes the new file, but
3) if I check the "Stages" section, I see two stages with an input of 128MB each,
4) and if I check the "Executors" section, I see the driver with an input of 268.4MB.
I'm confused for two reasons: first, 2 * 128MB != 268.4MB, and second, I was expecting to see an input of 1.6GB instead of 268.4MB.
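The first mismatch may just be a units issue. This is an assumption the thread does not confirm: the Stages page may report sizes in binary units (1 MB = 2^20 bytes, with 128 MiB being the default HDFS block size), while the Executors page may divide by 10^6. A minimal Python sketch of that reading:

```python
import math

# Assumption (not confirmed in the thread): "128MB" on the Stages page means 128 MiB,
# while "268.4MB" on the Executors page uses decimal megabytes.
MIB = 2 ** 20  # binary megabyte (mebibyte)
MB = 10 ** 6   # decimal megabyte

block_bytes = 128 * MIB          # one HDFS block of 128 MiB
two_blocks = 2 * block_bytes     # combined input of the two stages

print(two_blocks)                 # 268435456
print(round(two_blocks / MB, 1))  # 268.4 (decimal MB)
print(two_blocks / MIB)           # 256.0 (MiB)

# Assumption: "1.6GB" means 1.6 GiB. With 128 MiB blocks the file spans ceil(12.8) blocks,
# so reading all of it would need far more than two 128 MiB splits.
file_bytes = int(1.6 * 2 ** 30)
n_blocks = math.ceil(file_bytes / block_bytes)
print(n_blocks)                   # 13
```

If this assumption holds, 268.4MB is exactly two 128 MiB blocks, so the driver figure and the stage figures agree. The remaining question is why only 2 of the roughly 13 blocks show up as input.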
@jeyhunkarimov Hi Jeyhun,
Do you have any idea where I'm going wrong?
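One way to take the UI formatting out of the picture is to read the raw byte counts from Spark's monitoring REST API. The following is a minimal sketch under several assumptions: the driver UI is reachable at the hypothetical address `DRIVER_UI` (default port 4040), and the endpoints and fields used (`/api/v1/applications`, `.../stages` with `inputBytes`, `.../executors` with `totalInputBytes`) match the Spark version on the cluster.

```python
import json
from urllib.request import urlopen

# Hypothetical address of the running driver's UI; replace with the real host.
DRIVER_UI = "http://localhost:4040"


def fetch_json(url):
    """Download and decode one JSON document from the REST API."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def total_stage_input(stages):
    """Sum the raw inputBytes over all stage entries (missing values count as 0)."""
    return sum(s.get("inputBytes", 0) for s in stages)


def total_executor_input(executors):
    """Sum the raw totalInputBytes over all executor entries, driver included."""
    return sum(e.get("totalInputBytes", 0) for e in executors)


def report(base=DRIVER_UI):
    """Print raw input byte totals for the first application the driver reports."""
    app_id = fetch_json(base + "/api/v1/applications")[0]["id"]
    api = base + "/api/v1/applications/" + app_id
    print("stages input bytes:   ", total_stage_input(fetch_json(api + "/stages")))
    print("executors input bytes:", total_executor_input(fetch_json(api + "/executors")))
```

Calling `report()` while the streaming job is running prints both totals in plain bytes, which makes it easy to see whether they differ only by MB/MiB formatting.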