DonFlat / ddps_assn1

Reproducibility Study

Question to ask at Q&A #6

Open DonFlat opened 1 year ago

DonFlat commented 1 year ago

Questions

  1. How do we limit Spark's memory usage? - config
  2. How do we kill a node while Spark/Hadoop is running? - -c + name / top command
  3. "Total time spent by all maps": does it mean t(map1) + t(map2), or the span from the start of the earlier of the two to the end of the later of the two (linear, from start to end of the slots)?
  4. The terminology is not consistent: the text sometimes says "node" and sometimes "machine". Is that not OK?
  5. For PageRank, is the reported value really an average? The article does not say. - No need to find out
  6. One more experiment about PageRank: with Pregel. (What does this mean?)
  7. How do we count the number of repetitions? There are different participants and 3 runs in total; does that count as 3 repetitions? - No need to find out; 20 repetitions for each
  8. Watch out for the experiment participants! The descriptions were copied from the article; should we add a reference there? - Paraphrasing would be okay.
  9. The pi programs in Spark and Hadoop appear to use different algorithms. Which parameters would make them the same?
  10. Is there a handy tool for boxplots? - Python
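
To make the two readings in question 3 concrete, here is a minimal sketch. The task start and end times are made-up values, not measurements, and the function names are hypothetical. Which reading the counter actually reports should be checked against the per-task times in the job history.

```python
# Hypothetical (start, end) times in seconds for two overlapping map tasks.
map_tasks = [(0.0, 12.0), (5.0, 20.0)]


def summed_task_time(tasks):
    """Reading A: t(map1) + t(map2), each task's own duration added up."""
    return sum(end - start for start, end in tasks)


def wall_clock_span(tasks):
    """Reading B: from the earliest start to the latest end."""
    earliest_start = min(start for start, _ in tasks)
    latest_end = max(end for _, end in tasks)
    return latest_end - earliest_start


print(summed_task_time(map_tasks))  # 27.0
print(wall_clock_span(map_tasks))   # 20.0
```

The two numbers only agree when the tasks run strictly one after another, so the gap between them grows with the degree of parallelism.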
AriaTian commented 1 year ago

No need for high consistency: YARN vs. spark-submit, Java vs. Python. You can analyze the differences, but there is really no need.
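
For question 1, one common way to cap Spark's memory is to pass the limits when submitting the job. The sketch below only assembles a `spark-submit` command line. The script name `pi.py`, the `1g` values, and the helper `build_spark_submit` are placeholders, not the settings used in this study.

```python
import shlex


def build_spark_submit(app, driver_memory="1g", executor_memory="1g", master="yarn"):
    """Assemble a spark-submit command with explicit memory limits.

    --driver-memory and --executor-memory set the heap size of the driver
    and of each executor; the values here are illustrative only.
    """
    return [
        "spark-submit",
        "--master", master,
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        app,
    ]


# "pi.py" is a hypothetical application script.
cmd = build_spark_submit("pi.py", driver_memory="1g", executor_memory="2g")
print(shlex.join(cmd))
```

The same limits can also be set as the `spark.driver.memory` and `spark.executor.memory` configuration properties, which may be more convenient when every run in an experiment should share them.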

How to improve: language; the number of repetitions is not specified; PageRank average

20 repetitions for each
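
As a sketch of how 20 repetitions could be summarized for a boxplot in Python, the snippet below computes the five numbers a boxplot draws using only the standard library. The runtimes are invented placeholders, not measured results, and `five_number_summary` is a hypothetical helper.

```python
import statistics

# Hypothetical runtimes in seconds for 20 repetitions of one experiment.
runtimes = [
    41.2, 39.8, 40.5, 42.1, 40.9, 43.3, 39.5, 41.7, 40.2, 44.0,
    41.0, 40.7, 42.6, 39.9, 41.4, 40.3, 45.1, 41.9, 40.6, 42.2,
]


def five_number_summary(values):
    """Return min, first quartile, median, third quartile, and max."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


summary = five_number_summary(runtimes)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

If matplotlib is available, `matplotlib.pyplot.boxplot(runtimes)` draws the plot directly. Its quartile convention may differ slightly from `statistics.quantiles`, so the box edges need not match these numbers exactly.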

Look into: HiBench

Controlled variables, experimental design, data distribution