actionml / template-scala-parallel-universal-recommendation


groupAll takes a HUGE amount of time #29

Open pferrel opened 7 years ago

pferrel commented 7 years ago

Running on a large cluster with medium-sized data (100 MB), this stage takes 9.2 hours, by far the longest phase. Any ideas @laser13 @alexice? This is not very large data, and it is running on 4 r3.4xlarge AWS instances.


pferrel commented 7 years ago

Here is the old implementation. Should I try putting this back in?

https://github.com/actionml/template-scala-parallel-universal-recommendation/blob/816275b281196cb5cfbbac3a834ba50d61d02d17/src/main/scala/URModel.scala#L162

alexice commented 7 years ago

Yes, it would be good to compare the total time and the stage time against the previous code. This looks weird. Maybe it is caused by lazy evaluation, so that some other calculations were attributed to this line?

On Nov 19, 2016, at 23:51 , Pat Ferrel notifications@github.com wrote:

Running on a large cluster with medium-sized data (100 MB), this stage takes 9.2 hours, by far the longest phase. Any ideas @laser13 @alexice? This is not very large data, and it is running on 4 r3.4xlarge AWS instances. We are only using popularity, no random or user-defined ranking.


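To illustrate the laziness hypothesis raised above, here is a minimal sketch in plain Scala (no Spark), not the template's actual code. The names `LazyAttributionSketch`, `expensive`, and `run` are hypothetical. It shows that a lazily defined upstream step does no work when it is declared, so its full cost lands on whichever later step forces it, here the grouping.

```scala
object LazyAttributionSketch {
  // Counts how often the upstream function actually runs.
  var calls = 0

  // Hypothetical stand-in for an expensive upstream computation.
  def expensive(x: Int): Int = { calls += 1; x * 2 }

  // Declares the upstream step lazily, then groups.
  // Returns the call count observed before grouping, the call count
  // after grouping, and the grouped result.
  def run(data: Seq[Int]): (Int, Int, Map[Int, Vector[Int]]) = {
    calls = 0
    val mapped = data.iterator.map(expensive)    // lazy: nothing runs yet
    val callsBeforeGrouping = calls
    val grouped = mapped.toVector.groupBy(_ % 3) // forces the map, then groups
    (callsBeforeGrouping, calls, grouped)
  }

  def main(args: Array[String]): Unit = {
    val (before, after, grouped) = run(1 to 6)
    println(s"calls before grouping: $before, after grouping: $after")
    println(grouped)
  }
}
```

Spark transformations are likewise lazy and only execute once an action is triggered, so a similar effect is plausible here. One common way to check would be to cache the upstream data and force it with an action before the grouping step, then compare the stage times.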