SiRumCz / CSC501

CSC501 assignments

Word cloud not working for 27M dataset. #32

Open superliuxz opened 5 years ago

superliuxz commented 5 years ago

I checked out the master branch and loaded the 27M dataset through preproc.py. The word cloud graph, however, is not showing anything.

It works on the small testing dataset.

@soroushysfi feel free to ask for help, and please let me know whether we can fix it. If not, I will need to look for an alternative solution ASAP.

superliuxz commented 5 years ago

@soroushysfi @SiRumCz

With GH-36 in place, it seems the word cloud is working; however, in GH-36 the weights (counts) are returned as floats, so all the words become too small to see:

[screenshot: Screen Shot 2019-09-25 at 12 25 38 PM]

Then I multiplied each count by 100 and got:

[screenshot: Screen Shot 2019-09-25 at 12 32 02 PM]

It's getting better, but nowhere close to what I had in GH-36. There is too much empty space between the words.

@soroushysfi you are the expert on it, can we improve on this? Would it be too much effort (i.e., does D3's word cloud component not support such a configuration)?

If not, we can make an exception and hand in what we had in GH-36 (so instead of a dynamic component, we present a static image). I will supply my Python code as well.
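For reference, a rough sketch of what I have in mind for both options (this is not code from any branch; the function names, the 12–96 px range, and the use of the third-party wordcloud package are placeholders I'm picking for illustration):

```python
# Sketch only: rescale float word weights into a visible font-size range
# for the D3 component, plus a static-image fallback.
# Assumes the data is a dict of {word: float_weight}.

def scale_weights(weights, min_px=12, max_px=96):
    """Min-max scale float weights into a pixel font-size range."""
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1.0  # guard against all-equal weights
    return {
        word: min_px + (w - lo) / span * (max_px - min_px)
        for word, w in weights.items()
    }

def render_static_cloud(weights, out_path="wordcloud.png"):
    """Fallback: write a static PNG instead of the dynamic D3 component."""
    from wordcloud import WordCloud  # pip install wordcloud
    wc = WordCloud(width=1200, height=800, background_color="white")
    wc.generate_from_frequencies(weights)
    wc.to_file(out_path)

if __name__ == "__main__":
    counts = {"movie": 0.031, "film": 0.024, "story": 0.012}  # toy example
    print(scale_weights(counts))
    # render_static_cloud(counts)
```

Multiplying by a fixed 100 only helps if the floats happen to sit in a narrow band; min-max scaling keeps the font sizes bounded no matter how the counts were normalized upstream.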

soroushysfi commented 5 years ago

Are you running this on the word-cloud-data-normalization branch?

soroushysfi commented 5 years ago

I've fixed it. It's on the word-cloud-data-normalization branch. Let me know if you're satisfied with it.

superliuxz commented 5 years ago

What is the word-cloud-data-normalization branch? Did you create it to support GH-36, or to fix this issue ticket? Whichever it is, you should open a PR with a clear statement of its intent so the rest of us are aware.