alexander-moore / DS501_Project1

DS501 Project 1

Getting started with the data #2

Open alexander-moore opened 5 years ago

alexander-moore commented 5 years ago

Hi all, the data comes as a ~3 GB JSON file, available here: https://www.yelp.com/dataset/download

I found a JSON-to-CSV converter here: https://github.com/Yelp/dataset-examples

It should get the data into a more approachable form. However, I don't know enough Python to run it, so let me know if anyone feels inspired to try getting the data into an approachable state for the team.
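For reference, here is a minimal sketch of what such a JSON-to-CSV conversion could look like. It is not the script from the linked repository. It assumes each line of the input file holds one JSON object, and the names `flatten` and `json_lines_to_csv` are made up for illustration. The file is read twice so that a multi-gigabyte input never has to fit in memory.

```python
import csv
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def json_lines_to_csv(json_path, csv_path):
    """Convert a file with one JSON object per line (an assumption) to CSV."""
    # First pass: collect column names in order of first appearance,
    # without keeping any rows in memory.
    columns = []
    seen = set()
    with open(json_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            for key in flatten(json.loads(line)):
                if key not in seen:
                    seen.add(key)
                    columns.append(key)

    # Second pass: write the rows, leaving missing fields empty.
    with open(json_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns, restval="")
        writer.writeheader()
        for line in src:
            if line.strip():
                writer.writerow(flatten(json.loads(line)))
    return columns
```

A call would look like `json_lines_to_csv("review.json", "review.csv")`, where both file names are placeholders. List values are written using their Python string form, which may need further cleanup depending on the analysis.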

qh2150 commented 5 years ago

If you're on Windows you can unpack it using 7-Zip, but you have to extract it twice. The first extraction removes the .gz wrapper, and extracting the result again with 7-Zip gets you the JSON files. I tried using the JSON-to-CSV Python file, but it just keeps throwing errors for me.
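The same two-step extraction can also be sketched in Python with the standard-library `tarfile` module, which avoids needing 7-Zip. This assumes the download is a gzip-compressed tar archive, which is inferred from the two-step extraction described above and not confirmed. The function name `unpack_archive` and the file names in the usage note are placeholders.

```python
import os
import tarfile


def unpack_archive(archive_path, out_dir):
    """Extract a gzip-compressed tar archive and return the member names."""
    os.makedirs(out_dir, exist_ok=True)
    # "r:gz" handles both layers at once: gzip decompression, then untarring.
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(out_dir)
    return names
```

Calling `unpack_archive("yelp_dataset.tar.gz", "yelp_data")` would then list and extract whatever files the archive contains. Only use `extractall` on archives from a source you trust.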

ethanprihar commented 5 years ago

https://www.datacamp.com/community/tutorials/text-analytics-beginners-nltk

https://towardsdatascience.com/semantic-similarity-classifier-and-clustering-sentences-based-on-semantic-similarity-a5a564e22304

https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526