CSCI-4830-002-2014 / challenge-week-12

Hosting Mongo server with the data for this week's challenge #3

Open ianks opened 9 years ago

ianks commented 9 years ago

Due to the size of the data in this challenge, I figured some people would have some issues getting it indexed/hosted in Mongo in a reasonable time. I went ahead and started a Mongo server with all of the data hosted.

You can access it like so: $ mongo --host 198.199.113.194

ianks commented 9 years ago

cc: @CSCI-4830-002-2014/students

BrianNewsom commented 9 years ago

Thanks Ian! This is great. Also I just used

$ mongo 198.199.113.194

to access it; the -h flag brought up the help text on my Windows machine.

ianks commented 9 years ago

Good call! Updating post now.

indiesquidge commented 9 years ago

It was working moments ago just as mongo 198.199.113.194, but now it won't work at all. I get the error:

MongoDB shell version: 2.6.5
connecting to: 198.199.113.194/test
2014-11-14T13:51:32.511-0700 warning: Failed to connect to 198.199.113.194:27017, reason: errno:61 Connection refused
2014-11-14T13:51:32.511-0700 Error: couldn't connect to server 198.199.113.194:27017 (198.199.113.194), connection attempt failed at src/mongo/shell/mongo.js:148
exception: connect failed

I think the update must have changed something. Are you sure the IP stayed the same?

ianks commented 9 years ago

I just upgraded Mongo. For some reason the Debian package was at 2.0; now we're at 2.6. Should be good now. Can you confirm it works?

indiesquidge commented 9 years ago

:+1: Thanks!

antsankov commented 9 years ago

Awesome stuff!

JoshFerge commented 9 years ago

you da best

alne4294 commented 9 years ago

Thank you!

mynameisfiber commented 9 years ago

:+1:

dawsbot commented 9 years ago

Can't...

> db.reddit.find({},{}).sort({ups:1})
error: {
        "$err" : "Runner error: Overflow sort stage buffered data usage of 33555127 bytes exceeds internal limit of 33554432 bytes",
        "code" : 17144
}

ianks commented 9 years ago

@dawsonbotsford Try throwing a .limit(10) on there.
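
A rough sketch of why the limit helps, in plain Python rather than Mongo itself. Without a limit the sort has to buffer every document, which is what ran past the 33554432-byte limit in the error. With a limit, only the N smallest values seen so far need to be kept. The `docs` list and the `top_n_by_ups` helper are made up for illustration, and this is only an analogy for what the server does, not its actual implementation.

```python
import heapq

def top_n_by_ups(docs, n):
    """Return the n documents with the fewest ups, in ascending order.

    When n is smaller than the input, heapq.nsmallest tracks only n
    candidates at a time instead of sorting everything.
    """
    return heapq.nsmallest(n, docs, key=lambda d: d["ups"])

# Hypothetical stand-in for the reddit collection.
docs = [{"id": i, "ups": (i * 7) % 50} for i in range(1000)]
lowest = top_n_by_ups(docs, 10)
```

In the shell that corresponds to db.reddit.find().sort({ups:1}).limit(10).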

mynameisfiber commented 9 years ago

You could always limit it to comments from one particular day in the dataset. This'll add potential biases that you can think about and find ways to get around (maybe redo the calculation for different days and check that the conclusions are approximately the same)
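
One way to build the single-day filter, sketched in Python. It assumes the comments carry a `created_utc` field holding epoch seconds, which I have not checked against this collection, so swap in whatever the timestamp field is actually called. `one_day_filter` is just a helper name for this example.

```python
from datetime import datetime, timedelta, timezone

def one_day_filter(year, month, day, field="created_utc"):
    """Build a Mongo-style query document matching one UTC day.

    Assumes `field` stores the timestamp as epoch seconds (an assumption
    about the dataset, not something confirmed in this thread).
    """
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return {field: {"$gte": int(start.timestamp()), "$lt": int(end.timestamp())}}

query = one_day_filter(2014, 11, 14)
```

The resulting document can be passed as the first argument to find(). Calling the helper for a few different days and comparing results is a cheap way to do the consistency check described above.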

dawsbot commented 9 years ago

Where do I find the weather data on this server?

ianks commented 9 years ago

I don't believe anyone has uploaded that yet. Feel free to do so in a new collection, though.

dawsbot commented 9 years ago

Peyman added it in JSON format in collection weather

dawsbot commented 9 years ago

How are we supposed to convert the csv to json? Peyman and I are dead in the water

BrianNewsom commented 9 years ago

Import it as CSV using --type csv and --headerline

dawsbot commented 9 years ago

Do we need to build our own headerline?

BrianNewsom commented 9 years ago

If you look at the file, the first line is a header...

http://docs.mongodb.org/manual/reference/program/mongoimport/

is what I referenced.
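
If you do want JSON out of the CSV yourself instead of letting mongoimport handle it, a minimal Python sketch is below. It treats the first line as the header, the same way --headerline does. The sample columns (`date`, `temp`) are invented; the real weather file will have its own.

```python
import csv
import io
import json

def csv_to_json_lines(csv_text):
    """Convert CSV text with a header row into one JSON object per line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in reader)

# Made-up sample data standing in for the weather CSV.
sample = "date,temp\n2014-11-14,3\n2014-11-15,-1\n"
print(csv_to_json_lines(sample))
```

Note that every value comes out as a string here, so numeric columns would need converting before you query on them.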

ianks commented 9 years ago

So I ended up booting an instance with 32 CPUs and 32 GB of memory to deal with these queries...

$ mongo --host 104.236.191.166