Rostlab / JS16_ProjectF

In this project we will build a web portal for our GoT data analysis and visualization system. The website will integrate all the apps created in projects B-D with the help of the integration team assigned to Project E.
GNU General Public License v3.0

Crawler broken? #516

Closed: yashha closed this issue 8 years ago

yashha commented 8 years ago

I've noticed for some time now that the crawler is not running. [screenshot]

sacdallago commented 8 years ago

Also noticed.

@marcusnovotny @julienschmidt ?

marcusnovotny commented 8 years ago

Does this happen for all characters?

No idea what happened. Just restart the app.

julienschmidt commented 8 years ago

Unfortunately both my backdoor and my crystal ball are broken 😉

sacdallago commented 8 years ago

So let me ask a couple more questions, maybe we can figure this out together :)

I didn't copy the CSV data (because I thought it would generate itself from the database) and just linked the DB. This is already something I don't quite understand... What do you need the CSVs for? Is it telling the app that it should look for the data in the database?

Now, the second thing is:

> db.charactersentiments.findOne()
{
    "_id" : ObjectId("570179d679429d1e795c2492"),
    "name" : "Robb Reyne",
    "slug" : "Robb_Reyne",
    "total" : 22,
    "positive" : 0,
    "negative" : 0,
    "popularity" : 0,
    "heat" : 22,
    "updated" : ISODate("2016-05-08T13:06:34.475Z")
}
> db.charactersentiments.findOne({"name":"Petyr Baelish"})
{
    "_id" : ObjectId("570179d679429d1e795c2a63"),
    "name" : "Petyr Baelish",
    "slug" : "Petyr_Baelish",
    "total" : 21599,
    "positive" : 7177,
    "negative" : 3349,
    "popularity" : 3828,
    "heat" : 21599,
    "updated" : ISODate("2016-05-12T08:51:49.124Z")
}

Any ideas?
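For context, queries along these lines (same collection as above; the cutoff date is just an example) would show how many characters are stuck in that state:

    // Characters whose counters never moved, i.e. possibly never crawled:
    > db.charactersentiments.find({positive: 0, negative: 0, popularity: 0}, {name: 1, updated: 1}).limit(10)

    // Characters not updated since a given date:
    > db.charactersentiments.count({updated: {$lt: ISODate("2016-05-10T00:00:00Z")}})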

julienschmidt commented 8 years ago

> I didn't copy the CSV data (because I thought it would generate itself from the database) and just linked the DB.

Yes, it should do so on the first start and on every update to that character afterwards.

> This is already something I don't quite understand... What do you need the CSVs for? Is it telling the app that it should look for the data in the database?

Because aggregating the tweets on-access is not an option. It involves heavy I/O, both in the DB itself and in transferring the tweets to the server for analysis. The CSVs are a kind of cached result: the server then just has to serve static files. Remember the difference between the cached and the non-cached API (your LRZ guy showed me some graphs yesterday)? The difference here should be a few orders of magnitude larger 😉
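A rough sketch of that pattern, with invented names and paths (not the project's actual code): the crawler rewrites a character's CSV whenever its data changes, and requests only ever touch the static file.

    // Sketch only -- CSV_DIR and both function names are made up for illustration.
    const fs = require('fs');
    const path = require('path');

    const CSV_DIR = '/var/data/sentiments'; // assumed cache directory

    // Called on first start and on every update to a character: rewrite the
    // cached CSV so no request ever triggers the expensive aggregation.
    function updateSentimentCsv(slug, rows) {
        const header = 'date,positive,negative\n';
        const body = rows.map(r => `${r.date},${r.positive},${r.negative}`).join('\n');
        fs.writeFileSync(path.join(CSV_DIR, slug + '.csv'), header + body);
    }

    // Serving is then a plain static-file read: no DB round trip, no aggregation.
    function serveSentimentCsv(slug, res) {
        res.setHeader('Content-Type', 'text/csv');
        fs.createReadStream(path.join(CSV_DIR, slug + '.csv')).pipe(res);
    }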

My assumption is that the crawler cannot write the CSVs for some reason. Check file permissions, logs, etc.
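A quick way to rule the permissions part in or out, assuming a Node environment (the directory path is invented):

    // Probe whether the crawler's user can actually write where the CSVs go.
    const fs = require('fs');
    const path = require('path');

    const CSV_DIR = '/var/data/sentiments'; // replace with the crawler's real output dir

    const probe = path.join(CSV_DIR, '.write-test');
    try {
        fs.writeFileSync(probe, 'ok'); // throws EACCES/ENOENT if the directory is not writable
        fs.unlinkSync(probe);
        console.log('CSV directory is writable');
    } catch (err) {
        console.error('Cannot write to CSV directory:', err.code);
    }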

sacdallago commented 8 years ago

@julienschmidt I checked :( It all seems to be working! Pffffff

Maybe it makes sense to run an instance locally against the same DB and copy the CSVs over from time to time?
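Something along these lines could do the periodic copy (hosts and paths are invented, and it assumes rsync is available on both machines):

    // Sync locally generated CSVs to the server; a cron job could run this hourly.
    const { execSync } = require('child_process');

    const LOCAL_CSV_DIR = '/home/me/crawler/csv/';                    // invented local path
    const REMOTE_CSV_DIR = 'deploy@got-portal:/var/data/sentiments/'; // invented target

    // -a preserves permissions and timestamps, -z compresses the transfer.
    execSync(`rsync -az ${LOCAL_CSV_DIR} ${REMOTE_CSV_DIR}`, { stdio: 'inherit' });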

yashha commented 8 years ago

Status? @julienschmidt @sacdallago :)

sacdallago commented 8 years ago

Fixed (was not an ez-pz problem to solve)