Closed kordianbruck closed 8 years ago
Yeah, we, group2, agree.
mysql dump can be fetched from https://www.dropbox.com/s/uomk7vsl94fc3b4/WoIaFDB20160228.sql.gz?dl=0
Did you manage to install the db? Are you done populating your db?
Just adding to Guy's comments: we are trying really hard to get you the data also on an instance that you can access, but it's much harder than you would think. And it's not like we can't set up a VM in 2 minutes, it's all the administrative crap. But we'll keep you updated. For now get a docker image of mysql, run a local instance, copy the dump in and try to get the data from there. Sorry guys!
Well, we managed to scrape the wiki pages and to fill the database with houses (without references to other entities).
@kordianbruck: Please give me the credentials to the db on your server, or run a GET on api/dbFiller/houses to fill the db that you set up online for us. DELETE on api/dbFiller/houses clears the collection.
We really need the help of the others on this task! This is no task for only two guys.
Especially since I have already implemented a lot (all stores and controllers for the API calls, the API doc comments, work on the models, ...). Now also doing all the scraper and population stuff alone with @theocheslerean is a real overload for me.
@Adiolis I've just updated the repo, docs and ran the endpoint.
https://got-api.bruck.me/api/houses/ already returns a good amount of them.
I'm still unsure if this should be a public API at all. If people can come at random and delete the collection or rerun the import then it might come close to a DDOS of my server. For now I've disabled the routes to be accessed from the web.
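One common way to keep destructive routes like these off the public web is a small guard that only accepts requests from the local machine. A minimal sketch, assuming an Express-style `(req, res, next)` middleware signature; the function name and wiring are illustrative, not the actual repo code:

```javascript
// Guard that rejects any request not coming from the local machine.
// Assumes an Express-style (req, res, next) middleware signature;
// names below are illustrative, not taken from the actual repo.
function localOnly(req, res, next) {
  const addr = req.socket.remoteAddress;
  const isLocal =
    addr === '127.0.0.1' || addr === '::1' || addr === '::ffff:127.0.0.1';
  if (isLocal) {
    next();
  } else {
    res.statusCode = 403;
    res.end('dbFiller routes are disabled from the web');
  }
}

module.exports = localOnly;
```

Mounted in front of the filler routes (e.g. `app.use('/api/dbFiller', localOnly)`), the import/clear endpoints stay reachable from the server itself but not from random visitors.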
I've now introduced caching of the wiki scraping results. Cache file: wikiData/houses.json
Code is now way smoother and faster.
I'm very impressed :smile: I haven't had time to look at how you implemented it, but if you haven't done this yet: make sure you put a TTL on the data, so that if it ever gets updated, at some stage a request will pick up the change!
@sacdallago : I will implement it =)
Niceee!!!!! :) :+1: @Adiolis
Regions, characters, episodes and houses fillers are implemented =D
@kordianbruck Please run the fillers on your server ;) (For the routes, please look into routes.js.) Characters will take really long (>10 min), because there are >2400 of them. Also check the new cfg property ;)
Further properties still have to be implemented, like all the references to other entities and some that the scraper is not yet handling.
Btw #38 just for the sake of it. And I'm gonna un-assign me, otherwise I get crazyyyyy :D :D
Episodes gives me the following error:
```
Problem:ValidationError: CastError: Cast to Date failed for value "June 15th, 2014" at path "airDate"
Problem:ValidationError: CastError: Cast to Date failed for value "April 19th, 2015" at path "airDate"
Problem:ValidationError: CastError: Cast to Date failed for value "May 4th, 2015" at path "airDate"
Problem:ValidationError: CastError: Cast to Date failed for value "April 12th, 2015" at path "airDate"
Problem:ValidationError: CastError: Cast to Date failed for value "April 26th, 2015" at path "airDate"
Problem:ValidationError: CastError: Cast to Date failed for value "May 10th, 2015" at path "airDate"
Problem:ValidationError: CastError: Cast to Date failed for value "May 17th, 2015" at path "airDate"
```
Never mind, the characters worked on the third try. All imports should be done now. The errors above only affected a few episodes.
@kordianbruck it is always a good idea to put the parsing/casting of dates in a try/catch, as no one ever follows standards :wink: it is worse to have the server crash than to have a null field in a document!
airDate was newly introduced by the scraper. The filler does not yet transform the date into the required type, and it cannot simply ignore the property either, because it is part of the model. This still needs to be done.
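A sketch of what that transformation could look like, combining the try/catch advice above with stripping the ordinal suffixes ("15th" → "15") that made the cast fail. The function name is made up for illustration:

```javascript
// Turn wiki date strings like "June 15th, 2014" into Date objects.
// V8 cannot parse the ordinal suffix, so strip "st"/"nd"/"rd"/"th"
// from the day number first; return null instead of crashing when
// the string still is not a parseable date.
function parseAirDate(raw) {
  if (typeof raw !== 'string') return null;
  const cleaned = raw.replace(/(\d+)(st|nd|rd|th)/i, '$1');
  const date = new Date(cleaned);
  return isNaN(date.getTime()) ? null : date;
}
```

The filler would then store `parseAirDate(scraped.airDate)` into the model, so a bad wiki value yields a null field instead of a ValidationError.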
@adiolis can you open an issue for this if you haven't done so already? :)
@sacdallago Done. I think I've finished the character filling with all details.
@kordianbruck Please clear the characters collection and start filling it again ;)
Please load it into your public API. I wanna take a look.
@gyachdav done, just updated the server.
We still need the others to help us on this task... @kordianbruck @togiberlin @boriside
Wiki scraper all done
@sacdallago @gyachdav
We are currently trying to scrape data off the wiki into our database, but it is awfully slow and really not a practical method. What is the status of the database dump you promised?
In addition many data fields are not consistent across individual pages - any idea how to approach this?
Issues #5, #2, and #20 rely on a fix for this.