Rostlab / JS16_ProjectA

In this project we will lay the foundations for our system by integrating data from multiple sources into a central database. The database will serve the apps and the visualization tool that will be developed in other projects.
GNU General Public License v3.0

House Fell, House Templeton and House Farman in characters list #126

gyachdav closed 8 years ago

gyachdav commented 8 years ago

The character list still contains house names. Please clean it up.

boriside commented 8 years ago

@gyachdav I was on my way back to München; I will take a look now.

boriside commented 8 years ago

@gyachdav @sacdallago I saw the wrong entries. Should we provide "removeById" functionality in the API?

gyachdav commented 8 years ago

Can you investigate how they ended up as characters in the first place? It would be great if we could correct the scraper and refill the db with a corrected list.

boriside commented 8 years ago

Yes, I am already working on it.

sacdallago commented 8 years ago

Good. I don't think that removing items would do any good, as the policy of refilling would later put them in again. As @gyachdav suggested, it's better to solve the problem at the root, that is: the wiki. I know it's a big effort, but let's be consistent and let's hope the wiki gets better and better.

Legenzoo commented 8 years ago

What needs to be inspected is the getAllNames function of the characters scraper. Its result is later used to determine which wiki pages to scrape, and it seems to pick up too much. I am not sure if I have time to do this, because of a death in my girlfriend's family.

@theocheslerean

boriside commented 8 years ago

@Adiolis I am doing it right now.

Legenzoo commented 8 years ago

@boriside The problem is that the scraper is buggy as f**ck. Some characters are not even scraped... Many properties are not scraped at all.

sacdallago commented 8 years ago

@Adiolis I'm sorry to hear about the loss :disappointed: I hope you and your girlfriend are holding up.

Let some other people take some of your workload, @boriside thanks for doing that already. I'm also talking about the "page rank" issue.

boriside commented 8 years ago

@Adiolis It's not because of the scraper, but because of the inconsistent information in the wiki. So I think the best approach would be to exclude the exceptions. I am also sorry about the loss, take your time. :(

Legenzoo commented 8 years ago

Thank you both. It happened two days ago =/. I am taking my time, but this is just a quick 5-minute fix... Feel free to edit my solution, whatever.

The scraper was taking all links instead of only the first.
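A minimal sketch of that kind of fix, assuming each list entry on the wiki page holds the character link first, followed by links to houses or other pages. The input shape and the names `entries`, `allLinks` and `firstLinkPerEntry` are hypothetical placeholders, not the project's actual scraper code.

```javascript
// Hypothetical input: one array of link titles per list entry on the wiki page.
const entries = [
  ["Character A", "House X"],
  ["Character B", "House Y", "Character C"],
  [], // entry without any link
];

// Buggy variant: collects every link, so house pages end up in the character list.
function allLinks(entries) {
  return entries.flat();
}

// Fixed variant: keep only the first link of each entry, skip entries with none.
function firstLinkPerEntry(entries) {
  return entries
    .filter((links) => links.length > 0)
    .map((links) => links[0]);
}

console.log(firstLinkPerEntry(entries));
```

With this shape, the house links never reach the list of pages to scrape, so nothing has to be removed from the db afterwards.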

sacdallago commented 8 years ago

@boriside excluding data is a bold move :) I wouldn't do that, personally. Incorrect data is better than possibly no data at all. Wikis are a joint effort, thus mistakes are acceptable and encourage users to curate data.

@Adiolis should I run an update on characters?

Legenzoo commented 8 years ago

I fixed the bug where the scraping could not terminate because of the skipping of houses. Now everything should be fine.
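One common way such a termination bug arises, sketched here under assumptions: a callback-based loop counts pending entries, and a skipped entry never reports back. The function `scrapeAll`, the `scrapeOne` callback and the `"House "` prefix check are hypothetical and only illustrate the pattern; the actual scraper may differ.

```javascript
// Hypothetical scraper loop: calls onDone once every name has been handled.
function scrapeAll(names, scrapeOne, onDone) {
  const results = [];
  let pending = names.length;
  if (pending === 0) return onDone(results);

  // Every name must pass through here exactly once, scraped or skipped.
  const finishOne = () => {
    pending -= 1;
    if (pending === 0) onDone(results);
  };

  for (const name of names) {
    if (name.startsWith("House ")) {
      // Skipped entries still count as handled. Without this call,
      // pending never reaches 0 and the run never terminates.
      finishOne();
      continue;
    }
    scrapeOne(name, (character) => {
      results.push(character);
      finishOne();
    });
  }
}

// Usage with a stand-in scrapeOne that answers immediately.
scrapeAll(
  ["House X", "Character A"],
  (name, cb) => cb({ name }),
  (characters) => console.log(characters)
);
```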

sacdallago commented 8 years ago

Thx