WikiMapper / WikiViz

An interactive, real-time semantic visualizer of Wikipedia

Create URLs database #15

Open matseng opened 10 years ago

matseng commented 10 years ago

@redwoodfavorite @FarhadG

Table 1: urls (columns for URL, title, html, etc.)

Table 2: url_to_url

FarhadG commented 10 years ago

@matseng

I was under the impression that we would have two tables:

Table 1: A single URL mapped to an array of the URLs on that page


URL || URLS


Table 2: A low-level mapping of a single URL to one outgoing URL on that page


URL || URL


Should we put the title and HTML into columns of the first table? Or would it be better to keep them in a separate table? (I'm not sure why we would, but I could be overlooking an advantage to doing so.)

autumnfjeld commented 10 years ago

@redwoodfavorite @FarhadG @matseng

Michael's setup makes sense to me: table 2 is a join table that identifies the parent-child relationships.

In the simplest form of our app, here is what I picture the client asking the server/database (I'm going to post a more detailed issue on this):

Does this sound correct to you guys?

matseng commented 10 years ago

Autumn's description is spot on.

Just to clarify, Table 2 contains all URL relationships as Ids (e.g. url_id_abc --> url_id_xyz). Then we can use JOIN operations to get richer data from Table 1 (e.g. URL, title, html).
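For illustration, here is a minimal sketch of the two tables and the JOIN described above. It assumes SQLite through Python's `sqlite3` module, which the thread does not specify; the column names `parent_id` and `child_id`, the helper `children_of`, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls (              -- Table 1: one row per page
    id    INTEGER PRIMARY KEY,
    url   TEXT UNIQUE NOT NULL,
    title TEXT,
    html  TEXT
);
CREATE TABLE url_to_url (        -- Table 2: join table of id pairs
    parent_id INTEGER NOT NULL REFERENCES urls(id),
    child_id  INTEGER NOT NULL REFERENCES urls(id),
    PRIMARY KEY (parent_id, child_id)
);
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO urls (id, url, title) VALUES (?, ?, ?)",
    [
        (1, "https://en.wikipedia.org/wiki/Graph_theory", "Graph theory"),
        (2, "https://en.wikipedia.org/wiki/Leonhard_Euler", "Leonhard Euler"),
        (3, "https://en.wikipedia.org/wiki/Tree_(graph_theory)", "Tree (graph theory)"),
    ],
)
conn.executemany("INSERT INTO url_to_url VALUES (?, ?)", [(1, 2), (1, 3), (3, 1)])


def children_of(conn, url):
    """Return (url, title) for every page that the given page links to."""
    rows = conn.execute(
        """
        SELECT child.url, child.title
        FROM urls AS parent
        JOIN url_to_url AS link ON link.parent_id = parent.id
        JOIN urls AS child ON child.id = link.child_id
        WHERE parent.url = ?
        ORDER BY child.title
        """,
        (url,),
    )
    return rows.fetchall()
```

Swapping `parent_id` and `child_id` in the two JOIN conditions would return the parents of a page instead of its children.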

From Table 2, we can also count the number of children and parents for a given node and store that info back in Table 1. I've been working on these database operations, so I can definitely help further.
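As a rough sketch of the counting step, again assuming SQLite via Python's `sqlite3`: the columns `num_children` and `num_parents` and the function `refresh_counts` are hypothetical names, and the rows are placeholder data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls (
    id           INTEGER PRIMARY KEY,
    url          TEXT UNIQUE NOT NULL,
    num_children INTEGER NOT NULL DEFAULT 0,  -- hypothetical column
    num_parents  INTEGER NOT NULL DEFAULT 0   -- hypothetical column
);
CREATE TABLE url_to_url (
    parent_id INTEGER NOT NULL,
    child_id  INTEGER NOT NULL,
    PRIMARY KEY (parent_id, child_id)
);
INSERT INTO urls (id, url) VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO url_to_url VALUES (1, 2), (1, 3), (3, 1);
""")


def refresh_counts(conn):
    """Recompute child/parent counts from Table 2 and store them in Table 1."""
    conn.execute(
        """
        UPDATE urls SET
            num_children = (SELECT COUNT(*) FROM url_to_url
                            WHERE parent_id = urls.id),
            num_parents  = (SELECT COUNT(*) FROM url_to_url
                            WHERE child_id = urls.id)
        """
    )
    conn.commit()


refresh_counts(conn)
counts = conn.execute(
    "SELECT url, num_children, num_parents FROM urls ORDER BY id"
).fetchall()
```

Storing the counts in Table 1 trades a little write-time work for cheaper reads; the counts go stale whenever Table 2 changes, so they would need to be refreshed after each crawl.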

On Thu, Feb 20, 2014 at 10:14 AM, Autumn notifications@github.com wrote:

@redwoodfavorite @FarhadG @matseng

Michael's setup makes sense to me: table 2 is a join table that identifies the parent-child relationships.

In the simplest form of our app, here is what I picture the client asking the server/database (I'm going to post a more detailed issue on this):

  • User input: wikipediaurl
  • GET request to table 1, check if it exists
  • If yes, GET request to table 2 to get all parent & child URLs
  • For each parent & child, GET request to table 1 to get title, summary, etc.
  • write all this into array of objects and send to d3

Does this sound correct to you guys?
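The steps in the list above could be sketched roughly as follows. This assumes SQLite via Python's `sqlite3` and direct queries in place of HTTP GET requests; `build_payload`, the `relation` field, and the sample rows are hypothetical, and only `url` and `title` are returned since the thread does not define a summary column.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL, title TEXT);
CREATE TABLE url_to_url (
    parent_id INTEGER NOT NULL,
    child_id  INTEGER NOT NULL,
    PRIMARY KEY (parent_id, child_id)
);
INSERT INTO urls VALUES (1, 'A', 'Page A'), (2, 'B', 'Page B'), (3, 'C', 'Page C');
INSERT INTO url_to_url VALUES (1, 2), (1, 3), (3, 1);
""")


def build_payload(conn, wiki_url):
    """Return a list of node dicts for the client, or None if the URL is unknown."""
    # Step 1: check whether the URL exists in table 1.
    row = conn.execute(
        "SELECT id, url, title FROM urls WHERE url = ?", (wiki_url,)
    ).fetchone()
    if row is None:
        return None
    node_id = row[0]

    # Step 2: collect parent and child ids from table 2.
    parent_ids = [r[0] for r in conn.execute(
        "SELECT parent_id FROM url_to_url WHERE child_id = ? ORDER BY parent_id",
        (node_id,))]
    child_ids = [r[0] for r in conn.execute(
        "SELECT child_id FROM url_to_url WHERE parent_id = ? ORDER BY child_id",
        (node_id,))]

    # Step 3: look up each related page in table 1 and build the array of objects.
    nodes = [{"url": row[1], "title": row[2], "relation": "self"}]
    for relation, ids in (("parent", parent_ids), ("child", child_ids)):
        for other_id in ids:
            url, title = conn.execute(
                "SELECT url, title FROM urls WHERE id = ?", (other_id,)
            ).fetchone()
            nodes.append({"url": url, "title": title, "relation": relation})
    return nodes


payload = build_payload(conn, "A")
payload_json = json.dumps(payload)  # what the server would send to the d3 client
```

The per-id lookups in step 3 mirror the list literally; a single JOIN against table 1 would fetch the same data in one query.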
