ebeshero / DHClass-Hub

a repository to help introduce and orient students to the GitHub collaboration environment, and to support DH classes.
GNU Affero General Public License v3.0

Network analysis exercise part 1 #297

Closed Samantha-Mcguigan closed 7 years ago

Samantha-Mcguigan commented 7 years ago

This is what I have so far and it won't let me eval anything:

xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
declare variable $decameron := doc('/db/decameron/engDecameronTEI.xml');
declare variable $people := $decameron//persName/tokenize(string(), ' ');
declare variable $distinctPeople  := distinct-values($people);
for $p in $distinctPeople
let $peers:=
               if ($people[. = $distinctPeople]/ancestor::div[1] = $p) 
                              then distinct-values(div[1]//persName/tokenize(string(), ' '))
         else if ($people[. = $distinctPeople]/ancestor::floatingText = $p) 
                              then distinct-values(floatingText//persName/tokenize(string(), ' '))
         else (distinct-values($people[. = $distinctPeople] = $p))

      let $edgeType:=
         if (ancestor::div[1]) 
               then "novella"
         else if (ancestor::floatingText) 
               then "floatingText"
         else "frame"  

    for $peer in $peers
    return
         concat($p, "&#x9;", $edgeType, "&#x9;",$peer, "&#10;") 

I am sure it has something to do with how I have set up my XPath expressions but I'm not sure what could be the problem with them.

ebeshero commented 7 years ago

@Samantha-Mcguigan The problem is likely right here: if ($people[. = $distinctPeople]/ancestor::div[1] = $p)

Think about what that is literally doing...the computer is certainly really stumped here...

ebeshero commented 7 years ago

@Samantha-Mcguigan Think about what you need to be checking. I think you want to walk down the tree, and find where a particular node matches your current $p (which is a member of the distinct-values list and not on the tree).

It looks like you wanted to use $people to be your "tree walker" variable, but notice that that variable stops on a tokenized string--which means it, too, steps OFF the tree to yield a little string of text. You want to find where your $p is equal to that little tokenized string of text, so you can explain that equivalence in a predicate expression. But you need to set up your "tree walker" a little differently from the $people variable. Does that make sense?
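Here is a minimal sketch of that difference between a string and a "tree walker". The tiny $sample fragment and the variable names $isNode and $walkers are made up for illustration and are not the project's actual markup. It shows that each $p from distinct-values() is not a node, while the <persName> elements selected by a predicate are still on the tree and can look up at their ancestors:

```xquery
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical miniature stand-in for the Decameron file. :)
declare variable $sample :=
  <div type="novella">
    <p><persName>Fiammetta</persName> tells of <persName>Tancredi</persName>.</p>
  </div>;

for $p in distinct-values($sample//persName)
(: $p is an atomic value, so it has no ancestors to walk up to. :)
let $isNode := $p instance of node()
(: The elements whose text equals $p are still on the tree. :)
let $walkers := $sample//persName[. = $p]
return
  concat($p, "&#x9;", $isNode, "&#x9;",
         string-join($walkers/ancestor::div[1]/@type, " "))
```

Each output line should pair a name with false (it is not a node) and with the @type that its matching elements found by walking up the tree.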

ebeshero commented 7 years ago

@Samantha-Mcguigan Also, I am scratching my head about your $people variable, which I think really doesn't need a tokenize function on it (does it?). Why are you trying to tokenize the element content of <persName> there? When I take distinct-values of all the persNames in that file (a total of 83 distinct names), I do not see any that have white spaces in them. When we use the persName element to surround a name, it might contain a first and a last name, but the white space would simply be part of the name string. I think you might be trying to apply what we did with the Hamilton project's attribute values to your code here--and the two projects aren't coded the same way.

The Decameron project's code didn't develop a personography as far as I remember, so they were not making single string @ref attributes like the Hamilton project is doing. When we have a plural list of attributes, we separate those with white spaces, but that is not something you need to work with for the Decameron code, is it?
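To illustrate the contrast, here is a small sketch with invented markup: the @ref values and the names are placeholders, not taken from either project. A whitespace-separated attribute list has to be split with tokenize(), while the text content of a <persName> is already a single name:

```xquery
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical markup: the @ref pointers and names are invented. :)
declare variable $sample :=
  <p>
    <persName ref="#personA #personB">the two rivals</persName>
    <persName>Peter of Vinciolo</persName>
  </p>;

(
  (: A plural attribute list needs tokenize() to split it into pointers. :)
  distinct-values($sample//persName/@ref/tokenize(., "\s+")),
  (: Element content is one name already, internal spaces included. :)
  distinct-values($sample//persName/normalize-space())
)
```

Tokenizing the second kind of value on spaces would break "Peter of Vinciolo" into three separate "people", which is why the function is best left off element content like this.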

Samantha-Mcguigan commented 7 years ago

I think I'm starting to understand. I have this now:

xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
declare variable $decameron := doc('/db/decameron/engDecameronTEI.xml');
declare variable $people := $decameron//persName;
declare variable $distinctPeople  := distinct-values($people);
for $p in $distinctPeople
let $peers:=
               if (div[1]//persName = $p) 
                              then distinct-values(div[1]//persName)
         else if (floatingText//persName = $p) 
                              then distinct-values(floatingText//persName)
         else (distinct-values($people))

I took out all the tokenize functions. Also, I know I have to walk down the tree to get the persNames based on the div they are sitting in, right? But I am getting an error on if (div[1]//persName = $p). It doesn't like the div[1], but I don't know how to fix that.

ebeshero commented 7 years ago

@Samantha-Mcguigan Good, this is clearer and simpler now! :-) Okay, so with the Decameron project, you want to reach back to the first <div> that is the ancestor of the current persName you're on in the tree. So you want to look up the ancestor axis just one stop.

ebeshero commented 7 years ago

@Samantha-Mcguigan The idea here is to find out which KIND of div that persName is sitting inside...let me check what I wrote on the assignment sheet about this to see if I can explain it more clearly.

ebeshero commented 7 years ago

@Samantha-Mcguigan Here's the relevant part of the assignment sheet: "In our example from The Decameron we output three different words to indicate whether an interaction occurred in floatingText, in the outer frame around the stories, or inside the stories themselves."

This is to describe the kind of interaction we are seeing (a bridge or edge connection). Decameron is a layered narrative, which looks like this structurally: http://decameron.newtfire.org/boxModel.html So we want to see at what narrative level our persName elements occur if you are following our example in the assignment. To get a sense of this, try doing some exploratory document analysis: run some queries on persNames and look at the first ancestor elements: what do you see?
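One way to do that exploratory analysis is sketched here, assuming a made-up fragment that only imitates the nesting (the real file is much larger and may differ in detail). For each <persName> it prints the chain of ancestor element names from the outermost down to the immediate parent:

```xquery
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical miniature of the layered structure. :)
declare variable $sample :=
  <div type="Day">
    <p><persName>Pampinea</persName> opens the day.</p>
    <div type="novella">
      <p><persName>Fiammetta</persName> begins her tale.</p>
      <floatingText>
        <body><p><persName>Guiscardo</persName> appears in an inner story.</p></body>
      </floatingText>
    </div>
  </div>;

for $name in $sample//persName
return
  concat(
    $name, "&#x9;",
    (: ancestor element names, in document order, joined with slashes :)
    string-join($name/ancestor::*/name(), "/")
  )
```

Against the real file you would start the path from the document variable instead of $sample. Reading the chains side by side makes it easier to see which ancestors actually distinguish one narrative level from another.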

Samantha-Mcguigan commented 7 years ago

I'm getting results now! I have:

xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
declare variable $decameron := doc('/db/decameron/engDecameronTEI.xml');
declare variable $people := $decameron//persName;
declare variable $distinctPeople  := distinct-values($people);
for $p in $distinctPeople
let $peers:=
               if (//persName[parent::div] = $p) 
                              then distinct-values(//div[1]//persName)
         else if (//floatingText//persName = $p) 
                              then distinct-values(//floatingText//persName)
         else (distinct-values($people))

      let $edgeType:=
         if (//div[1]) 
               then "novella"
         else if (//floatingText) 
               then "floatingText"
         else "frame"  

    for $peer in $peers
    return
         concat($p, "&#x9;", $edgeType, "&#x9;",$peer, "&#10;") 

I had a couple missing // and the computer was mad at me. I am getting 5509 results. Is that too many or does that seem right?

ebeshero commented 7 years ago

@Samantha-Mcguigan The conditional statement for floatingText looks right. I'm realizing this is hard b/c you're not actually working on this project! (It was "hot" like the Hamilton project last spring, but now it's gone dormant a bit and we're not as familiar with it...) So here are some things to be aware of:

You aren't looking for immediate parent:: elements for your persNames (because those are paragraphs or quotes). What you want is actually to go hunting to see:

1) whether there is a <floatingText> ancestor. If there is, it's in one of those nifty nested stories-within-a-story. (Stories coded within <floatingText> are the most deeply nested of stories.)
2) otherwise, it's going to be framed by a <div> element with an @type attribute that indicates the level of the story you're in. So if it does not have a floatingText ancestor, check the @type attribute on the first ancestor div. (You don't want any other ancestor divs, because every story is nested in a novella, and that novella is nested in a frame, all the way back up to the div that surrounds the entire document!)

Does that make sense?
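As a rough sketch of that two-step check (the $sample fragment and the variable name $levels are assumptions for illustration), this lists the level or levels at which each distinct name occurs. It tests for a <floatingText> ancestor first and falls back to the @type of the first ancestor <div>:

```xquery
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical miniature: the same names occur at more than one level. :)
declare variable $sample :=
  <div type="Day">
    <p><persName>Fiammetta</persName> is asked to speak.</p>
    <div type="novella">
      <p><persName>Fiammetta</persName> tells of <persName>Tancredi</persName>.</p>
      <floatingText>
        <body><p><persName>Tancredi</persName> hears an inner tale.</p></body>
      </floatingText>
    </div>
  </div>;

for $p in distinct-values($sample//persName)
let $levels :=
  for $name in $sample//persName[. = $p]
  return
    (: step 1: is there any floatingText above this name? :)
    if ($name/ancestor::floatingText) then "floatingText"
    (: step 2: otherwise read @type on the nearest enclosing div only :)
    else string($name/ancestor::div[1]/@type)
return concat($p, "&#x9;", string-join(distinct-values($levels), " "))
```

Note that one name can turn up at several levels, so the check has to run on each <persName> element rather than once per distinct string.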

ebeshero commented 7 years ago

@Samantha-Mcguigan This is probably causing too much output:

let $edgeType:=
         if (//div[1]) 
               then "novella"

You're looking down the tree from the document node here. Instead, to get the current shared context, you need to look up the tree from the point of view of a persName on your tree that equals the current $p in your for-loop. Look up the ancestor axis, and if there is a floatingText, output "floatingText"; otherwise, I think you can output the @type on its first ancestor <div>!

To get the peers, you want to look up from $p, stand on the context (if ancestor::floatingText, then that, or else its ancestor::div[1]), then look down and collect all the persName elements that are NOT EQUAL to the current $p (using ne or !=).
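Here is a minimal sketch of that "look up, stand on the context, look down" move. The $sample fragment and the variable names $occurrences and $contexts are my own assumptions, not part of the assignment:

```xquery
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical miniature of a novella holding an inner story. :)
declare variable $sample :=
  <div type="novella">
    <p><persName>Tancredi</persName> confronts <persName>Ghismonda</persName>.</p>
    <floatingText>
      <body><p><persName>Guiscardo</persName> and <persName>Ghismonda</persName> meet.</p></body>
    </floatingText>
  </div>;

for $p in distinct-values($sample//persName)
(: every place on the tree where this name occurs :)
let $occurrences := $sample//persName[. = $p]
(: look up: the enclosing floatingText if there is one, else the nearest div :)
let $contexts :=
  for $o in $occurrences
  return ($o/ancestor::floatingText[1], $o/ancestor::div[1])[1]
(: look down: everyone in those contexts who is not $p :)
let $peers := distinct-values($contexts//persName[. != $p])
for $peer in $peers
return concat($p, "&#x9;", $peer, "&#10;")
```

One caveat with this sketch: looking down from a <div> also reaches names inside any <floatingText> or <div> nested within it, so you may want a stricter filter if those connections should not count.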

Samantha-Mcguigan commented 7 years ago

okay i have:

for $p in $distinctPeople
let $peers:=

         if (//floatingText//persName = $p) 
                then distinct-values(//floatingText//persName)
        else if (//persName = $p/ancestor::div) 
                then distinct-values(//persName/ancestor::div)
         else (distinct-values($people))

      let $edgeType:=
        if (//persName=$p/parent::floatingText) 
               then "floatingText"
        else if (//persname=$p/ancestor::div) 
               then div/@type
        else "frame"  

    for $peer in $peers
    return
         concat($p, "&#x9;", $edgeType, "&#x9;",$peer, "&#10;") 

but I'm getting an error that says cannot convert xs:untypedAtomic ('Fiammetta') to a node set

ebeshero commented 7 years ago

@Samantha-Mcguigan The peers are problematic: (See my post just above this.) You are currently getting ALL the persName elements in EVERY floating text (not the floating text that contains $p). Also, you would be returning all the persNames including the one that matches $p. You need to exclude $p--that is, get all the persNames that DO NOT EQUAL (ne or !=) $p.

ebeshero commented 7 years ago

@Samantha-Mcguigan This construction you're using might be causing a problem, but I'm not sure b/c I'm not in a place to test it right now:

//persName=$p/parent::floatingText

First, I'd make that ancestor::floatingText. (There would only be one of these above a given persName, and it is likely to be an ancestor rather than the immediate parent.) More significantly, you should probably set up a predicate filter to catch the <persName> that matches $p:

//persName[. = $p]/ancestor::floatingText

Does that make a difference?
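A quick sketch of why the axis matters, using an invented fragment (the variable name $here is just for illustration): the <persName> sits inside a <p> inside a <body>, so parent:: never reaches the <floatingText>, while ancestor:: does.

```xquery
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical inner story; the names sit two levels below floatingText. :)
declare variable $sample :=
  <floatingText>
    <body><p><persName>Guiscardo</persName> meets <persName>Ghismonda</persName>.</p></body>
  </floatingText>;

let $p := "Guiscardo"
(: the predicate filter keeps us on the tree :)
let $here := $sample//persName[. = $p]
return (
  (: parent:: only sees the enclosing p, so this counts 0 :)
  count($here/parent::floatingText),
  (: ancestor:: climbs through p and body, so this counts 1 :)
  count($here/ancestor::floatingText)
)
```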

Samantha-Mcguigan commented 7 years ago

i have:

xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
declare variable $decameron := doc('/db/decameron/engDecameronTEI.xml');
declare variable $people := $decameron//persName;
declare variable $distinctPeople  := distinct-values($people);
for $p in $distinctPeople
let $peers:=

         if (//persName [. = $p]//ancestor::floatingText)
                then distinct-values(//persName[. ne $p]//ancestor::floatingText)
        else if (//persName [.= $p]//ancestor::div[1]) 
                then distinct-values(//persName[. ne $p]//ancestor::div[1])
         else (distinct-values($people))

      let $edgeType:=
        if (//persName=$p//ancestor::floatingText) 
               then "floatingText"
        else if (//persname=$p//ancestor::div[1]) 
               then "novella"
        else "frame"  

    for $peer in $peers
    return
         concat($p, "&#x9;", $edgeType, "&#x9;",$peer, "&#10;") 

I'm still getting the same error

Samantha-Mcguigan commented 7 years ago

I'm sorry if I keep repeating the same mistakes you already told me how to fix, I'm just having a hard time understanding

ebeshero commented 7 years ago

@Samantha-Mcguigan Sorry--got caught up in a meeting! Here are a couple of issues now:

1) your $peers variable is not returning people's names. That is because your XPath is landing on the floatingText element, or would be--there's an odd double slash before you walk up the ancestor axis. You basically just want to return the descendants of that same floatingText element that holds $p, who are NOT $p. Try writing it like this:

if (//persName[. = $p]/ancestor::floatingText)
                then distinct-values(//persName[. = $p]/ancestor::floatingText//persName[. != $p])

Read this carefully: I say, if the <persName> that's $p has a <floatingText> ancestor, then go to that <persName> that's $p, go up to its ancestor <floatingText>, then go back down to get all the other <persName> elements that are NOT equal to $p. (And wrap it in distinct-values() so we remove duplicates from the list.) Notice I'm using the general-comparison operator (!=) on that last step; the value-comparison operator (ne) should also work there, because the predicate tests one <persName> at a time against the single $p.

You need to apply something like that XPath above to each one of your conditional statements that define the $peers variable. Let's start there.

2) For $edgeType, I think you're not understanding how the Decameron is coded, and that may be a problem for understanding what level of the text you're testing for. Try downloading their TEI file, opening it in oXygen, and studying it for a bit. All the text is set in nested <div> elements, and the way you know what level of the text you're in is by looking at the @type on the <div> element. Maybe try a separate XQuery, or open the TEI file in oXygen to run some XPath over it (or look at it in outline view in oXygen) so you can see this. Every <persName> in the Decameron text will have an ancestor <div> somewhere, so you probably won't get anything that is "frame" with this conditional setup. What is the XPath you need to determine if something is in a frame? (Check the values of the @type attributes...or maybe just have your conditional return the @type attribute. Maybe you only need two conditions here, not three...)

Notice that you haven't modified your $edgeType conditional statements to have predicates that test if [. = $p] the way you did on the $peers variable. (My hunch is that this is causing that error you're seeing about a node not being equivalent to an xs:untypedAtomic...)
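To spell out that hunch with a sketch: as far as I can tell, the path operator binds more tightly than =, so the test without a predicate tries to take a path step from $p itself, which is an atomic value and not a node. The fragment and the variable name $inFloatingText here are invented for illustration, and the failing form is left in a comment so the query still runs:

```xquery
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical miniature with one name inside and one outside floatingText. :)
declare variable $sample :=
  <div type="novella">
    <p><persName>Fiammetta</persName> speaks.</p>
    <floatingText>
      <body><p><persName>Guiscardo</persName> listens.</p></body>
    </floatingText>
  </div>;

for $p in distinct-values($sample//persName)
(: The test   $sample//persName = $p/ancestor::floatingText
   groups as  $sample//persName = ($p/ancestor::floatingText),
   and a path step cannot start from the atomic value $p.
   The predicate form keeps the step on a real node instead. :)
let $inFloatingText := exists($sample//persName[. = $p]/ancestor::floatingText)
return concat($p, "&#x9;", $inFloatingText)
```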

ebeshero commented 7 years ago

@Samantha-Mcguigan About $edgeType: I just did an XPath search on the Decameron TEI file (outside of this XQuery--just did it on the file in oXygen). I went looking for this:

//persName/ancestor::div[1]/@type

That looks down at every persName in the file, and then walks up to its first <div> ancestor, and gets the value of its @type attribute. And then I wrapped that in distinct-values():

distinct-values(//persName/ancestor::div[1]/@type)

I returned 6 different @type values on the div elements. They are: prologue, Day, introduction, novella, conclusion, epilogue.

Now, if in the network analysis I just want to distinguish the main storyline from the various frames, I probably just want to test to see if a <div> is @type="novella" or not. But I could also differentiate among all these divs. Here's what they mean:

OUTER FRAME = prologue, introduction, conclusion, and epilogue
DAY FRAMES = Day
STORIES TOLD BY CHARACTERS = novella
STORIES-WITHIN-STORIES = down deep, within the novellas, are some stories told by characters within the novellas, and those are layered inside the <floatingText> element.

Your network analysis might try to see at what levels of text the various characters are connected together, and it should look really interesting! I think you're on the right track but just needed an overview of how the project was coded. I also think you should set up the conditional statements to return the kinds of relationships that make sense to you to visualize. I'd recommend summing them up as three different kinds of relationships:

1) connections made in the outer frame + connections made within the Day frames around each story (that would all make sense as "frame")
2) connections made within the novellas (within the stories told by the characters introduced in the frame)
3) connections made inside the <floatingText> stories-within-the-stories
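Here is one possible sketch of that three-way summary, not a definitive solution. The function name local:edge-type and the $sample fragment are hypothetical, and the sketch assumes every @type other than "novella" should count as frame:

```xquery
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: local:edge-type is a hypothetical helper name, not part of the assignment. :)
declare function local:edge-type($name as element(persName)) as xs:string {
  if ($name/ancestor::floatingText) then "floatingText"
  else if ($name/ancestor::div[1]/@type = "novella") then "novella"
  (: prologue, introduction, conclusion, epilogue, and Day all land here :)
  else "frame"
};

(: Hypothetical miniature with one name at each of the three levels. :)
declare variable $sample :=
  <div type="Day">
    <p><persName>Pampinea</persName> opens the day.</p>
    <div type="novella">
      <p><persName>Fiammetta</persName> begins her tale.</p>
      <floatingText>
        <body><p><persName>Guiscardo</persName> appears in an inner story.</p></body>
      </floatingText>
    </div>
  </div>;

for $name in $sample//persName
return concat($name, "&#x9;", local:edge-type($name))
```

Putting the logic in one function means the same test can label an edge wherever a <persName> is found. On this sample it should label Pampinea as frame, Fiammetta as novella, and Guiscardo as floatingText.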

dotfig commented 7 years ago

When uploading to CourseWeb, make sure you put your two files into a zipped folder or else it will not upload. @Samantha-Mcguigan @ahunker @bsf15 @jonhoranic

Samantha-Mcguigan commented 7 years ago

@ebeshero I just got back from rehearsal, but I think I got it, so I'm going to turn it in. Thank you for all your help. I really appreciate it, and sorry again if I was being a pain and not understanding something the first time you said it.

ebeshero commented 7 years ago

@Samantha-Mcguigan You weren't being a pain! I'm glad you figured it out. It's hard working with another team's project files without being on the inside of their code...and I had to remember how they were coding, too!

jonhoranic commented 7 years ago

Just as I was going to download my output to put into Cytoscape, I got a server error on the eXist-db page. Trying to find out what was wrong, I noticed that my internet connection was disconnected. When I looked into it further, I found the ENTIRE HOUSE is down. I'm scrambling to try to reset things, but hopefully I can get at least half turned in for class and finish it up with debugging. Thankfully, using the LTE on my phone I could send this message out. I will report back once I get the Internet connection back.

jonhoranic commented 7 years ago

I have a semi-stable connection now, so I'll post these files to CourseWeb ASAP. I will work on the connection issues later.