ebeshero / DHClass-Hub

a repository to help introduce and orient students to the GitHub collaboration environment, and to support DH classes.
GNU Affero General Public License v3.0

Network Analysis #771

Closed amberpeddicord closed 4 years ago

amberpeddicord commented 4 years ago

@ebeshero I'm trying to work on a network graph for Teen Titans that shows character interactions (so, I'm trying to get the distinct speakers, the number of times they speak, the distinct values of the strings inside <char> elements, and the number of times each distinct char element is spoken by each distinct speaker...if that makes sense). I have this so far:

xquery version "3.1";
declare variable $s1Coll as document-node()+ := collection('/db/teentitans/season1');
let $spkr := $s1Coll//spkr[contains(@ref, 'Robin') or contains(@ref, 'Raven') or contains(@ref, 'Cyborg') or contains(@ref, 'Starfire') or contains(@ref, 'BeastBoy')][not(contains(@ref, 'Puppet')) and not(contains(@ref, 'Aqualad')) and not(contains(@ref, 'Pink')) and not(contains(@ref, 'Green')) and not(contains(@ref, 'Gray')) and not(contains(@ref, 'White')) and not(contains(@ref, ' '))]/@ref ! tokenize(., '#')[last()]
let $CountSpeeches := 
    let $distSpkr := distinct-values($spkr)
    for $d in $distSpkr
    let $countSpeech := $s1Coll//sp[spkr[@ref ! substring-after(., '#') = $d]] =>       count()
    return $countSpeech
(:  :let $maxCountSpeeches := $CountSpeeches => max():)
let $distSpkr := $spkr => distinct-values()
for $d in $distSpkr
let $speech := $s1Coll//sp[spkr[@ref ! substring-after(., '#') = $d]]
let $countSpeech := $speech => count()
let $char := $speech//char[contains(., $d)]/string() ! tokenize(., "'")[1] 
let $distChar := $char => distinct-values()
for $c in $distChar
let $mention := $char[ancestor::spkr[@ref = $d]]
let $mentionCount := $mention => count()
return concat($d, ',', $countSpeech, ',', $c, ',', $mentionCount)

I keep getting an error message that refers to $mention (though I'm sure that this isn't the only error in this code) that says that it cannot convert xs:string to a node set. What changes do I need to make to get this to work?

ebeshero commented 4 years ago

@amberpeddicord Okay, I pasted your code into our newtfire eXide, and I see some issues before the point that's giving you an error. Let's take a close look at this for-loop in your code:

for $d in $distSpkr
let $speech := $s1Coll//sp[spkr[@ref ! substring-after(., '#') = $d]]
let $countSpeech := $speech => count()
let $char := $speech//char[contains(., $d)]/string() ! tokenize(., "'")[1] 

The $d in $distSpkr is fine, and returning what we want. So is $speech and $countSpeech. But take a look at $char: is that really what you mean to say? That variable literally is saying: Go down and find all the <char> elements that contain the speaker you're on in this for-loop. I need to look at the <char> elements to see how they're configured again, but from what I can tell when I tinker with your code, that's constraining your results so that you only see self-mentions, when Robin talks about Robin.
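To make the self-mention effect concrete, here is a minimal sketch that runs against a single made-up <sp> element rather than the real collection. The sample speech and its ref="#Robin" value are assumptions for illustration only; they are not taken from the Teen Titans files.

```xquery
xquery version "3.1";
(: Hypothetical sample speech, not taken from the season1 collection. :)
let $speech :=
    <sp><spkr ref="#Robin">Robin</spkr> <char>Robin</char> here. <char>Starfire</char>, with me! <char>Cyborg's</char> car is gone.</sp>
let $d := 'Robin'
return (
    (: With the predicate: only <char> elements that contain the current speaker name :)
    string-join($speech//char[contains(., $d)]/string(), ', '),
    (: Without the predicate: every <char> in the speech, cut at the apostrophe :)
    string-join($speech//char/string() ! tokenize(., "'")[1], ', ')
)
```

The first string holds only Robin, while the second holds Robin, Starfire, Cyborg, which is the difference between the two versions of $char.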

ebeshero commented 4 years ago

There's more--but let's start there! Can you post some of the source code here so we can see how <char> relates to <sp>? I'm cooking dinner and will take a closer look when I come back a little later...

amberpeddicord commented 4 years ago

@ebeshero Yeah, to be honest I'm not entirely sure what I'm trying to say when writing this code! Here's an example of a <sp> element:

<sp>
                <spkr ref="Starfire">Starfire</spkr> Joyous greeting, friend! <sd>as a tentacle snakes
               into view</sd> I, <char>Starfire</char>, give you this tinnabula as a symbol of-</sp>
         <sd>It wraps around his neck; she gasps.</sd>
         <sp>

ebeshero commented 4 years ago

One more thing: When I remove that odd predicate on the definition of $char, and use this:

let $char := $speech//char/string() ! tokenize(., "'")[1] 

I see lots of $char values for each $d. I was testing to see what the relationship of $distChar to $d is with this return:

return concat ($d, '!!! ', string-join($distChar, ', '), ': ')

amberpeddicord commented 4 years ago

@ebeshero Should I remove the predicate on $char? I wanted to only do an analysis for the Titans themselves, so I thought that would be the best way to do it.

ebeshero commented 4 years ago

@amberpeddicord I think you may want to remove it, unless I'm misunderstanding something. I basically commented out the later stuff in your code to wrap my head around this part, first of all, so I was running just this, with and without your predicate:

declare variable $s1Coll as document-node()+ := collection('/db/teentitans/season1');
let $spkr := $s1Coll//spkr[contains(@ref, 'Robin') or contains(@ref, 'Raven') or contains(@ref, 'Cyborg') or contains(@ref, 'Starfire') or contains(@ref, 'BeastBoy')][not(contains(@ref, 'Puppet')) and not(contains(@ref, 'Aqualad')) and not(contains(@ref, 'Pink')) and not(contains(@ref, 'Green')) and not(contains(@ref, 'Gray')) and not(contains(@ref, 'White')) and not(contains(@ref, ' '))]/@ref ! tokenize(., '#')[last()]
let $CountSpeeches := 
    let $distSpkr := distinct-values($spkr)
    for $d in $distSpkr
    let $countSpeech := $s1Coll//sp[spkr[@ref ! substring-after(., '#') = $d]] =>       count()
    return $countSpeech
(:  :let $maxCountSpeeches := $CountSpeeches => max():)
let $distSpkr := $spkr => distinct-values()
for $d in $distSpkr
let $speech := $s1Coll//sp[spkr[@ref ! substring-after(., '#') = $d]]
let $countSpeech := $speech => count()
let $char := $speech//char/string() ! tokenize(., "'")[1] 
(: Alternative version with strange predicate: 
let $char := $speech//char[contains(., $d)]/string() ! tokenize(., "'")[1] 
 :)
let $distChar := $char => distinct-values()

return concat ($d, '!!! ', string-join($distChar, ', '), ': ')

ebeshero commented 4 years ago

I have no idea why you're running the tokenize() function on the $char variable either...but I didn't second-guess it. There must be some reason I'm not quite remembering!

ebeshero commented 4 years ago

But maybe you don't need that tokenize() function any more if you've cleaned up your XML code with Schematron? (dimly remembering some things about this from before Break...)

amberpeddicord commented 4 years ago

@ebeshero The files I was using weren't updated, and we haven't gotten as many of the old files updated as we would have liked (yet)! So that's why I had the tokenize() function in there.

ebeshero commented 4 years ago

@amberpeddicord Yes, I think I understand what you were trying to do now. If I'm right about this, you wanted to limit the return of <char> values to only the names of Teen Titans characters, because other characters mentioned are not Teen Titans. That's okay, and a good idea for your network graph since it limits the total number of nodes you'll have to work with. However(!) the predicate you were using won't quite work because it tests each <char> against $d. $d is a single member of the Teen Titans: JUST the one character you are evaluating at one particular turn of the for loop. So, when you're processing the loop for Robin, you only return $char values for Robin (and none of the other Titans).

Instead, I think you really meant to check and see if the <char> is any one of a series of values. It needs to belong to the full value set of $distSpkr, right? So, let's try defining it this way:

let $char := $speech//char/string() ! tokenize(., "'")[1][. = $distSpkr] 

(I just tested this and it returned what I suspect is the right output...You try.) Here's why this works: 1) It delimits your <char> values so they'll be comparable to your full sequence of values of $distSpkr (not the single value of $d at a particular turn of the for-loop). 2) The = general comparison operator can compare a single value against a whole sequence: it returns true if the value on the left matches any item in the sequence on the right. So each of your tokenized strings can be checked against your full list of values for $distSpkr.
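Here is a small sketch of that sequence comparison on its own, using made-up strings in place of the real <char> values. The name $names and the value 'Slade' are assumptions added only for this illustration.

```xquery
xquery version "3.1";
(: Hypothetical values for illustration only. :)
let $distSpkr := ('Robin', 'Raven', 'Cyborg', 'Starfire', 'BeastBoy')
let $names := ("Robin's", 'Slade', 'Raven', "Starfire's")
(: Keep the part before any apostrophe, then keep it only if it matches some item in $distSpkr :)
return $names ! tokenize(., "'")[1][. = $distSpkr]
```

This returns Robin, Raven, Starfire: the possessive endings are cut off by tokenize(), and 'Slade' drops out because it equals none of the items in $distSpkr.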

ebeshero commented 4 years ago

If that's starting to look right, let's move on to the next part to figure out what $mention is all about...

amberpeddicord commented 4 years ago

@ebeshero Yes, that looks correct! As far as $mention, I was basically trying to get a count of the number of times characters were mentioned by each other, but I was very lost...

ebeshero commented 4 years ago

Okay...dinner's in the oven now, so let's see if we can figure this out while it's baking...

ebeshero commented 4 years ago

@amberpeddicord I think I figured it out. Take a look at this inner for loop and redefinition of $mention:

let $distChar := $char => distinct-values()
for $c in $distChar
let $mention := $s1Coll//sp[spkr[@ref ! substring-after(., '#') = $d]]//char[contains(., $c)] 
let $mentionCount := $mention => count() 
return concat('Speaker ', $d, ' speaks ', $countSpeech,  ' times, and refers to ', $c, ' ', $mentionCount, ' times.') 

Let's take a close look at the predicates I set up for $mention:

First I start on the collection, and step down to <sp>. I filter those to make sure the speaker is the current $d from my outer for-loop (Robin, for example). Then, I step down the descendant axis to <char> and filter that in a kind of sloppy way to test if it contains() $c, the character being mentioned on this turn of the inner for-loop. (I could clean that up by properly tokenizing its string on that apostrophe as you were doing and checking the first part, but I figured contains() might be good enough here.)

Then I scripted out the return to make it clear to myself who was speaking and who was being mentioned. Does this do what you need it to do?
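For reference, the stricter cleanup mentioned above (tokenize on the apostrophe and compare the first part) might look something like the sketch below. It runs on a made-up <episode> element standing in for the collection, and the tab-separated return is only one possible row layout for a network file; both are assumptions, not the project's actual data or output format.

```xquery
xquery version "3.1";
(: Hypothetical sample data standing in for the season1 collection. :)
let $s1Coll :=
    <episode>
        <sp><spkr ref="#Robin">Robin</spkr> <char>Starfire</char>, take <char>Cyborg's</char> car!</sp>
        <sp><spkr ref="#Robin">Robin</spkr> Where is <char>Starfire</char>?</sp>
        <sp><spkr ref="#Raven">Raven</spkr> Ask <char>Robin</char>.</sp>
    </episode>
let $d := 'Robin'
let $speech := $s1Coll//sp[spkr[@ref ! substring-after(., '#') = $d]]
for $c in distinct-values($speech//char ! tokenize(., "'")[1])
(: Filter the <char> nodes themselves (not strings), so path steps stay legal :)
let $mention := $speech//char[tokenize(., "'")[1] = $c]
(: One tab-separated row: speaker, speech count, mentioned character, mention count :)
return string-join(($d, string(count($speech)), $c, string(count($mention))), '&#9;')
```

With this sample, Robin has 2 speeches and the rows report 2 mentions of Starfire and 1 of Cyborg. Because $mention is built from <char> nodes rather than from the strings in $char, the predicate has nodes to work with. The original $char[ancestor::spkr[@ref = $d]] tried to walk an axis from plain strings, which is most likely what the xs:string error was complaining about.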

amberpeddicord commented 4 years ago

@ebeshero Yes! That looks correct. Thank you so much! Is there anything else I need to fix (aside from formatting the .tsv and outputting it of course) or to do for a network graph, or will this work?

ebeshero commented 4 years ago

@amberpeddicord I think you're good to go...Let's see how it turns out in a network!