Network Analysis Discussion: Bimodal and Directed

ebeshero commented 8 years ago

@RJP43 and @spadafour Rob and I were discussing your ideas for plotting a network analysis from CitySlaveGirls. First of all, just to get your feet wet:

1) You might just want to try this as a prototype for this homework from a bundle of files you know is in good shape (as in, just isolate the files that have the data you want).
2) It seems pretty clear that what you are creating is a directed network, rather than the undirected kind of network that I'm describing in my assignment. That is because (as I understand it) you're wanting to plot:

Source-Node: An agent or person (from the "archetypes" via the pronouns/possessive nouns)
Shared-Interaction: the active grammatical relationships (from the "segs" markup)
Target-Node: the object (whether a material object or even a person or group of people being objectified!)

So this is a directed network because the agent is exerting power (if you will) over an object or defining a relationship to it. The directional flow goes from Agent to Object. When you go to plot your graph, you'll want to select "directed" network (not undirected as I'm leading people in the homework assignment).
And this is a bimodal network, which means that your Source and Target nodes are of two different kinds. Your source and target nodes are going to lose their distinctiveness in Cytoscape when it converts your raw network data into a graph, so you want to output attribute columns (a source-node attribute and a target-node attribute): Those attributes should contain a piece of text that helps you flag when a particular node is an Agent or an Object.
When you output this in Cytoscape you'll be able to refer to those attribute columns in order to control things like the shape or color of your output nodes. For a bimodal network, you might want to make your shapes be different: say, to link circles to triangles or something to help distinguish Agents from Objects.

Hope that makes sense, and I'm eager to see how your network experiment comes out!
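
The five-column layout described above (source node, source attribute, interaction, target node, target attribute) could be sketched roughly as follows. This is only an illustration: the inline `<edges>` sample data, the attribute names on `<edge>`, and the "Agent"/"Object" labels are all assumptions made for the example, not the project's actual markup.

```xquery
xquery version "3.0";
(: Hypothetical sample data standing in for the real project files. :)
let $edges :=
  <edges>
    <edge agent="City Slave Girl" via="her wages" object="wages"/>
    <edge agent="Employer" via="his factory" object="factory"/>
  </edges>
for $e in $edges/edge
(: One tab-separated row per edge. The second and fifth columns flag the
   node kind so it can be mapped to shape or color after import. :)
return string-join(
  (string($e/@agent), "Agent", string($e/@via), string($e/@object), "Object"),
  "&#x9;")
```

Keeping the two attribute columns next to their nodes makes it easier to tell the importer which column describes the source and which describes the target.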

RJP43 commented 8 years ago

xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
let $nellColl := collection('/db/Nelson/CSG_XML')
let $nellFile := $nellColl/*
let $fileDates := $nellFile//teiHeader//fileDesc//title//date/@when/string()
let $fileNums := $nellFile//teiHeader//fileDesc//title/@corresp/substring-after(.,'CT0')
let $phrases := $nellFile//phr
let $agents := $phrases/w[@type="adj"]
let $objects := $phrases/w[@type="noun"]
for $agent in $agents
let $agentValue :=
  if ($agent[@ana]) then (distinct-values($agent/@ana/substring-after(.,'#')))
  else distinct-values($agent/string())
let $correspObjects :=
  if ($agent/parent::phr/w[@type="noun"][@ana]) then ($agent/parent::phr/w[@type="noun"][@ana]/@ana/substring-after(.,'#'))
  else $agent/parent::phr/w[@type="noun"]/string()
let $edge := $agent/parent::phr
let $edgeName := "possesses"
for $obts in $correspObjects
return concat($agentValue, "&#x9;", $edge, "&#x9;", $obts, "&#10;")

@spadafour @ebeshero Okay, this is what I have so far. Do I need to add the attribute columns?

RJP43 commented 8 years ago

This is my output:

And I get 533 returns so it is grabbing every instance of the <phr> markup. Which makes me think I should take distinct-values somewhere so there are no repeats.
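
One way to collapse repeats, sketched here under assumptions, is to build each output row as a single string and pass the whole sequence to `distinct-values()`. The sample rows are invented for illustration. Whether repeats should be dropped at all depends on whether the number of times an edge occurs matters for the graph.

```xquery
xquery version "3.0";
(: Hypothetical rows: the first and third are exact duplicates. :)
let $rows := (
  "workingGirl&#x9;possesses&#x9;wages",
  "employer&#x9;possesses&#x9;factory",
  "workingGirl&#x9;possesses&#x9;wages"
)
(: distinct-values() keeps one copy of each identical string,
   so this returns two rows instead of three. :)
return distinct-values($rows)
```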

RJP43 commented 8 years ago

can view the output here

RJP43 commented 8 years ago

@spadafour Also, while I was working on this I thought of another thing that we might try to use Cytoscape for, and it seems a bit closer to what is being done on the Decameron project and therefore the assignment page: my idea is to grab all of the placeNames of type locRef and type address that are in a single file. My thinking is that if we can't get an actual map in order, or maybe as a precursor to a map, this visualization would give us the grouping of places mentioned in a single article, which are likely near one another. One flaw is that Nelson frequently notes the places that the people she interviews mention, so we would also pick up workingGirl addresses or places they mention. A possible way to eliminate some of those flaws, and stick close to what I was thinking in grabbing the places near each other based on article grouping, would be to grab the placeNames without the said element as an ancestor. Thoughts? Want to give writing that query a try, since I have this other visualization query (regarding grammar) pretty much worked out?
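
The filter proposed here (place names of type locRef or address that are not inside a said element) might look something like the sketch below. The inline `<body>` sample, the place names, and the assumption that the type is carried on a `@type` attribute are all hypothetical. The real files are in the TEI namespace, which this sample omits for brevity.

```xquery
xquery version "3.0";
(: Hypothetical article fragment, not taken from the project files. :)
let $article :=
  <body>
    <p>The shop on <placeName type="address">Example Street</placeName> was crowded.</p>
    <p><said>I live near <placeName type="locRef">the river</placeName>.</said></p>
  </body>
(: Keep only place names that have no <said> ancestor. :)
return $article//placeName[@type = ("locRef", "address")][not(ancestor::said)]/string()
```

With this sample, only the first place name is returned, because the second sits inside `<said>`.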

RJP43 commented 8 years ago

@spadafour or continue with what I have here and try to get this so that we get the attributes @ebeshero discusses above and distinct-values so we aren't getting the repeated sets of words

spadafour commented 8 years ago

xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
declare variable $ThisFileContent :=
  string-join(
    let $nellColl := collection('/db/Nelson/CSG_XML')
    let $nellFile := $nellColl/*
    let $SiteIndex := doc('/db/Nelson/siteIndex.xml')/*
    let $distSIarchs := distinct-values($SiteIndex//nym//re)
    let $fileDates := $nellFile//teiHeader//fileDesc//title//date/@when/string()
    let $fileNums := $nellFile//teiHeader//fileDesc//title/@corresp/substring-after(.,'CT0')
    let $phrases := $nellFile//phr
    let $words := $phrases//w
    let $posses := $words[@type='poss']

    for $poss in $posses
    let $parentPhr := $poss/ancestor::phr
    let $possValue :=
      if ($poss[@ana[substring-after(.,'#') = $SiteIndex//nym/@xml:id/string()]]) then ($SiteIndex//nym[@xml:id/string() = ($poss/@ana/substring-after(.,'#'))]//re/string())
      else if ($poss[@ana[substring-after(.,'#') = $SiteIndex//org/@xml:id/string()]]) then ($SiteIndex//org[@xml:id/string() = ($poss/@ana/substring-after(.,'#'))]//orgName/string())
      else $poss/string()

    let $possAtt := 'poss'

    let $correspObject :=
      if ($poss/parent::phr/w[@type="noun"][@ana[substring-after(.,'#') = $SiteIndex//nym/@xml:id/string()]]) then ($SiteIndex//nym[@xml:id/string() = ($poss/@ana/substring-after(.,'#'))]//re/string())
      else if ($poss/parent::phr/w[@type="noun"][@ana[substring-after(.,'#') = $SiteIndex//org/@xml:id/string()]]) then ($SiteIndex//org[@xml:id/string() = ($poss/@ana/substring-after(.,'#'))]//orgName/string())
      else $poss/parent::phr/w[@type="noun"]/string()

    let $nounAtt := 'noun'

    return concat($possValue, "&#x9;", $possAtt, "&#x9;", $parentPhr, "&#x9;", $correspObject, "&#x9;", $nounAtt),
    "&#10;");

let $filename := "nelsonGrammarOutput.tsv"
let $doc-db-uri := xmldb:store("/db/Nelson", $filename, $ThisFileContent, "text/plain")
return $doc-db-uri
(:output = http://dxcvm05.psc.edu:8080/exist/rest/db/rParker/nelsonGrammarOutput.tsv:)

Becca and I have been working on this since class, and all parts are working individually to do what we want; however, when we put it all together in the string-join and output it to a .tsv, we are getting an error. @ebeshero Could you help us figure this out? When I come back from class at 4 we would like to work out a corrected Cytoscape network before beginning some more mapping work. Thanks!

You can find this exact query in Becca's eXide folder (rParker); it is titled nelsonGrammar.

RJP43 commented 8 years ago

@spadafour @ebeshero @ghbondar any thoughts why we are getting an issue now?

RJP43 commented 8 years ago

I was thinking one way to maybe fix it would be to start from the nouns instead, since there are sometimes multiple nouns for each poss. So the single thing in the many would be noun in nouns, and then we would have to find the associated poss for each. I am not sure that is what this error actually means, though. I would have thought that even the way we are doing it now would just cause a repeated poss when multiple nouns appear, but the error is a cardinality issue, so maybe it is this. @ebeshero @spadafour @ghbondar @nlottig94 thoughts?

RJP43 commented 8 years ago

@ebeshero this is the issue I was trying to explain after capstone today.

ebeshero commented 8 years ago

@spadafour and @RJP43 Sorry for the delay! I'm on it now...

ebeshero commented 8 years ago

Any updates since you were working on this??? Sorry--I've been buried under Lit Capstone drafts since yesterday!

RJP43 commented 8 years ago

No, I'm in the same situation. I didn't try what I was thinking in my last comment because I'm buried in my own drafting, and I wasn't going to get caught up in it if that isn't actually what is causing the error. Thanks for having a look!

ebeshero commented 8 years ago

@RJP43 @spadafour Well, I corrected something bizarre and not right in the $possValue if () then else statement--really just simplified that. That resolved one set of errors, but a cardinality error remained.

Now, I've isolated the problem to this part, and I can already see the problem: You're not writing if() then else statements correctly! Here's what's happening: You're saying:

if (condition = this) then (condition = this)

instead of

if (condition = this) then output something

Can you spot the error in the code yourself now? (It's oddly similar to if statement issues in JavaScript, except that we use a single = sign in JavaScript to assign a new value to a variable, whereas in XQuery we use the single = to test for comparison, and assign a value by designating an XPath and by using :=.) The thinking process is the same, but the syntax is different.

let $correspObject :=
  if ($poss/parent::phr/w[@type="noun"][@ana[substring-after(.,'#') = $SiteIndex//nym/@xml:id/string()]]) then ($SiteIndex//nym[@xml:id/string() = ($poss/@ana/substring-after(.,'#'))]//re/string())
  else if ($poss/parent::phr/w[@type="noun"][@ana[substring-after(.,'#') = $SiteIndex//org/@xml:id/string()]]) then ($SiteIndex//org[@xml:id/string() = ($poss/@ana/substring-after(.,'#'))]//orgName/string())
  else $poss/parent::phr/w[@type="noun"]/string()
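
The general distinction drawn above between `=` (comparison) and `:=` (binding) can be shown in a small standalone sketch. The id list and the labels are made up for the example and are not part of the project's site index.

```xquery
xquery version "3.0";
(: := binds a value to a variable; = only tests, it never assigns. :)
let $ids := ("workingGirl", "employer")   (: hypothetical site-index ids :)
let $ana := "#workingGirl"
let $label :=
  (: General comparison: true if the left value equals ANY item in $ids. :)
  if (substring-after($ana, "#") = $ids)
  then "in site index"
  else "not in site index"
return $label
```

Note that `=` inside a predicate such as `[@xml:id = $id]` is the same kind of test: it filters nodes and does not assign anything.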

ebeshero commented 8 years ago

So, here's how I corrected $possValue:

let $possValue :=
  if ($poss[@ana[substring-after(.,'#') = $SiteIndex//nym/@xml:id/string()]])
  then $poss//re/string()

  else if ($poss[@ana[substring-after(.,'#') = $SiteIndex//org/@xml:id/string()]])
  then $poss//orgName/string()

  else $poss/string()

Now, I think you can do something similar to correct $correspObject. I'll leave that to you, and ping me if you figure it out, or if you're still stuck!

RJP43 commented 8 years ago

@ebeshero that correction for $possValue stops pulling in the names from the site index, which is what we had and what we want in the list. We don't want the jammed-up text of the @ana/@ref/xml:id (like workingGirl) or the literal text inside the <w> element (like her); we want our consistent site index <re> text (City Slave Girl). Likewise for organizations, we want the full name from the site index <orgName> rather than the @ana/@ref/xml:id abbreviation of the company name.
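
The lookup being described (follow `@ana` into the site index and return the standardized name, falling back to the word's own text) could be sketched like this. The inline `$siteIndex` and `$words` samples are assumptions for illustration; `exampleCo` and "Example Company" are invented, and the TEI namespace is left out to keep the sketch short.

```xquery
xquery version "3.0";
(: Hypothetical miniature site index and word list. :)
let $siteIndex :=
  <index>
    <nym xml:id="workingGirl"><re>City Slave Girl</re></nym>
    <org xml:id="exampleCo"><orgName>Example Company</orgName></org>
  </index>
let $words := (
  <w type="poss" ana="#workingGirl">her</w>,
  <w type="poss" ana="#exampleCo">its</w>,
  <w type="poss">their</w>
)
for $w in $words
(: With no @ana, substring-after() yields "" and neither lookup matches. :)
let $id := substring-after($w/@ana, "#")
let $nym := $siteIndex//nym[@xml:id = $id]
let $org := $siteIndex//org[@xml:id = $id]
return
  if ($nym) then string($nym[1]/re)
  else if ($org) then string($org[1]/orgName)
  else string($w)
```

With this sample the three results are "City Slave Girl", "Example Company", and "their". The last word has no `@ana`, so its literal text is kept.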

So to clarify further -- Your solution gives us this:

tailor's
day's
their
Hood
its
his
your
his
his
working-man's

and only 53 returns, which is far from the actual number of poss words

Our code from above works and gets us all 694 poss words from all the files getting us a list like this:

Employer
Employer
Employer
Employer
Nell Nelson
City Slave Girl
City Slave Girl
City Slave Girl
City Slave Girl
City Slave Girl

If you run the code we had (from above) as individual pieces, you can see we are correctly getting each of the pieces. We ran queries beforehand to make sure we are catching all of each type of word, and we know that there are 691 phrases with 694 poss words and ~722 associated nouns.

The issue we are getting is when we put them together and I am still thinking it must be a cardinality issue. I am going to try and reverse what we find first to see if it will work.

RJP43 commented 8 years ago

No, that didn't work either. Now I am getting the 722 nouns for $nounValue and only 93 possessive words for $correspPoss:

xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

let $nellColl := collection('/db/Nelson/CSG_XML')
let $nellFile := $nellColl/*
let $SiteIndex := doc('/db/Nelson/siteIndex.xml')/*
let $distSIarchs := distinct-values($SiteIndex//nym//re)
let $fileDates := $nellFile//teiHeader//fileDesc//title//date/@when/string()
let $fileNums := $nellFile//teiHeader//fileDesc//title/@corresp/substring-after(.,'CT0')
let $phrases := $nellFile//phr
let $words := $phrases//w
let $nouns := $words[@type='noun']

for $noun in $nouns
let $parentPhr := $noun/ancestor::phr

let $nounValue :=
  if ($noun[@ana[substring-after(.,'#') = $SiteIndex//nym/@xml:id/string()]]) then ($SiteIndex//nym[@xml:id/string() = ($noun/@ana/substring-after(.,'#'))]//re/string())
  else if ($noun[@ana[substring-after(.,'#') = $SiteIndex//org/@xml:id/string()]]) then ($SiteIndex//org[@xml:id/string() = ($noun/@ana/substring-after(.,'#'))]//orgName/string())
  else $noun/string()

let $correspPoss :=
  if ($parentPhr//w[@type="poss"][@ana[substring-after(.,'#') = $SiteIndex//nym/@xml:id/string()]]) then ($SiteIndex//nym[@xml:id/string() = ($noun/@ana/substring-after(.,'#'))]//re/string())
  else if ($parentPhr//w[@type="poss"][@ana[substring-after(.,'#') = $SiteIndex//org/@xml:id/string()]]) then ($SiteIndex//org[@xml:id/string() = ($noun/@ana/substring-after(.,'#'))]//orgName/string())
  else $parentPhr//w[@type="poss"]/string()

let $nounAtt := 'noun'
let $possAtt := 'poss'

return $correspPoss

I think the issue is that sometimes there is more than one poss word in a phrase for a single noun, and sometimes there is more than one noun in a phrase for a single poss word, and I am not sure that we are grabbing those instances correctly.

What we want is that when one of those things is repeated, each of the words comes out with the other word(s) it is paired with in the <phr>, and instead of getting the literal text of the <w> element or the xml:id, we get the standard representation from our site index. If you run the code that is in the nelsonGrammar or the nelsonGrammarNounsFirst queries in the rParker folder on eXide, you can see that for both, the first thing we grab is working correctly (whether it be $possValue or $nounValue). It is correctly grabbing that site index standard in our results and giving the expected number based on our preliminary counts of each type of word, which were done outside of the for loop.

RJP43 commented 8 years ago

@ebeshero @spadafour thoughts to fix?

ebeshero commented 8 years ago

@RJP43 @spadafour OK--I see that what I mistook before for a second test for equivalence in your then statements was actually an = in your predicates! Sorry about that! I'll go take a look again.

ebeshero commented 8 years ago

@RJP43 @spadafour if you have 722 associated nouns, you need to have 722 lines of output. I think you fix this with a new "for loop" to generate a line for each associated noun.
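
A minimal sketch of that extra "for loop" follows, assuming simplified inline `<phr>` samples without the TEI namespace or site-index lookups. Iterating over each poss and each noun in a phrase means every row gets exactly one value per column, which avoids handing a multi-item sequence to a function that expects a single value.

```xquery
xquery version "3.0";
(: Hypothetical phrases; the first has one poss word and two nouns. :)
let $phrases := (
  <phr><w type="poss">her</w> bit of <w type="noun">soap</w> and grimy cotton <w type="noun">towel</w></phr>,
  <phr><w type="poss">their</w> heavy <w type="noun">shears</w></phr>
)
for $phr in $phrases
for $poss in $phr/w[@type = "poss"]
for $noun in $phr/w[@type = "noun"]   (: one row per poss/noun pair :)
return string-join(
  (string($poss), "poss", normalize-space($phr), string($noun), "noun"),
  "&#x9;")
```

The first phrase yields two rows (one for soap, one for towel) and the second yields one, so the row count follows the number of poss/noun pairs rather than the number of phrases.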

RJP43 commented 8 years ago

aha! yes!

RJP43 commented 8 years ago

Thank you! @ebeshero @spadafour here is the link http://dxcvm05.psc.edu:8080/exist/rest/db/Nelson/nelsonGrammarOutput.tsv and it's working

ebeshero commented 8 years ago

@RJP43 Huzzah!! Success. I'm glad you figured it out--and I'm sorry I had trouble reading those complicated if then statements earlier! ;-)

RJP43 commented 8 years ago

It's understandable; they took us a while to figure out too.

ebeshero commented 8 years ago

I'm skimming through your TSV now, and I think my favorite lines are:

City Slave Girl poss    her bit of soap and grimy cotton towel  soap    noun
City Slave Girl poss    her bit of soap and grimy cotton towel  towel   noun

I think that could be a T-shirt design, too...too bad we already made a batch of shirts! ;-) Maybe they can be made out of "grimy cotton"!

On an unrelated note, check Courseweb for your Theory Exam score, and make sure I give that back to you (and Amanda and Megan) tomorrow!

RJP43 commented 8 years ago

@spadafour so I ordered this list by our $possValue to group all of the like things into sections, and what I am noticing is we should probably go back through some of the ones that are still popping up as pronouns and verify that they aren't just missing @ana. For example, we might want to check out where they all fall:

our poss    our own city    city    noun
our poss    our sweetest smiles smiles  noun
son's   poss    son'soppression oppression  noun
tailor's    poss    tailor'schalk   chalk   noun
their   poss    theirmothers'fault  fault   noun
their   poss    theirsills  sills   noun
their   poss    their heavy shears  shears  noun
their   poss    theircomplexion complexion  noun
their   poss    theirodors  odors   noun
their   poss    theirhearts hearts  noun
their   poss    theirlives  lives   noun
their   poss    theirorigin origin  noun
their   poss    theirnature nature  noun
their   poss    theirsurroundings   surroundings    noun
their   poss    theirassociates associates  noun
their   poss    theirbenevolence    benevolence noun
their   poss    their shabby clothes    clothes noun
their   poss    theirwrongs wrongs  noun
their   poss    their own hands hands   noun
their   poss    theireyes   eyes    noun
toilet's    poss    hertoilet's greasy task task    noun
week's  poss    week'spay   pay noun
woman's poss    woman'sheart    heart   noun
working-man's   poss    working-man'shome   home    noun
your    poss    yourarticles    articles    noun
your    poss    yourfastidiousness  fastidiousness  noun

We can discuss this in class.

RJP43 commented 8 years ago

@ebeshero here is one of my favorite lines:

toilet's poss hertoilet's greasy task task noun

ebeshero commented 8 years ago

Ughhhh, gross! Lol! :+1:

RJP43 / CitySlaveGirls

Network Analysis Discussion: Bimodal and Directed #54