Open ebeshero opened 8 years ago
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
let $nellColl := collection('/db/Nelson/CSG_XML')
let $nellFile := $nellColl/*
let $fileDates := $nellFile//teiHeader//fileDesc//title//date/@when/string()
let $fileNums := $nellFile//teiHeader//fileDesc//title/@corresp/substring-after(.,'CT0')
let $phrases := $nellFile//phr
let $agents := $phrases/w[@type="adj"]
let $objects := $phrases/w[@type="noun"]
for $agent in $agents
let $agentValue :=
if ($agent[@ana]) then (distinct-values($agent/@ana/substring-after(.,'#')))
else distinct-values($agent/string())
let $correspObjects :=
if ($agent/parent::phr/w[@type="noun"][@ana]) then ($agent/parent::phr/w[@type="noun"][@ana]/@ana/substring-after(.,'#'))
else $agent/parent::phr/w[@type="noun"]/string()
let $edge := $agent/parent::phr
let $edgeName := "possesses"
for $obts in $correspObjects
return concat($agentValue, "	", $edge, "	", $obts, " ")
@spadafour @ebeshero Okay, this is what I have so far. Do I need to add the attribute columns?
This is my output:
And I get 533 returns, so it is grabbing every instance of the <phr> markup. That makes me think I should take distinct-values somewhere so there are no repeats.
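As a rough illustration of the distinct-values idea, here is a minimal sketch. The three rows are hypothetical stand-ins for the tab-separated lines the query returns; they are not real output.

```xquery
xquery version "3.0";
(: Hypothetical rows standing in for the lines the query above returns :)
let $rows := (
  "her&#9;possesses&#9;towel",
  "her&#9;possesses&#9;towel",
  "their&#9;possesses&#9;lives"
)
(: distinct-values() keeps a single copy of each identical string :)
return string-join(distinct-values($rows), "&#10;")
```

One thing to weigh before doing this: dropping repeats also drops the information about how often a pairing occurs, which may matter if edge weight is wanted in the network.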
@spadafour Also, while I was working on this I thought of another thing we might try to use Cytoscape for. It seems a bit closer to what is being done on the Decameron project, and therefore to the assignment page: my idea is to grab all of the placeNames of type locRef and type address that are in a single file. My thinking is that if we can't get an actual map in order, or maybe as a precursor to a map, this visualization would give us the grouping of places mentioned in a single article, which are likely near one another. One flaw is that Nelson frequently notes the places that the people she interviews mention, so we would also pick up workingGirl addresses or places they mention. A possible way to eliminate some of those flaws, and stick close to my idea of grouping nearby places by article, would be to grab only the placeNames without the said element as an ancestor. Thoughts? Want to give writing that query a try, since I have this other visualization query (regarding grammar) pretty much worked out?
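A minimal sketch of the placeName filter described above, run against a hypothetical inline fragment rather than the project files. The street names are invented, and the @type values are assumed from the description (locRef and address).

```xquery
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
(: Hypothetical fragment: one place named by the narrator, one inside <said> :)
let $file :=
  <TEI>
    <text><body>
      <p>The shop stands on <placeName type="address">Example Street</placeName>.</p>
      <p><said>I used to work on <placeName type="address">Sample Avenue</placeName>.</said></p>
    </body></text>
  </TEI>
(: keep only locRef/address placeNames that have no <said> ancestor :)
return $file//placeName[@type = ('locRef', 'address')][not(ancestor::said)]/normalize-space()
```

In the real query, $file would presumably be each document in the collection, so that the places can be grouped by article.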
@spadafour Or we can continue with what I have here and try to get the attributes @ebeshero discusses above, plus distinct-values so we aren't getting the repeated sets of words.
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
declare variable $ThisFileContent :=
string-join(
let $nellColl := collection('/db/Nelson/CSG_XML')
let $nellFile := $nellColl/*
let $SiteIndex := doc('/db/Nelson/siteIndex.xml')/*
let $distSIarchs := distinct-values($SiteIndex//nym//re)
let $fileDates := $nellFile//teiHeader//fileDesc//title//date/@when/string()
let $fileNums := $nellFile//teiHeader//fileDesc//title/@corresp/substring-after(.,'CT0')
let $phrases := $nellFile//phr
let $words := $phrases//w
let $posses := $words[@type='poss']
for $poss in $posses
let $parentPhr := $poss/ancestor::phr
let $possValue :=
if ($poss[@ana[substring-after(.,'#') = $SiteIndex//nym/@xml:id/string()]]) then ($SiteIndex//nym[@xml:id/string() = ($poss/@ana/substring-after(.,'#'))]//re/string())
else if ($poss[@ana[substring-after(.,'#') = $SiteIndex//org/@xml:id/string()]]) then ($SiteIndex//org[@xml:id/string() = ($poss/@ana/substring-after(.,'#'))]//orgName/string())
else $poss/string()
let $possAtt := 'poss'
let $correspObject :=
if ($poss/parent::phr/w[@type="noun"][@ana[substring-after(.,'#') = $SiteIndex//nym/@xml:id/string()]]) then ($SiteIndex//nym[@xml:id/string() = ($poss/@ana/substring-after(.,'#'))]//re/string())
else if ($poss/parent::phr/w[@type="noun"][@ana[substring-after(.,'#') = $SiteIndex//org/@xml:id/string()]]) then ($SiteIndex//org[@xml:id/string() = ($poss/@ana/substring-after(.,'#'))]//orgName/string())
else $poss/parent::phr/w[@type="noun"]/string()
let $nounAtt := 'noun'
return concat($possValue, "	", $possAtt, "	", $parentPhr, "	", $correspObject, "	", $nounAtt), " ");
let $filename := "nelsonGrammarOutput.tsv"
let $doc-db-uri := xmldb:store("/db/Nelson", $filename, $ThisFileContent, "text/plain")
return $doc-db-uri
(:output = http://dxcvm05.psc.edu:8080/exist/rest/db/rParker/nelsonGrammarOutput.tsv:)
Becca and I have been working on this since class, and all the parts work individually to do what we want; however, when we put it all together in the string-join and output it to a .tsv, we get an error. @ebeshero Could you help us figure this out? When I come back from class at 4 we would like to work out a corrected Cytoscape network before beginning some more mapping work. Thanks!
You can find this exact query in Becca's eXide folder (rParker); it is titled nelsonGrammar.
@spadafour @ebeshero @ghbondar any thoughts why we are getting an issue now?
I was thinking one way to fix it might be to start from the nouns instead, since there are sometimes several nouns for each poss: the loop would run over each noun in nouns, and we would then find the associated poss for each. I am not sure that is what this error actually means, though. I would have thought that the way we are doing it now would just repeat the poss when multiple nouns appear, but the error is a cardinality issue, so maybe this is it. @ebeshero @spadafour @ghbondar @nlottig94 thoughts?
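One way to test that hunch is to list the phrases that hold more than one poss word or more than one noun. The sketch below uses a hypothetical two-phrase sample in place of the collection. The relevance to the error is an assumption: fn:concat() accepts at most one item per argument, so a phrase with two nouns would hand it a two-item sequence.

```xquery
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
(: Hypothetical sample: the first phrase has one poss word and two nouns :)
let $sample :=
  <body>
    <phr><w type="poss">her</w> bit of <w type="noun">soap</w> and grimy cotton <w type="noun">towel</w></phr>
    <phr><w type="poss">their</w> heavy <w type="noun">shears</w></phr>
  </body>
for $phr in $sample//phr
let $possCount := count($phr//w[@type = 'poss'])
let $nounCount := count($phr//w[@type = 'noun'])
(: only phrases where a single concat() argument would receive several items :)
where $possCount gt 1 or $nounCount gt 1
return concat(normalize-space($phr), "&#9;poss=", $possCount, "&#9;nouns=", $nounCount)
```

If the real collection returns any rows here, those phrases are the ones to handle with an extra loop or a string-join.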
@ebeshero this is the issue I was trying to explain after capstone today.
@spadafour and @RJP43 Sorry for the delay! I'm on it now...
Any updates since you were working on this??? Sorry--I've been buried under Lit Capstone drafts since yesterday!
No, I'm in the same situation. I didn't try what I was thinking in my last comment because I'm buried in my own drafting, and I wasn't going to get caught up in it if that isn't actually what is causing the error. Thanks for having a look!
@RJP43 @spadafour Well, I corrected something bizarre and not right in the $possValue if () then else statement--really just simplified that. That resolved one set of errors, but a cardinality error remained.
Now, I've isolated the problem to this part, and I can already see the problem: You're not writing if() then else statements correctly! Here's what's happening: You're saying:
if (condition = this) then (condition = this)
instead of
if (condition = this) then output something
Can you spot the error in the code yourself now? (It's oddly similar to if statement issues in JavaScript, except that in JavaScript a single = sign assigns a new value to a variable, whereas in XQuery a single = tests for comparison, and we assign a value by designating an XPath and binding it with :=.) The thinking process is the same, but the syntax is different.
let $correspObject :=
if ($poss/parent::phr/w[@type="noun"][@ana[substring-after(.,'#') = $SiteIndex//nym/@xml:id/string()]]) then ($SiteIndex//nym[@xml:id/string() = ($poss/@ana/substring-after(.,'#'))]//re/string())
else if ($poss/parent::phr/w[@type="noun"][@ana[substring-after(.,'#') = $SiteIndex//org/@xml:id/string()]]) then ($SiteIndex//org[@xml:id/string() = ($poss/@ana/substring-after(.,'#'))]//orgName/string())
else $poss/parent::phr/w[@type="noun"]/string()
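To make the distinction between = and := concrete, here is a small sketch with a hypothetical one-entry index (no namespace, for brevity). It shows := binding a variable, = testing a condition, and = inside a predicate, where it filters nodes rather than acting as a second condition.

```xquery
xquery version "3.0";
(: Hypothetical one-entry index and @ana value, for illustration only :)
let $index := <list><nym xml:id="workingGirl"><re>City Slave Girl</re></nym></list>
let $ana := "#workingGirl"
(: := binds a value to a variable :)
let $id := substring-after($ana, '#')
return
  (: = in the condition tests whether any xml:id matches :)
  if ($id = $index//nym/@xml:id)
  (: = inside the predicate selects the matching nym; it is not another test :)
  then $index//nym[@xml:id = $id]/re/string()
  else $ana
```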
So, here's how I corrected $possValue:
let $possValue :=
if ($poss[@ana[substring-after(.,'#') = $SiteIndex//nym/@xml:id/string()]])
then $poss//re/string()
else if ($poss[@ana[substring-after(.,'#') = $SiteIndex//org/@xml:id/string()]])
then $poss//orgName/string()
else $poss/string()
Now, I think you can do something similar to correct $correspObject. I'll leave that to you, and ping me if you figure it out, or if you're still stuck!
@ebeshero That correction for $possValue takes away the pulling in of names from the site index, which is what we had and want. We don't want the jammed-up text of the @ana/@ref/xml:id (like workingGirl) or the literal text inside of the <w> element (like her); we want our consistent site index <re> text (City Slave Girl). For organizations, we don't want the @ana/@ref/xml:id with an abbreviation of the company name; we want the <orgName> from the site index, which has the full name.
So to clarify further -- Your solution gives us this:
tailor's
day's
their
Hood
its
his
your
his
his
working-man's
and only 53 returns, which is far from the actual number of poss words.
Our code from above works and gets us all 694 poss words from all the files, giving us a list like this:
Employer
Employer
Employer
Employer
Nell Nelson
City Slave Girl
City Slave Girl
City Slave Girl
City Slave Girl
City Slave Girl
If you run the code we had (from above) as individual pieces, you can see we are correctly getting each of the pieces. We ran queries beforehand to make sure we are catching all of each type of word, so we know that there are 691 phrases with 694 poss words and ~722 associated nouns.
The issue arises when we put them together, and I still think it must be a cardinality issue. I am going to try reversing what we find first to see if that works.
No, that didn't work either. Now I am getting the 722 nouns for $nounValue and only 93 possessive words for $correspPoss:
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
let $nellColl := collection('/db/Nelson/CSG_XML')
let $nellFile := $nellColl/*
let $SiteIndex := doc('/db/Nelson/siteIndex.xml')/*
let $distSIarchs := distinct-values($SiteIndex//nym//re)
let $fileDates := $nellFile//teiHeader//fileDesc//title//date/@when/string()
let $fileNums := $nellFile//teiHeader//fileDesc//title/@corresp/substring-after(.,'CT0')
let $phrases := $nellFile//phr
let $words := $phrases//w
let $nouns := $words[@type='noun']
for $noun in $nouns
let $parentPhr := $noun/ancestor::phr
let $nounValue :=
if ($noun[@ana[substring-after(.,'#') = $SiteIndex//nym/@xml:id/string()]]) then ($SiteIndex//nym[@xml:id/string() = ($noun/@ana/substring-after(.,'#'))]//re/string())
else if ($noun[@ana[substring-after(.,'#') = $SiteIndex//org/@xml:id/string()]]) then ($SiteIndex//org[@xml:id/string() = ($noun/@ana/substring-after(.,'#'))]//orgName/string())
else $noun/string()
let $correspPoss :=
if ($parentPhr//w[@type="poss"][@ana[substring-after(.,'#') = $SiteIndex//nym/@xml:id/string()]]) then ($SiteIndex//nym[@xml:id/string() = ($noun/@ana/substring-after(.,'#'))]//re/string())
else if ($parentPhr//w[@type="poss"][@ana[substring-after(.,'#') = $SiteIndex//org/@xml:id/string()]]) then ($SiteIndex//org[@xml:id/string() = ($noun/@ana/substring-after(.,'#'))]//orgName/string())
else $parentPhr//w[@type="poss"]/string()
let $nounAtt := 'noun'
let $possAtt := 'poss'
return $correspPoss
I think the issue is that sometimes there is more than one poss word in a phrase for a single noun, and sometimes there is more than one noun in a phrase for a single poss word, and I am not sure that we are grabbing those instances correctly.
What we want is that when one of those things is repeated, each word comes out with the other word(s) it is paired with in the <phr>, and instead of the literal text of the <w> element or the xml:id we get the standard representation from our site index. If you run the nelsonGrammar or the nelsonGrammarNounsFirst queries in the rParker folder on eXide, you can see that in both the first thing we grab is working correctly (whether it is $possValue or $nounValue). It correctly grabs the Site Index standard in our results and gives the expected number based on the preliminary counts of each type of word that we did outside of the for loop.
@ebeshero @spadafour thoughts to fix?
@RJP43 @spadafour OK--I see that what I mistook before for a second test for equivalence in your then statements was actually an = in your predicates! Sorry about that! I'll go take a look again.
@RJP43 @spadafour if you have 722 associated nouns, you need to have 722 lines of output. I think you fix this with a new "for loop" to generate a line for each associated noun.
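A minimal sketch of that extra "for loop", under stated assumptions: the sample phrase and the one-entry site index are hypothetical stand-ins for the real files, and local:label is a helper invented here, not part of the project code. Each word is looked up by its own @ana, so the poss and the noun are resolved independently.

```xquery
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";

(: Hypothetical helper: the site index label for a word, else the word's own text :)
declare function local:label($w as element(), $index as element()) as xs:string {
  let $id := substring-after($w/@ana, '#')
  let $hit := ($index//nym[@xml:id = $id]//re, $index//org[@xml:id = $id]//orgName)[1]
  return if ($hit) then normalize-space($hit) else normalize-space($w)
};

(: Hypothetical stand-ins for one phrase and for the site index :)
let $sample :=
  <body>
    <phr><w type="poss" ana="#workingGirl">her</w> bit of <w type="noun">soap</w> and grimy cotton <w type="noun">towel</w></phr>
  </body>
let $siteIndex :=
  <list>
    <nym xml:id="workingGirl"><re>City Slave Girl</re></nym>
  </list>
(: one output line per poss/noun pair within the same phrase :)
for $phr in $sample//phr
for $poss in $phr/w[@type = 'poss']
for $noun in $phr/w[@type = 'noun']
return string-join(
  (local:label($poss, $siteIndex), 'poss', normalize-space($phr),
   local:label($noun, $siteIndex), 'noun'),
  "&#9;")
```

Because every value passed to string-join here is a single item, a phrase with several nouns yields several lines instead of a multi-item argument.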
aha! yes!
Thank you! @ebeshero @spadafour Here is the link http://dxcvm05.psc.edu:8080/exist/rest/db/Nelson/nelsonGrammarOutput.tsv and it's working
@RJP43 Huzzah!! Success. I'm glad you figured it out--and I'm sorry I had trouble reading those complicated if then statements earlier! ;-)
It's understandable; they took us a while to figure out too.
I'm skimming through your TSV now, and I think my favorite lines are:
City Slave Girl poss her bit of soap and grimy cotton towel soap noun
City Slave Girl poss her bit of soap and grimy cotton towel towel noun
I think that could be a T-shirt design, too...too bad we already made a batch of shirts! ;-) Maybe they can be made out of "grimy cotton"!
On an unrelated note, check Courseweb for your Theory Exam score, and make sure I give that back to you (and Amanda and Megan) tomorrow!
@spadafour So I ordered this list by our $possValue to group all of the like things into sections, and what I am noticing is that we should probably go back through some of the ones that are still popping up as pronouns and verify that they aren't just missing @ana. For example, we might want to check out where these all fall:
our poss our own city city noun
our poss our sweetest smiles smiles noun
son's poss son'soppression oppression noun
tailor's poss tailor'schalk chalk noun
their poss theirmothers'fault fault noun
their poss theirsills sills noun
their poss their heavy shears shears noun
their poss theircomplexion complexion noun
their poss theirodors odors noun
their poss theirhearts hearts noun
their poss theirlives lives noun
their poss theirorigin origin noun
their poss theirnature nature noun
their poss theirsurroundings surroundings noun
their poss theirassociates associates noun
their poss theirbenevolence benevolence noun
their poss their shabby clothes clothes noun
their poss theirwrongs wrongs noun
their poss their own hands hands noun
their poss theireyes eyes noun
toilet's poss hertoilet's greasy task task noun
week's poss week'spay pay noun
woman's poss woman'sheart heart noun
working-man's poss working-man'shome home noun
your poss yourarticles articles noun
your poss yourfastidiousness fastidiousness noun
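To find candidates like the ones in the list above, one option is to list the poss words that carry no @ana and count how often each form occurs. This is only a sketch over a hypothetical sample; the real query would start from the project collection.

```xquery
xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
(: Hypothetical sample: one poss word with @ana, three without :)
let $sample :=
  <body>
    <phr><w type="poss" ana="#workingGirl">her</w> cotton <w type="noun">towel</w></phr>
    <phr><w type="poss">their</w> own <w type="noun">lives</w></phr>
    <phr><w type="poss">their</w> tired <w type="noun">eyes</w></phr>
    <phr><w type="poss">your</w> many <w type="noun">articles</w></phr>
  </body>
(: poss words with no @ana, grouped by lower-cased form and counted :)
for $w in $sample//phr/w[@type = 'poss'][not(@ana)]
let $form := lower-case(normalize-space($w))
group by $form
order by $form
return concat($form, "&#9;", count($w))
```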
We can discuss this in class.
@ebeshero here is one of my favorite lines:
toilet's poss hertoilet's greasy task task noun
Ughhhh, gross! Lol! :+1:
@RJP43 and @spadafour Rob and I were discussing your ideas for plotting a network analysis from CitySlaveGirls. First of all, just to get your feet wet:
1) You might just want to try this as a prototype for this homework from a bundle of files you know is in good shape (as in, just isolate the files that have the data you want).
2) It seems pretty clear that what you are creating is a directed network, rather than the undirected kind of network that I'm describing in my assignment. That is because (as I understand it) you're wanting to plot:
- Source-Node: An agent or person (from the "archetypes" via the pronouns/possessive nouns)
- Shared-Interaction: the active grammatical relationships (from the "segs" markup)
- Target-Node: the object (whether a material object or even a person or group of people being objectified!)
Hope that makes sense, and I'm eager to see how your network experiment comes out!