DigitalMitford / DM_processing

a repo for working on processing for the Digital Mitford project, including schemas, XSLT, XQuery, and other production and analysis efforts
http://digitalmitford.org
GNU Affero General Public License v3.0

XQuery to SVG #8

Open ebeshero opened 8 years ago

ebeshero commented 8 years ago

@ezimmer @mollyodonnell: We've got loads of new and better letters now, thanks to the coordinated efforts of Molly, Lisa, and the rest of the active Mitford team! In the morning I'll "cull" or "harvest" or otherwise collect the good files and load them into our eXist database. Then we'll re-run the XQuery we wrote, Erica, that makes a simple chart, and move on to plotting some SVG from it.

By the way (or maybe not), in the coordinated effort, Lisa and I discovered that Miss James might be the elder/eldest sister of what could be a trio of sisters: We learn that Miss James lost a younger sister named Emily, and that Susan James could be a younger sister of Miss James. It's not entirely certain but seems likely...and we'll be able to see more as we explore with our new Annotation Enhancement Tool!

ezimmer commented 8 years ago

That's amazingly exciting! (How oddly Bronte, too: guessing you've seen that Charlotte and MRM died in the same year? Not going to make anything of Miss James' "losing a younger sister named Emily"... :) )

Very much looking forward to the next few weeks. @ebeshero @mollyodonnell

ezimmer commented 8 years ago

(Just to be clear, that's not a serious suggestion! :) A remarkably odd parallel, though.)

ebeshero commented 8 years ago

I know--The James sisters were apparently working as governesses or seeking governess work, so the Bronte parallels abound! @ezimmer

ezimmer commented 8 years ago

That's fascinating (and might even suggest a basis for further research--who knows?).

Would there be a chance some time Monday might work well for talking SVG briefly? (Not sure if the holiday is better or worse for you--would love to touch base some time this upcoming week!) @ebeshero

mollyodonnell commented 8 years ago

@ebeshero @ezimmer This is awesome! So cool that the team pulled together. Lisa is so fast, and Elisa was able to untangle my crazy code issues in a flash. Elisa, I pinged Lisa earlier this week because one letter whose header I was going to update wasn't in the spreadsheet yet. I'll ping you so you know which one I mean. The other thing is that I still have two of Lisa's letters to proof and update headers for. I was going to try to wrap those up today, but I'll ping you on them now and upload my latest versions, in case it's too late. I know yesterday was the deadline, but I'm in New Orleans for a conference and have had a mountain of grading...excuses, excuses, ah. More soon.

ebeshero commented 8 years ago

@mollyodonnell @ezimmer Molly: No worries, but do ping me from Box when you upload repaired files, because I've already loaded the current batch of well-formed letter files (and literary files) into our eXist database. Erica, we have new collections now, and we might update those now and again over the next couple of weeks, but that's okay. We need to work on that XQuery we started and start drawing some SVG with it, and I need to get that started. I spent what time I had today on late edits and prepping the database. I'm going to break and work on grading and other stuff for a bit and then come back to it shortly...I'll ping again soon!

mollyodonnell commented 8 years ago

@ebeshero will do.

ebeshero commented 8 years ago

@ezimmer Erica--nearly missed your note, but just saw it! No, alas--Monday is crazy. Our break isn't until the following week. But I'm going to try to mock up something with XQuery shortly and maybe we can discuss it here.

athenerica2003 commented 8 years ago

No worries here--thank you both so much for all that you have done! Will get going with what we have, too--anything we do will become more nuanced and grounded from this point, so that's great. Here's looking forward!

@ebeshero @mollyodonnell

(Sent by email in reply to Elisa Beshero-Bondar's comment of Sat, Oct 10, 2015, 6:05 PM.)

ebeshero commented 8 years ago

@ezimmer @mollyodonnell Making some progress here! I'm refining our tester XQuery: It was a little trickier than I expected to get the top 3 (or more) in a given category, but we've got it now! I want to talk to you both about outputting a good readable plot of our output: It may be easier and clearer for comparison to do this as a treemap rather than radiating wedges around a dial--though we can tinker with this. (Sometimes radial plots can introduce weird distortions, and for the moment I just want something simple and straightforward to calculate.) I'm first going to try a treemap, so we can see what it looks like. Google "treemaps" and click the image results to see examples.
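A rough sketch of the "top 3 in a given category" idea, written in Python rather than the XQuery we're actually running, and assuming a made-up data shape: each letter reduced to a dict from TEI element name (e.g. `persName`, `placeName`) to the reference IDs it contains. The function name, the `MissJames` ID, and the sample IDs are all hypothetical; this only illustrates the counting logic, not our actual query.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical data shape: each letter maps a TEI element name
# (e.g. "persName", "placeName") to the reference IDs found in it.
Letter = Dict[str, List[str]]


def top_cooccurring(letters: List[Letter], focus: str,
                    n: int = 3) -> Dict[str, List[Tuple[str, int]]]:
    """Return the n refs per category that most often share a letter with `focus`.

    Each letter counts at most once per ref, so a name repeated many
    times in one letter still adds only 1 to its co-occurrence count.
    """
    counts: Dict[str, Counter] = {}
    for letter in letters:
        all_refs = {ref for refs in letter.values() for ref in refs}
        if focus not in all_refs:
            continue  # skip letters that never mention the focal entity
        for category, refs in letter.items():
            tally = counts.setdefault(category, Counter())
            tally.update(set(refs) - {focus})  # dedupe, and don't count focus itself
    # most_common(n) sorts by count, highest first
    return {category: tally.most_common(n) for category, tally in counts.items()}


# Toy input with invented IDs, for illustration only.
letters = [
    {"persName": ["MissJames", "personA", "personB"], "placeName": ["placeX"]},
    {"persName": ["MissJames", "personA"], "placeName": ["placeX", "placeY"]},
    {"persName": ["personB"], "placeName": ["placeZ"]},
]
result = top_cooccurring(letters, "MissJames")
print(result)
```

The third toy letter never mentions the focal entity, so its references are ignored entirely; that is the "co-occurrence" filter doing its job.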

I'll explain more and send a mockup this weekend!

mollyodonnell commented 8 years ago

@ebeshero @ezimmer My apologies for the delay on the two remaining letters. I might still take care of them this week and ping you, though it probably won't be worth redoing what you've done for so few letters, since they're already correctly tagged and only lack complete header data. Again, apologies.

Wedges sound great to me (Trivial Pursuit superfan). Looking forward to more, too.

mollyodonnell commented 8 years ago

@ebeshero Pinged you in Box last night about the remaining two. Just updating here for the GitHub record.

mollyodonnell commented 8 years ago

@ebeshero and @ezimmer I'm updating my academia.edu page and have added the abstract and links to this talk along with my others, tagging you both. If you have any issues with that, please let me know and I'll take it down or amend it. I just linked to the online program.

ezimmer commented 8 years ago

@mollyodonnell That's great! Glad you did.

(Sent by email in reply to mollyodonnell's comment of Friday, October 16, 2015.)

athenerica2003 commented 8 years ago

And @ebondar, that sounds like a terrific plan. Will check in again as soon as I'm at a real computer!

(Sent by email in reply to Erica Zimmer's comment of Friday, October 16, 2015.)

ezimmer commented 8 years ago

(Sorry--meant to reply to @ebeshero.)

Huge cheers to the top three in each group! (Thank you also for all the work you have been doing.)

Since radiating wedges around a dial introduce more distortion than necessary, am now searching for examples--and code--of possibilities that aren't circular.

From our initial conversation, it seemed the two most useful features of a visualization would be the following:

1) clusters of item types (that is, the top 3 persNames together, the top 3 placeNames together, etc.)

2) distance from the central item conveying each secondary item's relative co-occurrence with the central item.

(In other words, the most frequently co-occurring item within a particular category could be shortest/closest, with the second most frequent co-occurrence the next shortest/closest, and so on.)
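One way to make point 2 concrete without going circular: a hedged Python sketch that keeps each item type in its own horizontal row (the clusters of point 1) and turns co-occurrence counts into horizontal distance from the central item. The scaling rule (linear between a `near` and a `far` distance, relative to the largest count), the function names, and the pixel defaults are assumptions for illustration; a purely rank-based spacing would work too.

```python
from typing import Dict, List, Tuple


def distance_for(count: int, max_count: int,
                 near: float = 40.0, far: float = 200.0) -> float:
    """Map a co-occurrence count to a distance: the higher the count, the closer.

    The most frequent item lands at `near`; a count of 0 would land at
    `far`. Rank order is preserved, and gaps reflect real differences.
    """
    if max_count <= 0:
        return far
    return near + (far - near) * (1 - count / max_count)


def row_layout(top: Dict[str, List[Tuple[str, int]]],
               origin: Tuple[float, float] = (20.0, 100.0),
               row_gap: float = 50.0) -> List[Tuple[str, str, float, float]]:
    """Give each category its own horizontal row (one cluster per item type)
    and place each item to the right of the central item at distance_for()."""
    x0, y0 = origin
    # One shared maximum, so distances are comparable across categories.
    max_count = max((c for items in top.values() for _, c in items), default=0)
    placed = []
    for row, category in enumerate(sorted(top)):
        y = y0 + row * row_gap
        for ref, count in top[category]:
            placed.append((category, ref, x0 + distance_for(count, max_count), y))
    return placed


# Invented counts, for illustration only.
top = {"persName": [("personA", 4), ("personB", 2)], "placeName": [("placeX", 2)]}
for category, ref, x, y in row_layout(top):
    print(category, ref, x, y)
```

Because the maximum is shared across rows, a person and a place with the same count sit at the same distance, which keeps categories comparable; computing the maximum per row instead would emphasize order within each cluster.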

More soon.

(Sent by email as a follow-up to Erica Zimmer's message of Fri, Oct 16, 2015, 8:12 PM.)

ezimmer commented 8 years ago

Hi @ebeshero--whenever you have a second, could I ask if the "Mitford Co-Citation Counts" query saved in eXist/eXide is the right one to run for the current counts?

Am asking only to work from a clearer sense of the proportions--that is, how big the differences between categories/number of occurrences are.

Thank you!

(Sent by email as a follow-up to Mary Zimmer's message of Sat, Oct 17, 2015, 8:34 PM.)

ezimmer commented 8 years ago

(Update: just ran the aforementioned query--definitely not it! :) )

Would be curious, whenever you have time.

Thanks again!

@ebeshero

(Sent by email as a follow-up to Mary Zimmer's message of Sun, Oct 18, 2015, 1:50 PM.)

ebeshero commented 8 years ago

@ezimmer: It's TesterMissJames-coRef, and it's in my queries folder. It's only outputting the top three people, but I had to rewrite our code and start over to simplify things.

ebeshero commented 8 years ago

@ezimmer Basically I want to build on that code--it's saving to an output file (like our last one was). But now that it's getting the top three of "X", I want to keep building on it, and also output SVG shapes from it. I had some help on Thursday--sat down with David Birnbaum a while and we worked on debugging my first attempt. He also advised not going with the radial wedges--and showed me the tree maps. Those will be easier to plot anyway.

I want to work on this a little later today and tomorrow (which is Fall Break for me)--For right now I'm trying to clear some time-sensitive stuff for class and clear the decks...more soon.

ezimmer commented 8 years ago

No worries--thank you so much! I was only asking to see if there would be things I could do to help.

(The db won't let me run the query, I think because the results are already saved in a folder for which permissions may be set to you. No worries there either--I don't want to mess anything up!)

Will just keep working around, and will look forward to touching base whenever you have time.

Thank you again, @ebeshero!

(Sent by email in reply to Elisa Beshero-Bondar's comment of Sun, Oct 18, 2015, 1:57 PM.)

ebeshero commented 8 years ago

@ezimmer Reading your earlier post: Here's how that might work with a tree map: Imagine three rectangles, one for each of the top three categories.

We could position those relative to the Miss James node, so the closest rectangle to her is the most frequently associated and the furthest away is the least frequent.

Within each rectangle, we divide the space in proportion to how often each reference co-occurs, from most frequent to least frequent, as a tree map does.
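For the subdividing step, here is a minimal Python sketch of the simplest treemap layout, a one-level "slice" layout: cut the rectangle along its longer side so each strip's area is proportional to its count, then write each strip out as an SVG `<rect>` with a `<title>` tooltip. The function names and the `(ref, count)` input shape are hypothetical, and treemap tools more commonly use a "squarified" layout, which avoids long thin strips when there are many items.

```python
from typing import List, Tuple
from xml.sax.saxutils import escape

Cell = Tuple[str, float, float, float, float]  # (ref, x, y, width, height)


def slice_treemap(items: List[Tuple[str, int]], x: float, y: float,
                  w: float, h: float) -> List[Cell]:
    """Split one rectangle among items so each cell's area is proportional to its count.

    Single-level "slice" layout: cut across the longer side, largest item first.
    """
    total = sum(count for _, count in items)
    if total <= 0:
        return []
    horizontal = w >= h  # lay strips side by side when the box is wide
    cells: List[Cell] = []
    offset = 0.0
    for ref, count in sorted(items, key=lambda item: -item[1]):
        share = count / total
        if horizontal:
            cells.append((ref, x + offset, y, w * share, h))
            offset += w * share
        else:
            cells.append((ref, x, y + offset, w, h * share))
            offset += h * share
    return cells


def to_svg_rects(cells: List[Cell]) -> str:
    """Serialize cells as SVG <rect> elements, each with a <title> tooltip."""
    return "\n".join(
        f'<rect x="{x:.1f}" y="{y:.1f}" width="{w:.1f}" height="{h:.1f}" '
        f'fill="none" stroke="black"><title>{escape(ref)}</title></rect>'
        for ref, x, y, w, h in cells
    )


# Invented counts: personA co-occurs three times as often as personB.
cells = slice_treemap([("personB", 1), ("personA", 3)], 0, 0, 100, 50)
print(to_svg_rects(cells))
```

With a 100 by 50 box, the more frequent reference gets a strip three quarters of the width, the other one quarter.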

I'm not sure if the rectangles need to be the same size. Another way to handle the comparative frequency of co-occurrence might be to position the boxes in the same relative position (not plotting them by distance), but sizing the rectangles based on their co-occurrence with Miss James. That was what I had in my head to draw after Thursday!
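For the second option, sizing each category's box by its co-occurrence with Miss James, one detail matters: if the boxes are squares, the side should scale with the square root of the total, so that area (what the eye actually compares) stays proportional to the count. A minimal Python sketch, assuming a hypothetical `{category: [(ref, count), ...]}` input and a made-up `max_side` in pixels:

```python
import math
from typing import Dict, List, Tuple


def box_sides(top: Dict[str, List[Tuple[str, int]]],
              max_side: float = 150.0) -> Dict[str, float]:
    """Side length of one square box per category, with AREA proportional to
    that category's total co-occurrence count with the focal entity.

    The busiest category gets `max_side`; the others scale by the square
    root of their share, since scaling the side linearly would exaggerate
    differences (area grows with the square of the side).
    """
    totals = {category: sum(count for _, count in items)
              for category, items in top.items()}
    biggest = max(totals.values(), default=0)
    if biggest <= 0:
        return {category: 0.0 for category in totals}
    return {category: max_side * math.sqrt(total / biggest)
            for category, total in totals.items()}


# Invented counts: persName totals 4, placeName totals 1.
sides = box_sides({"persName": [("personA", 3), ("personB", 1)],
                   "placeName": [("placeX", 1)]})
print(sides)
```

Here the place box ends up with half the side length but a quarter of the area, matching its quarter share of the count. Each box could then be subdivided internally with any treemap layout.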

We can probably try plotting it both ways and see what it looks like. I think for web interfaces, though, we might actually want something tidy and compact that can sit with other kinds of data on a web page. So: imagine bringing up a list of files that reference Miss James, together with some detailed information (in text) about the named entities who are represented in our visualization. It might make sense for this to share space with a window holding, on-click, a view of passages showing points of co-reference. Just some thoughts for designing a web interface around this!

ebeshero commented 8 years ago

@ezimmer Let me take a look at the permissions... or just move this into a conference-prep directory that we share. It's kind of hard for me to find it in my own queries directory because there's so much in it.

ezimmer commented 8 years ago

All of these thoughts sound terrific, and I definitely see the logic!

I also like the idea of the visual annotation appearing alongside text(s)--it sounds like that form of web rendering could be used with a list of texts, alongside an individual text, or both. (Does that fit with what you were thinking?)

Such an approach also chimes with the idea of "visual annotation": a form of graphic output that appears alongside texts or key items, and that provides immediate commentary re: recurring elements of contexts in which said key items are found.

Basically, the format you're describing sounds like a super-compact graphic rendering of archival insight, and I think that's what we're going for! (I also heard some terrific thoughts this weekend about evidence "hiding in plain view" that seem useful for our presentation.)

Perhaps it's best if I focus right now on the paper itself, based on these ideas, the visualization model you're outlining, and the larger contexts we'd discussed. (I'm doing it in "detailed outline" form first--we can all then work from that basis!)

If that sounds good, I'll forge ahead in that area.

Very excited about this!

@ebeshero @mollyodonnell

(Sent by email in reply to Elisa Beshero-Bondar's comment of Sun, Oct 18, 2015, 2:10 PM.)

ezimmer commented 8 years ago

@ebeshero Thank you--please don't spend too much time on that right now, though! It sounds like you have a ton to do, and there's more than enough for me to work on in other areas.

(Sent by email in reply to Elisa Beshero-Bondar's comment of Sun, Oct 18, 2015, 2:15 PM.)

ebeshero commented 8 years ago

@ezimmer I've created a new directory called "AnnotationToolQs" and I've made you the "group" that accesses it so we can both use it.

Here's a view of the output from eXist: http://dxcvm05.psc.edu:8080/exist/rest/db/output/tester.html

Just an HTML chart for the moment, and just the Persons output.

ebeshero commented 8 years ago

@ezimmer It's funny how I only see your long posts AFTER the short ones! (lol). I'll quote inline here:

"I also like the idea of the visual annotation appearing alongside text(s)--it sounds like that form of web rendering could be used with a list of texts, alongside an individual text, or both. (Does that fit with what you were thinking?)"

YES!! :+1: That's the idea. :-)

"Such an approach also chimes with the idea of 'visual annotation': a form of graphic output that appears alongside texts or key items, and that provides immediate commentary re: recurring elements of contexts in which said key items are found.

"Basically, the format you're describing sounds like a super-compact graphic rendering of archival insight, and I think that's what we're going for! (I also heard some terrific thoughts this weekend about evidence 'hiding in plain view' that seem useful for our presentation.)

"Perhaps it's best if I focus right now on the paper itself, based on these ideas, the visualization model you're outlining, and the larger contexts we'd discussed. (I'm doing it in 'detailed outline' form first--we can all then work from that basis!)

"If that sounds good, I'll forge ahead in that area."

Yes--that's a great plan! I'll concentrate on generating some visuals! And I can do the general intro to Mitford and handle some of the details on how we're producing our graphs, though you and I can probably talk about that together (and we'll want to prepare for that together when we're all in Lyon).

"Very excited about this!"

Me too! :-)

ezimmer commented 8 years ago

Thanks, @ebeshero ! persName seems the most salient category, so it's a great one to start with, too.