DigitalMitford / DM_SiteIndex

a repository for development of our prosopography "site index" file
https://digitalmitford.github.io/DM_SiteIndex/
GNU Affero General Public License v3.0

Review Occupation types and subtypes #1

Open ebeshero opened 6 years ago

ebeshero commented 6 years ago

For new schema development, we need to review the current proposed list to streamline occupation encoding in the site index. That list is posted on the Documentation site here:

https://digitalmitford.github.io/DM_documentation/SI_Occupations_Guide.md

@lmwilson @Samwebb64 @KellieDC @ghbondar

ebeshero commented 6 years ago

So, this is an example of an "issue" or "ticket" we file in a GitHub repo. It's analogous to (and an improvement on) the Box comments we'd write. It's an improvement because you can edit your posts. You can link to files in the GitHub repo (click on the code tab) if you need to, and you can link out to other things, and (as you see above) we can ping members of our team by their GitHub handles.

ebeshero commented 6 years ago

If you're reading these posts in your e-mail, know that they are coming from a GitHub "Issues" tab on one of our Digital Mitford GitHub repos. You can see what the post looks like at its source by scrolling to the end of the e-mail--at the bottom you should see a line that reads:

You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

And if you click on the "view it on GitHub" link, you go straight to the issues. (The link isn't present in my snip of a quote above.)

Samwebb64 commented 6 years ago

@ebeshero @lmwilson Are you guys using a specific reference source to collect occupation names? I want to make sure I use the same one to identify the ones I'm finding.

ebeshero commented 6 years ago

Hi @Samwebb64 ! @lmwilson did a lot of this by looking at census categories, and a period pamphlet listing social classes, and the two of us consulted WorldCat, Library of Congress headings, Wikipedia headings. We didn't find a good, single standard authority file for this. Also, we're agreed on a philosophy of "less is more", so we looked for ways to group and cluster occupations conceptually-- capacious terms that work semantically to group 19c occupations that we've seen in MRM's world. And we wrote a few in camelCase (like 'seaCaptain') so none of them contain white spaces. The reason for that is, we want to make it possible to code multiple subtypes with a type, separated by a white space.

lmwilson commented 6 years ago

Can you generate a list for us of all the occupation types currently in use? I think I would like to check that again.

I realized when I went through si-add-chas1-findens-rienzi that I might have missed a few important ones:

  1. Rector is different from vicar in the C of E. (Unless we want to put all titles in roleName, like Vicar of Dibley, Rector of Ashe, etc.) Have we thought through the pros and cons of that method in terms of using it to analyze data? Will that work the way we want?
  2. We don't have a subtype=professor for academics from NOW. So I cobbled together 3 types: educator/scholar/etc. Maybe that is fine because we are using professor of x in roleName. I had used "orator" to capture Dickens's work as a public lecturer. I'm not sure whether that should be under type=literary or type=theater. But it will come up for others, like Coleridge. So I think we need something.
  3. We also don't have a good solution right now for law enforcement (constable? Government or law?) or positions like magistrate or sheriff. (Sheriff positions were mostly elected, and office holders don't need law degrees.) Once again, are these admin. or legal?
  4. Sam pointed out there are a bunch of occupations elected by the local church vestry, especially in the C of E: sexton, churchwarden, pew opener, etc. In Protestant churches there are deacons, elders, etc. These are non-clergy who do administrative and clergy-adjacent jobs for the local church.
  5. We also want to be sensitive to the kinds of unpaid work that exist, particularly for the women. A number could go under "benefactor" but we may want additional subtypes. This gets into a whole issue about 'invisible labor' of the kind mostly women do: sewing for themselves or to make charitable items for sale or both, for example. Can we try to reflect that without completely reproducing the logic that that work doesn't "count"? As an example, think about Caddy and Mrs. Jellyby in Dickens. Sometimes with real or fictional people that kind of work is still unknown because it's unrepresented. But I think it's worth trying. It would be a fairly unique feature of DM!

Another general issue. Sam and I had discussed whether we would ever want to somehow try to use the occupation data or some other way to be able to gauge the numbers of members of different social classes. We don't yet have a way to do that and maybe there isn't one. Possibly we could use a general heading under occupation of socialClass and then use something like royal, aristocrat, landed gentry, artisan/tradesman, (tenant farmer?), etc. Some categories might be fluid over someone's lifetime, like George Mitford.

I talked to Sam and brought her up to speed on what we have done so far. She is going to put her compiled occupations list and notes in box in the si IP folder we have there. That made the most sense because we agreed to keep the very drafty things in the box folders.

ebeshero commented 6 years ago

@lmwilson : That list is posted here (on the DM_documentation repo): https://github.com/DigitalMitford/DM_documentation/blob/master/SI_Occupations_Guide.md

@Samwebb64

ebeshero commented 6 years ago

About the questions: Here’s my two cents:

1) When we were meeting and working on the occupations coding last week, you and I tried to stick to this guiding idea: The @type attribute is required on the <occupation> element, and it is broadly topical and most likely the most useful category for our analysis. So for the rector vs. vicar question it doesn’t seem like this matters nearly as much as the simple distinction of @type="religious". The @subtype list is secondary, optional, and for a given entry can hold multiple values, so for someone who was both a rector and a vicar at different points in life, we could encode

<occupation type="religious" subtype="rector vicar"/>

And we can add a new occupation element if this person was also occupied as @type="explorer".
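To illustrate why whitespace-free (camelCase) subtype values matter, here is a minimal Python sketch of how a script might read a multi-valued @subtype. It is only an illustration under stated assumptions: the helper name `subtypes` is hypothetical, and this is not the project's actual processing code.

```python
import xml.etree.ElementTree as ET

# Sample element taken from the discussion above.
SAMPLE = '<occupation type="religious" subtype="rector vicar"/>'


def subtypes(occupation):
    """Return the whitespace-separated subtype tokens (empty list if absent).

    Hypothetical helper: works only because no single subtype value
    contains white space (hence camelCase values such as 'seaCaptain').
    """
    return occupation.get("subtype", "").split()


occ = ET.fromstring(SAMPLE)
occ_type = occ.get("type")      # the required, broadly topical category
occ_subtypes = subtypes(occ)    # zero or more optional subcategories
print(occ_type, occ_subtypes)
```

Because the tokens are split on white space, a value like "sea captain" would be read as two separate subtypes, which is the reason for writing it as "seaCaptain".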

So, how important are the subtype codes? We decided pretty strongly that these should NOT be the same as what we code with <roleName>, so this shouldn’t become too specific. These should be distinct words that convey categories.

So, the question is, is rector a strong categorical distinction on the level of vicar and minister? Is vicar a strong categorical distinction from the other two? I think we wanted a sense of distinction among levels of commitment to a community or church. If we should add rector, let’s do it but be clear about why we need the subcategory.

ebeshero commented 6 years ago

  2. We may want to use the occupation encoding on members of every list after all (our editing team as well as people in MRM’s world), and I think adding a general subtype category of "professor" could work for anyone employed in various faculty positions at colleges and universities as institutions of “higher education”. Works for me, and this will help with speeding updates to the big SI. We can add it if you agree, too.

Currently, "orator" is only a subtype under type="government", because this general category includes political and social reformers. We talked a bit about adding it to other categories—where else does it need to be?

ebeshero commented 6 years ago

  3. Doesn’t law enforcement belong in the type="government" category? Why don’t we do a simple camelCase string that will lump sheriffs and constables like so:

<occupation type="government" subtype="lawEnforce"/>

You’ll see a few other camelCase solutions like this in our subtype lists, too.

Hmm. I can see an argument for putting law enforcement under type="legal", but it needs to be in one OR the other type category, not both. What do you think?
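The "one OR the other, not both" rule could be checked mechanically. The following Python sketch assumes a hypothetical slice of the controlled vocabulary stored as a dictionary; the `VOCAB` contents and the function name are illustrative only, and "lawEnforce" is deliberately listed twice to show what the check reports.

```python
# Hypothetical slice of the controlled vocabulary: type -> allowed subtypes.
# "lawEnforce" is listed under two types on purpose, to trigger the check.
VOCAB = {
    "government": {"orator", "lawEnforce"},
    "legal": {"lawEnforce"},
    "religious": {"rector", "vicar"},
}


def subtypes_in_multiple_types(vocab):
    """Map each subtype that appears under more than one type to those types."""
    seen = {}
    for occ_type, subs in vocab.items():
        for sub in subs:
            seen.setdefault(sub, []).append(occ_type)
    return {sub: sorted(types) for sub, types in seen.items() if len(types) > 1}


conflicts = subtypes_in_multiple_types(VOCAB)
print(conflicts)
```

An empty result would mean every subtype belongs to exactly one type.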

ebeshero commented 6 years ago

  4. As for “clergy adjacent”, let’s come up with a good general term for the wide variety of position names here. subtype="churchAssist"?

ebeshero commented 6 years ago

  5. Great idea! I guess expanding the tiny category of benefactor makes sense here, if this constitutes unpaid effort. What about these new subtypes?

domestic volunteer

?

ebeshero commented 6 years ago

Okay, last point from your post: Being able to track class mobility might be interesting, but maybe we can do that already if people are engaged in multiple occupation types? Anyone who turns up in trade as well as legal for example, might be a person of interest.
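As a rough sketch of that idea, the Python below lists people who carry more than one occupation @type. The sample entries and their xml:id values are invented for illustration, and the sample omits the TEI namespace for brevity; real SI files would need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

# Invented sample data (not from the Site Index); TEI namespace omitted.
SAMPLE = """
<listPerson>
  <person xml:id="personA">
    <occupation type="trade" subtype="butcher"/>
    <occupation type="legal"/>
  </person>
  <person xml:id="personB">
    <occupation type="religious" subtype="rector vicar"/>
  </person>
</listPerson>
"""

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def multi_type_people(root):
    """Return {xml:id: sorted occupation types} for people with 2+ types."""
    result = {}
    for person in root.iter("person"):
        types = sorted(
            {o.get("type") for o in person.findall("occupation") if o.get("type")}
        )
        if len(types) > 1:
            result[person.get(XML_ID)] = types
    return result


people_of_interest = multi_type_people(ET.fromstring(SAMPLE))
print(people_of_interest)
```

Note that two subtypes on a single element (as for personB) do not count here; only distinct @type values do.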

Samwebb64 commented 6 years ago

@ebeshero @lmwilson I just posted the occupations list in the SI IP folder in Box.

As for these points, here's what I think.

  1. I agree with these broad categories, and the "less is more" philosophy. The only category that seemed to be missing was an outlier - what we might call "freelance" or non-waged trades. But these can be included into trades, I suppose.

  2. On my list, I added subcategories to the trade. Maybe these can be used as s, in the same way we might use 3. ? Lisa and I also discussed that could apply to , so writers who also gave lectures, like Coleridge or Dickens.

  3. After checking on the job of a constable, I'm fairly sure that should go under the . I didn't check "sheriff" but it seems clear the role of constable flows initially from Charles II. But that's my interpretation. (Canadians and our good government Constitution. :-)

  4. I'm in favor of going with religious that are specific to the position. I think we'll encounter many religions and positions, so if we can capture many of them, this allows us to accommodate them.

  5. I like the idea of for unpaid female labor doing sewing, painting, teaching in families. In some senses, these women would have been thought of as "dependents," so this is an interesting spin, and true to the spirit of a number of sketches.

One thing to keep in mind is that, in OV, many of the characters with occupations are actually identified only BY their occupations; they have no actual personal names (except occasionally some are mentioned in passing). I'm not sure if this will throw a wrench into these occupation tags, since they will also be tags. But s will have a redundancy, as in: The rector is the rector in OV...

ebeshero commented 6 years ago

Thanks, @Samwebb64 ! I'm reading your list in Box now and thinking of ways to blend it with the list here in the repo.

 <person xml:id="butcherBoy_OV">
                 <persName> 
                    <roleName>butcher's boy</roleName>
               </persName>
               <occupation type="trade" subtype="butcher"/>
 </person>

See how that can work to use the <roleName> to complement the occupation element?

@ebeshero Did we implement this or not yet? Some of the same questions came up again from the student sweep through.

lmwilson commented 6 years ago

Something of a separate but related issue: If we are going to use roleName to capture things instead of occupation, that certainly works for searching and analyzing the SI. But how are we going to use or show those in the Web mouseover output? And how many do we include? It can become an issue of length again when we are talking about some of the higher-ranking aristocrats, clergy, politicians, and scholars. Some of them have 15 or 20 or more titles and names. Scholars might be Fellows of all those royal societies. Somebody like Cardinal Richelieu might hold lots of titles in government and clergy. Somebody like George IV has a bunch of titles and memberships. If we are not putting all of them in the entries, we need rules on which to include. I have been using a loose "less is more" method and only including important ones, but unsystematically.

E--is there a simple way to generate a list of all the occupation tags we have in the SI files currently? I think I need to check that again and more systematically. (Not the proposed new list, the entire current one.).

jamesrovira commented 6 years ago

For what it's worth --

I like the subcategory method. It seems intuitive and organized, but flexible enough to capture everything.

I tend to think in terms of providing more detail rather than less, keeping in mind that future researchers could ask any conceivable question. Someone might ask at some time, for example, "Is there a pattern of difference between uses of 'rector' and 'vicar'?" Differences might be associated with rank, family, geography, etc.

I think Lisa points out a real problem with potential unwieldiness for individuals with multiple titles. We might consider that some titles are subordinate to or folded into others (probably especially the case with royalty or a Cardinal), so perhaps focus on the main title, and list the rest at the bottom of the entry: "Also holds the titles..." Ideally, we would emphasize different titles depending on context: being "Cardinal" has primary importance in most contexts, but perhaps if the Cardinal is a member of some scientific society, his title within that society might be important too. That would only work if we customize notes or popups for each document.

Jim

From: Elisa Beshero-Bondar notifications@github.com Sent: Thursday, August 16, 2018 9:18:55 PM To: DigitalMitford/DM_SiteIndex Cc: Lisa M. Wilson; Mention Subject: Re: [DigitalMitford/DM_SiteIndex] Review Occupation types and subtypes (#1)

Thanks, @Samwebb64! I'm reading your list in Box now and thinking of ways to blend it with the list here in the repo.

  • I'm likely to try to lump some of the subcategories together: where you have "butchersBoy" and "apprentFootman", just indicate the occupation subtype as "butcher" and "footman", because that's a topical area of work. (Alternatively, we should be consistent about identifying the "work area" at the start of the entry and the "understudy" aspect second at the end: "butchersBoy" and "footmanApprent".) But I think, to simplify, the occupation element doesn't have to carry precise details that will come through in the rest of our entry.

  • Can we think of a way to simplify kinds of religious positions, so as to distinguish titles (specific to distinct communities) from functions? If we can keep the subcategories simple, and keep in mind that we can correlate the <occupation> element information with specific titles given in <roleName>, we will reduce redundancy in our tagging.

  • You raise a good point about such redundancy related to the OV characters who aren't named. In this case, we have an opportunity to use simpler/broader category words in the <occupation> element to pair up with a person whose only name is essentially a <roleName>. We may, indeed, want to structure such OV entries like this:

 <person xml:id="butcherBoy_OV">
                 <persName> 
                    <roleName>butcher's boy</roleName>
               </persName>
               <occupation type="trade" subtype="butcher"/>
 </person>

See how that can work to use the <roleName> to complement the occupation element?

  • Shall I make a try of reconciling your list, Sam, with the list Lisa and I hammered out? And then let's review it together?

-- Dr. James Rovira http://www.jamesrovira.com/

ebeshero commented 6 years ago

@lmwilson @jamesrovira @Samwebb64 The <roleName> element is inside the <persName> elements in the SI. The first <persName> element is the one we post for identifying the individual in the mouseover notes. Sometimes that first <persName> element contains several <roleName> elements, especially for aristocrats or royalty, and on mouseover of a name in a Mitford file on the website, those are what we see first after their most recognized name, followed by the info on birth and death dates as stored on the <birth> and <death> elements. I am pretty sure we are not outputting the <occupation> elements at all in the mouseover notes, but going directly to the <note> element because this is what our readers need to see to quickly identify a name or reference.

So, how would we use the <occupation> element? Remember, our project tracks information that can be analyzed and published in a much wider range of ways than the pop up annotations on the letters. We are preparing to post more aggregate data from the SI, and we can produce charts and graphs indicating the numbers of individuals associated with occupation categories and subcategories. Here, we don’t want too many subcategories to choose from—we probably don’t want a lot of solo outliers: lumping rather than splitting should be the rule here for our coding team, too.

We can correlate the <roleName> elements, when present, with the lumping single-word topical categories of the types and subtypes on the <occupation> element, so in this way individuality and specificity are not lost. We can generate an output that lists all the <roleName> elements connected with Trade or Government occupations in the SI, for example. And that is why I say we can think of <roleName> as supplying the individualized detail on occupations that we don’t wish to lose.

This is really to help us with aggregate processing: We can output lists of all the people engaged in X trade or in domestic service, and make charts indicating the numbers and representation of various occupations in MRM’s real world vs. perhaps the world of OV. It’s an aid to aggregate research—and to me it is pretty exciting to be able to do this—more systematically than we have been. Does that help to convey what we are hoping to accomplish with the controlled vocabulary of types and subtypes on the occupation element?
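A minimal Python sketch of this kind of aggregate processing follows. The sample entries other than the butcher's boy are invented, the sample omits the TEI namespace for brevity, and the function name `summarize` is hypothetical; it is one possible approach, not the project's actual pipeline.

```python
from collections import Counter, defaultdict
import xml.etree.ElementTree as ET

# Partly invented sample data; TEI namespace omitted for brevity.
SAMPLE = """
<listPerson>
  <person xml:id="butcherBoy_OV">
    <persName><roleName>butcher's boy</roleName></persName>
    <occupation type="trade" subtype="butcher"/>
  </person>
  <person xml:id="personB">
    <persName><roleName>Rector of Ashe</roleName></persName>
    <occupation type="religious" subtype="rector"/>
  </person>
  <person xml:id="personC">
    <persName>No role name here</persName>
    <occupation type="trade" subtype="baker"/>
  </person>
</listPerson>
"""


def summarize(root):
    """Count occupation elements per @type and collect roleNames per @type."""
    type_counts = Counter()
    roles_by_type = defaultdict(list)
    for person in root.iter("person"):
        # Every roleName anywhere inside this person entry.
        roles = [r.text for r in person.iter("roleName") if r.text]
        for occ in person.findall("occupation"):
            occ_type = occ.get("type")
            type_counts[occ_type] += 1
            roles_by_type[occ_type].extend(roles)
    return type_counts, dict(roles_by_type)


type_counts, roles_by_type = summarize(ET.fromstring(SAMPLE))
print(type_counts)
print(roles_by_type)
```

The counts could feed a chart of occupation categories, while the per-type roleName lists keep the individualized detail available alongside the lumped categories.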

@lmwilson Yes: We can quickly output the wild and uncategorized list of occupation element contents in the SI—I’ll do it and post today, but you can see it quickly for yourself using the XPath window in oXygen: With the current SI file open, paste this XPath expression in that window:

//occupation

To review, this simple expression says, “Start at the document node (above the TEI root element) and look down the entire descendant axis (all the children and children’s children deep down through the entire XML tree) and locate every occupation element.” If you enter that in the oXygen XPath window, you will see and be able to scroll through a list of results in the bottom window.
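For anyone working outside oXygen, roughly the same lookup can be sketched in Python's standard library. This assumes the SI file is in the TEI namespace (http://www.tei-c.org/ns/1.0); the sample document and its values are invented. ElementTree supports only a subset of XPath, so the search is written relative to the root as `.//` and the namespace is supplied through a prefix map.

```python
import xml.etree.ElementTree as ET

# Prefix map for the TEI namespace (the prefix "tei" is our own choice).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented miniature stand-in for a Site Index file.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <listPerson>
      <person xml:id="p1"><occupation>clergyman</occupation></person>
      <person xml:id="p2"><occupation>poet</occupation><occupation>orator</occupation></person>
    </listPerson>
  </body></text>
</TEI>
"""

root = ET.fromstring(SAMPLE)
# Every occupation element at any depth below the root, in document order.
occupations = root.findall(".//tei:occupation", TEI_NS)
values = [o.text for o in occupations]
print(values)
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`.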

ebeshero commented 6 years ago

Just proofed and corrected my post a bit—go read on GitHub rather than email. :-)

ebeshero commented 6 years ago

@lmwilson Here is a table containing the contents of each distinct occupation element in the current SI. I've output it as a numbered table, together with a count of all the times this value appears, and an output list of the first <persName> element in the <person> entry that contains this occupation value.

https://digitalmitford.github.io/DM_documentation/SI_currentOccupationsTable.html
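The table linked above was generated by the project's own process, which is not shown in this thread. As a hedged illustration only, a comparable table (distinct occupation value, count, and the first <persName> of each containing entry) could be sketched in Python like this; the sample names are invented and the TEI namespace is omitted for brevity.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Invented sample data in the old style: free-text occupation content.
SAMPLE = """
<listPerson>
  <person><persName>Person One</persName><occupation>poet</occupation></person>
  <person><persName>Person Two</persName><occupation>clergyman</occupation></person>
  <person><persName>Person Three</persName><persName>Alt Name</persName><occupation>poet</occupation></person>
</listPerson>
"""


def occupation_table(root):
    """Return rows of (value, count, first persName of each entry), by count."""
    counts = Counter()
    names = {}
    for person in root.iter("person"):
        # findtext returns the text of the FIRST matching child persName.
        first_name = person.findtext("persName", default="")
        for occ in person.findall("occupation"):
            value = (occ.text or "").strip()
            counts[value] += 1
            names.setdefault(value, []).append(first_name)
    # most_common() sorts by descending count.
    return [(value, n, names[value]) for value, n in counts.most_common()]


table = occupation_table(ET.fromstring(SAMPLE))
one_offs = [row[0] for row in table if row[1] == 1]
print(table)
print(one_offs)
```

Filtering for a count of 1 gives a quick list of "one-off" values that are candidates for lumping.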

ebeshero commented 6 years ago

I've just refreshed the output to sort it by count (of the number of times each value actually appears in the SI), and I updated the explanation at the top.

You'll see we have quite a lot of "one-off" values, which is motivating us to make the occupations lists more systematic now! I favor the idea of correlating roleName with occupation when we examine this information, because it frees us from having to use every precise word available in a specific context for a kind of occupation. Hmm. Maybe I'll output another column in the table to show the roleName elements associated with each occupation value.

Of course in our current (=old/original) system, we didn't attempt to make subtypes. And yes, we'll have lots of work to apply a new tagging system for occupations retroactively, but for this we make special schemas for Site Index editing. That and GitHub coordination will help us share the work with more than one person at a time.

ebeshero commented 6 years ago

And I've now added roleNames, where they were available, so we can see how these might correlate.

ebeshero commented 6 years ago

Be sure to refresh your browser--wait for new stuff to come up. Sometimes GitHub pages takes a few minutes to complete an update: https://digitalmitford.github.io/DM_documentation/SI_currentOccupationsTable.html

lmwilson commented 6 years ago

Thanks! I just wanted to do a double check of my previous sweep and couldn't recall the process--I forgot the //. I had earlier made one pass through all the historical people, condensing the numbers of occupation categories and mostly eliminating one-offs (these are found in those two si-add a to h and h to z revised files), but those changes did not make it yet into the formal SI. So some of them have been condensed once already. Now, of course, we will need to make revisions to the revisions (but we should be able to do it in those previously revised lists, at least for historical people).

lmwilson commented 6 years ago

Good idea!

ebeshero commented 6 years ago

Just tried setting those backtick marks (`) around your axis step (`//`) in your post to make it stand out as code--apparently it doesn't work when the response comes by email!

lmwilson commented 5 years ago

  3. Doesn’t law enforcement belong in the type="government" category? Why don’t we do a simple camelCase string that will lump sheriffs and constables like so:

<occupation type="government" subtype="lawEnforce"/>

You’ll see a few other camelCase solutions like this in our subtype lists, too.

Hmm. I can see an argument for putting law enforcement under type="legal", but it needs to be in one OR the other type category, not both. What do you think?

@ebeshero Elisa--Here is the discussion thread between you, me and Sam from last August on some of the grey areas in the Site Index Occupations: including what to do with unpaid labor, police, etc. It looks like we came to some good conclusions here but have not yet implemented them in the occupations list.

ebeshero commented 5 years ago

@lmwilson Looking back on this, if we are considering a lumping of constables and sheriffs and police as a general group of "law enforcement officials", can we simply use our @type="legal" and apply the simple @subtype="enforcement"?