cacology / BSA-prototypes

A bunch of prototypes for BSA "digital humanities tools" including the website.

Will Independent Scholars Contribute Raw Data? #3

Open cacology opened 5 years ago

cacology commented 5 years ago

Problem

Can we convince independent scholars to contribute materials to the BSA that aren't "ready" and may never be?

Solution

  1. Dump it on GitHub?
  2. BibSoc?
  3. Something else?

Discussion

James is going to try to convince one independent scholar to contribute his transcriptions and Erin is going to try to convince another. Let's see what happens!

erin-elizabeth commented 5 years ago

@cacology how should we follow up with Bill following your RBS convo? I will ask Donna for the BibSite agreement and can amend for our purposes if necessary. I can't remember if we and @jeremyboggs decided that agreements were unnecessary for now.

I will follow up with David Levy about whatever portion of his data he would be willing to share.

cacology commented 5 years ago

@erin-elizabeth I'd like to give him a week to get settled. He seemed eager to share based on my description, but I think you're right that we want to consider following up.

Summary of What I Learned So Far

Bill was extremely willing, verbally, to share his data and asked for my email. He's an independent scholar and doesn't have a particular plan for his work, so I think sharing it seemed like a win to him. We'll see what happens with the follow-up and with time.

David Levy, whom Erin and I both spoke with, was more nervous because he wasn't sure what we were trying to do. I think this is because he has his own ambitious plans for bibliographical research. He's already been doing his own web-based publication, so I think that this was less appealing to him. Viz. http://booksongaming.com/hoyle/index.htm and http://booksongaming.com/hoyle/bibliography/index.xml

I think the lesson here is that we have to provide researchers with a clear value in contributing. For Bill, this seems to be exposure because he's not a sophisticated web user. David is fairly sophisticated already, so didn't see the value in contributing right away.

In addition to the agreement, we also might think about how contributing to the data sets can be valuable to the individual researchers.

Questions

What value does sharing with BSA give researchers?

How do we explain that value?

How do we create more value for them?

erin-elizabeth commented 5 years ago

@cacology You're right – David's data is already out there, but I'm not sure that he was uninterested in working with us. After you left our conversation we continued to chat, and discussed the possibility of a skilled programmer helping to develop his software so that it can deal with a wider variety of books. I think that this project can offer that value to David (or someone like him), at least theoretically. Here's why –

In my thinking about the GitHub repo as a facilitator for prototyping, having David's software available to someone who could work with it (without screwing it up, of course) could be an asset to him, and also be an asset to the BSA platform in demonstrating that there are a variety of possibilities (and outcomes) for digital humanities work via a BSA network.

Thinking further ahead, this could mean working with funders to establish fellowships for data analysis/digital humanities and software development, which I think makes the project more generally ambitious and interesting.

Responding To Those Questions

What value does sharing with BSA give researchers? – For some people, I think that there will be value in simply helping a learned Society achieve its goals. This is going to be especially true and an important part of our pitch, I think, in early stages because we can't promise any real outcomes right now. We need data because we need something to work with to see if the program we are imagining will really work.

How do we explain that value? As Executive Director, my pitch to people about wanting their involvement in the org has a lot to do with the fact that BSA is in transition and fairly agile. The value of helping us right now – sort of as a donor, who might see their personal objectives met by sharing data – is in playing a role in shifting the character and relevance of a very old organization to ensure its survival into the future. It's a big ideas sort of pitch but I think some people get excited by it.

How do we create more value for them? This is going to come from finding DH people to start playing with data, which I see as a second step after we get a bunch of folders full of stuff to play with. I have just established a relationship with Mary Catherine Kinnibrugh who is connected to DH at CUNY so I'm excited to pursue that, and wonder also if @cacology, @jeremyboggs or others at the UVA Scholars Lab might be willing to help us connect people with data sets that will appeal to them. We need those people, right?

cacology commented 5 years ago

David provides some food for thought on email:

From: David Levy david@booksongaming.com
To: 'James P. Ascher' jpa4q@virginia.edu, 'Erin Schreiner' erin.schreiner@bibsocamer.org
Subject: RE: Data for BSA GitHub
Date: Tue Aug 6 13:06:25 2019

Dear Erin and James,

Good seeing you both in Charlottesville. It's great to be back into the SF fog and avoiding heat, humidity, and air-conditioning!

Yes, what James said.

I am indifferent to sharing the data underlying my bibliography in the BSA repository. It's already out there, freely available at http://booksongaming.com/hoyle/bibliography/index.xml and managed in my private git repository. I am making changes with great frequency and to update two places would be awkward both for me and for anyone using the data. And I struggle with who might want to use the data. I do get emails about the bibliography from time to time; they are invariably from dealers or collectors and not scholars or bibliographers.

What might be interesting is for BSA to host the website when I am "done," whatever that means. It would be easy to create an html (rather than xml) version of the website and move or copy it from my private booksongaming.com domain to something more permanent such as the BSA website.

The project that interests me the most would be hosting a small subset of my software, rather than the data, in the BSA repository. The useful bit is the collation statement, something I think any bibliographer would find useful. I started with the approach of the TEI working group on physical bibliography. They did not get far, but had a very thoughtful start on the collation formula. I took it very much farther, though much more remains to be done. My writeup is at http://edmondhoyle.blogspot.com/2015/09/hoyle-bibliography-technology-update.html. Erin, please take a peek at that short essay--it will set context for the notes below.

Some thoughts about where the software is and where it could go:

(1) The value in my software is the collation and pagination statement: proper grammar, sensible domains for the variables, consistency checking, and presentation in HTML and MS Word format. Any bibliographer comfortable with an XML editor could use the software.

(2) I developed the software in a "demand-driven" way. A "supply-driven" approach would have been to go through Bowers and allow for all the complexity therein. I didn't do that; instead, I focused on the books in front of me. Primarily English books from 1742-1868 and American books from 1796-1840. There are a lot of books in DescBib labs that would require additions to the software. Example: books with a pepper signature mark. I never saw one and never bothered to make that a valid signature mark. Trivial change. Example: folios that collate A-5M^2, I never saw more than two alphabet runs and didn't bother to write the extra code to allow third and subsequent runs. More than a trivial change, but not difficult. Etc. If the software were in a BSA repository as is, developers who needed to enhance it for their books could feed the changes back into the repository and the tool would become more generally useful. The skill set for updating the software includes XML schema, XML, XSLT, and Python.

(3) There would have to be some documentation of the software. What is it? How do you use it? How do you extend it? What tools do you need to make it work? If the project got going, I'd be happy to work on that, perhaps with someone who actually wanted to use it.

(4) The biggest technical stumbling block for me in putting the software out there is separately managing the collation code and a user's project-specific code in a git repository. I wouldn't want anyone to update my Hoylish bibliographic descriptions, but would be thrilled for others to extend the collation software. This means managing the software in a public git repository and the descriptions in a second private repository. Git provides a mechanism for doing that ("git subprojects"), but I haven't been able to make them work as advertised in the time I'm willing to spend. I'm certain a more experienced git person could make it work very, very quickly.

(5) The biggest non-technical stumbling block is finding a person who wanted to use the software in a serious way. This is consistent with my demand-, rather than supply-driven approach to software. If there is no demand and I am the only user, it's done! If there were another user or two or three, we could end up with something important. As I had discussed with James, I would have thought RBS was the right place to find users, but perhaps BSA would be a better choice. The Exlibris-L and Sharp-L lists did not turn up any candidates.

(6) Another thought would be to publish a short article in BSA describing the software and inviting interest. Or giving a talk at some BSA event and inviting interest. I don't think I could take that on this year given other things I'm doing, but would be interested to hear if that sounds interesting.

Best,

David
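David's point (2) mentions folios that collate A-5M^2 and the extra code needed to allow third and later alphabet runs. As a rough illustration only (this is not David's software, which the thread describes but never shows), the Python sketch below counts the leaves implied by one simple signature run. The 23-letter signature alphabet (no J, U, or W), the numeric prefix for repeated runs, and the names `SIGNATURE_ALPHABET`, `signature_position`, and `count_leaves` are all assumptions made for this example.

```python
import re

# Assumed 23-letter signature alphabet (no J, U, or W). This convention is
# common in hand-press books, but it is an assumption here, not a rule
# taken from David's software.
SIGNATURE_ALPHABET = "ABCDEFGHIKLMNOPQRSTVXYZ"

# Matches one simple run such as "A-5M^2" or "A-Z^4": an optional run number,
# a start letter, an optional run number, an end letter, and the number of
# leaves per gathering after the caret.
RUN_PATTERN = re.compile(r"^(\d*)([A-Z])-(\d*)([A-Z])\^(\d+)$")


def signature_position(run: str, letter: str) -> int:
    """Return the 1-based position of a signature, counting across alphabet runs."""
    if letter not in SIGNATURE_ALPHABET:
        raise ValueError(f"{letter!r} is not in the assumed signature alphabet")
    run_number = int(run) if run else 1
    if run_number < 1:
        raise ValueError("alphabet run numbers start at 1")
    offset = (run_number - 1) * len(SIGNATURE_ALPHABET)
    return offset + SIGNATURE_ALPHABET.index(letter) + 1


def count_leaves(formula: str) -> int:
    """Count the leaves implied by a single signature run like 'A-5M^2'."""
    match = RUN_PATTERN.match(formula.strip())
    if match is None:
        raise ValueError(f"unsupported collation formula: {formula!r}")
    start_run, start_letter, end_run, end_letter, leaves = match.groups()
    leaves_per_gathering = int(leaves)
    if leaves_per_gathering < 1:
        raise ValueError("a gathering needs at least one leaf")
    first = signature_position(start_run, start_letter)
    last = signature_position(end_run, end_letter)
    if last < first:
        raise ValueError("the run ends before it starts")
    return (last - first + 1) * leaves_per_gathering
```

Under those assumptions, `count_leaves("A-5M^2")` gives 208 leaves: four full alphabets of 23 gatherings plus 12 more, at two leaves each. Because the run number is an arbitrary integer, a third or fifth alphabet run needs no special handling in this sketch; real formulas with preliminaries, inserted leaves, or irregular gatherings would need much more.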

-----Original Message-----
From: James P. Ascher jpa4q@virginia.edu
Sent: Tuesday, August 06, 2019 6:45 AM
To: Erin Schreiner erin.schreiner@bibsocamer.org
Cc: David Levy david@booksongaming.com
Subject: Re: Data for BSA GitHub

Hi all,

Thanks Erin for connecting this and thanks David for being willing to ask the hard questions about the sorts of users we're aiming for!

I've thought a great deal about our conversation in the hallway in Alderman: We found one independent scholar who was eager to share--though time will tell if he does--but I think the value for him was the exposure. David, you are already very competent at getting your material out there, so it was less clear how BSA would provide any value to your research project.

This is what I'm meditating on: It's a clear win for the community to provide a range of data sets, but it's not as clear for the individual researchers who have the data. In David's case, he's actually already publishing his own! Is there some value we can offer to a scholar like David who already knows how to put together raw materials? Or, maybe we just point to his material already available? (My mind tends towards that, but it's because I've worked with David and know he's extremely committed to the technical development of his own research.)

David, you mentioned that finding someone to help develop your XML system into a general system would be motivating. Erin, I think we should think about how that might happen and what that would look like.

With appreciation to you both, -James

On Mon, Aug 05 2019, Erin Schreiner erin.schreiner@bibsocamer.org wrote:

Dear David and James,

Per our discussion last week, we would be very glad to have access to a portion of your bibliographical data, David, as we explore new working models for a BibSite-style repository (https://bibsocamer.org/bibsite-home/) on GitHub.

I am still learning the way we're using GitHub, so I'm hopeful that the two of you can connect to iron out details regarding the data and upload – I'm just not sure how useful I can be in those conversations at this point in my learning process.

We're really grateful for whatever you might be able to share with us, David. I had an excellent meeting with Barbara Shailor about the discussions I had with a variety of interested folks at RBS, and she is fully supportive of our prototyping approach and this GitHub repo that we're working to set up. I think that this is the beginning of what could be a very meaningful project for BSA.

All the best to you both, Erin

Erin Schreiner
Executive Director
The Bibliographical Society of America
erin.schreiner@bibsocamer.org
www.bibsocamer.org
Click here (http://mp.gg/63k1j) to join us or renew your BSA membership!

erin-elizabeth commented 5 years ago

Working from David's thoughts, Notes on a Possible Collaboration

(5) [The biggest non-technical stumbling block is finding a person who wanted to use the software in a serious way.] I can think of a group of users who would want to use this in a serious way: The Black Bibliography Project and the folks who are working on it. Here's why I think they're a great group to consider a collaboration with: They're actually interested in descriptive bibliography; the project was initiated because of the failure of the BSA's Bibliography of American Literature to include any black authors except for one.

Evidence of that interest: I will be leading a descriptive bibliography workshop with Jesse Erickson at their November conference at the Beinecke. This is only going to be a 2 hour workshop, however, and from what I understand they are looking to build a digital bibliography using linked data that will be populated with descriptive metadata for textual objects generated by a dispersed group of people who know black books but don't necessarily know descriptive bibliography. Per my discussions with project leaders Meredith McGill and Jackie Goldsby, the BSA's primary role here will be in facilitating des bib resources and education.

So: I think that BBP project leaders would immediately see the value in working to expand David's XML code so that catalogers have access to a tool that will generate and check their collational formulas.
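To give a sense of what "checking" a collational formula could involve, here is a minimal Python sketch of one plausible consistency check: that the pages accounted for in a pagination statement equal twice the leaves implied by the collation. This is an assumption about what such a check might look like, not a description of David's software, and the names `Gathering`, `PageSequence`, and `check_consistency` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Gathering:
    """One gathering in a collation, e.g. signature 'A' with 4 leaves."""
    signature: str
    leaves: int


@dataclass(frozen=True)
class PageSequence:
    """One numbered or inferred page sequence, e.g. 'i-viii' covering 8 pages."""
    label: str
    pages: int


def check_consistency(
    gatherings: list[Gathering], sequences: list[PageSequence]
) -> list[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    problems = []
    for gathering in gatherings:
        if gathering.leaves < 1:
            problems.append(f"gathering {gathering.signature} has no leaves")
    for sequence in sequences:
        if sequence.pages < 1:
            problems.append(f"page sequence {sequence.label} has no pages")
    # Each leaf has a recto and a verso, so a complete description should
    # account for exactly two pages per leaf.
    expected_pages = 2 * sum(g.leaves for g in gatherings)
    stated_pages = sum(s.pages for s in sequences)
    if stated_pages != expected_pages:
        problems.append(
            f"collation implies {expected_pages} pages "
            f"but pagination accounts for {stated_pages}"
        )
    return problems
```

A cataloger-facing tool could run a check like this each time a description is saved and show the returned messages, so that contributors who know the books but not descriptive bibliography get immediate feedback.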

The question of what is a black book is a rather complex one, but from a material point of view it's an incredibly varied pool – the code would need to be able to handle a huge range of books, which in my mind translates into the kinds of changes to the code that David mentioned in his point (2).

They have already received a planning grant from Mellon, and are hooked up with the Beinecke. The November conference is intended to help them gather ideas and structure their next grant proposal. BSA ties with the project are already well established, so we're well positioned to align these two projects if that's a desirable collaboration for you, David.

This troika – Yale, BSA, Mellon – means dedicated users, strong funding possibilities, and a demand-driven environment that could meet some of your goals.

We would need to work out a number of details – hosting, ownership/licensing, and perhaps an HTML interface for non-coders (like me) who would want to use the tool. In my mind those are easily resolvable issues, bridges to cross way down the road if it seems like a good idea to David in the first place.

(6) [Another thought would be to publish a short article in BSA describing the software and inviting interest. Or giving a talk at some BSA event and inviting interest. I don't think I could take that on this year given other things I'm doing, but would be interested to hear if that sounds interesting.] These are great ideas – since there's a lot to chew on under (5) I'll simply say that if BBP doesn't seem like the right option for you, David, we can pursue these other leads which could be fruitful.

erin-elizabeth commented 5 years ago

Quick update re: David Levy's data: He's willing to share via GitHub.

I spoke with the Black Bibliography project about their interest in expanding his software for use in their project. There was some interest, and it will be discussed at the November Black Bibliography Project Conference at the Beinecke Library.