RJP43 / CitySlaveGirls

The Restoration of Nell Nelson
http://nelson.newtfire.org

Project Orientation #7

Closed RJP43 closed 8 years ago

RJP43 commented 8 years ago

@KariWomack @spadafour @CodyKarch @rCarls

Please refer to our very first Wiki.

I hope this helps each of you find your way around this repository.

Any questions? Post here!

RJP43 commented 8 years ago

@KariWomack A lot of the information currently in the file names will be moved into the TEI headers of the individual files. In the meantime, you can go ahead and post in this issue a systematic way of labeling the different source files so that we have shorter file and folder names. Don't change the actual files just yet, because I still need to pull some of that information into the headers before it disappears under your better file-naming system. To receive your 10 pts for this week, your task is to leave a comment here listing your suggestions for each of the folder names and a system for naming each source type's files. If you encounter issues, comment in this issue so the instructors can take note of your efforts in completing this task. Thanks.

RJP43 commented 8 years ago

Great news! @abrennr completed the OCR of our third source. We will have to review the files and use regex to clean up the extra characters and general wonkiness that comes with OCR text. @KariWomack @spadafour @rCarls @CodyKarch check out the new files here --- notice how for every .pdf file there is now a corresponding .txt file! You will want to consider our "new" source while developing TEI tags.
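
As a starting point for that cleanup, here is a minimal Python sketch of regex passes over OCR text. The specific patterns are assumptions for illustration, not rules derived from our files, and the function name `clean_ocr_text` is hypothetical. Check the real .txt output before deciding which characters count as noise.

```python
import re

def clean_ocr_text(text: str) -> str:
    """Apply a few conservative regex cleanups to OCR output (illustrative only)."""
    # Rejoin words hyphenated across a line break: "fac-\ntory" -> "factory".
    # Caveat: this also merges genuine compounds split at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop characters outside an assumed safe set of letters, digits,
    # whitespace, and common punctuation.
    text = re.sub(r"[^\w\s.,;:!?'\"()\-$&]", "", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into one paragraph break.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

sample = "The fac-\ntory   girls ~ worked|\n\n\n\nlong hours."
print(clean_ocr_text(sample))
```

Each pass is deliberately small so it can be tested on one file and dropped if it removes something we want to keep.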

ebeshero commented 8 years ago

@RJP43 @KariWomack @ghbondar I just took part in a command-line "boot camp" here in Pittsburgh, working with Pittsburgh Supercomputer space--and I learned how to apply regular expression patterns to match on file names and change them in their file directories. So, based on what I learned, I know we can quickly change file names by matching on a particular pattern or series of characters (like we do with any regex matching), and then run some commands that loop through a directory and change those files in any way we designate.

So, think about regex, or rather the consistent patterns we can remove from the old file names: Can you identify some patterns? And what simpler names will make the most functional and human-readable sense for the project?
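
The loop-and-rename idea can be sketched in Python rather than shell commands. This is only an illustration: the function names `plan_renames` and `apply_renames`, the folder name, and the example pattern are all assumptions. It builds the list of renames first and defaults to a dry run, so nothing changes until the plan has been reviewed.

```python
import re
from pathlib import Path

def plan_renames(directory, pattern, replacement):
    """Return (old_path, new_path) pairs for files whose names match the regex."""
    regex = re.compile(pattern)
    plan = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # skip subfolders
        new_name = regex.sub(replacement, path.name)
        if new_name != path.name:
            plan.append((path, path.with_name(new_name)))
    return plan

def apply_renames(plan, dry_run=True):
    """Print each planned rename; only touch the files when dry_run is False."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(f"{new.name} already exists")
        print(f"{old.name} -> {new.name}")
        if not dry_run:
            old.rename(new)

# Hypothetical usage: strip a repeated prefix from every file in a folder.
# plan = plan_renames("some_source_folder", r"^Nell Nelson_", "N_")
# apply_renames(plan)                  # prints the plan only
# apply_renames(plan, dry_run=False)   # actually renames
```

Keeping the plan separate from the rename step makes it easy to paste the proposed names into this issue for review before any file is touched.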

KariWomack commented 8 years ago

Okay, exactly how many of these files need to be renamed? I am asking because I was wondering whether certain things like tables and graphs would be part of other files, or whether those also need to be systematically renamed on their own. Also, how do you feel about the commentary file names looking something like NComm8-21 instead of Nell Nelson_8-21-1888_Commentary? Since all of the commentary files have the same year, all we need to note is the day and month, N for Nelson, and Comm for commentary. I also figured we could use the same scheme for the articles but substitute Art for Comm. Comments? @ebeshero

ebeshero commented 8 years ago

So, I'm meeting with @RJP43 and @ghbondar now, and we're thinking maybe we want to keep the full date in case we want to add 1889 or 1890 articles later. But we can definitely shorten the names and that's a really good idea. I will suggest perhaps foregrounding the newspaper title in the filename as a standard way to put forward the publication medium: Eventually you may be adding articles from OTHER newspapers (say the New York World), so you want to be able to tell instantly what the source is from the file name, just to make your project development life a little easier!

Think about sorting the files by year in the file directory:

1888-01-25-ChTimes.xml

1890-07-29-NYWorld.xml

We profs recommend changing the PDF file names to match, so you can instantly correlate them.
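
A small Python sketch of how an old-style name could be turned into the date-first form above. It assumes the old names carry a month-day-year date like `8-21-1888`. The function name `sortable_name` and the default source tag are illustrative, and the sketch drops the article/commentary distinction, which a real scheme would still need to record somewhere.

```python
import re

# Assumed shape of the date inside the old file names: month-day-year.
DATE_RE = re.compile(r"(\d{1,2})-(\d{1,2})-(\d{4})")

def sortable_name(old_name, source="ChTimes", ext=".xml"):
    """Build a YYYY-MM-DD-Source name from an old name containing M-D-YYYY."""
    match = DATE_RE.search(old_name)
    if match is None:
        raise ValueError(f"no M-D-YYYY date found in {old_name!r}")
    month, day, year = (int(part) for part in match.groups())
    # Zero-padding month and day makes alphabetical order match date order.
    return f"{year:04d}-{month:02d}-{day:02d}-{source}{ext}"

print(sortable_name("Nell Nelson_8-21-1888_Commentary"))
print(sorted(["1890-07-29-NYWorld.xml", "1888-01-25-ChTimes.xml"]))
```

Calling the same function with `ext=".pdf"` would give the matching PDF name, so the two files sit side by side in a directory listing.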

ebeshero commented 8 years ago

Book publications: keep it simple? Try for the books:

BarkleyS01 BarkleyS02 BarkleyS03 ... BarkleyS10 BarkleyS11 etc. (The Barkley pub isn't divided into chapters, and has more than 10 sections. Make your numbering go: 01 ... 10...20...30 so they can be sorted.)

VS.

McEnnisC01 McEnnisC02 McEnnisC03 ... McEnnisC39
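
To show why the leading zero matters, here is a short Python sketch. The helper name `section_label` is made up for illustration. Without padding, a plain alphabetical sort puts section 10 before section 2.

```python
def section_label(author, kind, number, width=2):
    """Build a label such as BarkleyS01 (S = section) or McEnnisC39 (C = chapter)."""
    return f"{author}{kind}{number:0{width}d}"

numbers = (1, 2, 10, 11)
unpadded = [f"BarkleyS{n}" for n in numbers]
padded = [section_label("Barkley", "S", n) for n in numbers]

print(sorted(unpadded))  # section 10 and 11 land before section 2
print(sorted(padded))    # stays in reading order
```

A width of 2 covers up to 99 sections or chapters, which is enough for both books as described above.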

RJP43 commented 8 years ago

To get credit for participation in last week's and this week's project development, I would like each of you (@spadafour @rCarls @CodyKarch @KariWomack) to read one article from the PDF images of the original articles and try your hand at transcribing it. In oXygen, open a new XML document and give it a root element of <div type="article">. Be sure to leave a comment tag, or feel free to begin creating a basic TEI header, with the publication date included inside. Separate the headlines from the main body of the article with div elements, setting the attributes to @type="headlines" and @type="body". Use the self-closing <gap/> element, with either a comment tag or a logical attribute, to mark words that are difficult to transcribe and the reason why. Separate paragraphs with <p> elements, and if advertisements for future newspaper issues related to the Nelson series follow the main body of the article, put those in a separate <div type="advertisement">. Mark areas of interest and parts of the article that stand out to you with comments (that include your name and the date), and then push your finished transcription to the OriginalArticle_XML folder using your desktop client.

Once you have completed your transcription (not an easy task, so give yourself sufficient time), go into the Anon.WhiteSlaveGirls folder and hunt for the corresponding Barkley section (you can best do this by skimming the section headlines for ones that match those from your article). Review your section in comparison with your article and jot down any noticeable differences. Once you have completed all of these tasks, comment in Issue 9 telling us which section from the Barkley text (a.k.a. Anon.WhiteSlaveGirls) corresponds with the article you transcribed.

This will be significant project work: transcribing, beginning basic TEI structure tagging, and versioning. It will also give each of you a chance to become better acquainted with the Nelson project and to start finding interesting things you may want to produce data visualizations on for upcoming assignments.
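
For anyone who wants to see the intended shape before opening oXygen, here is a Python sketch that prints a skeleton with the structure described above. It is only an illustration: the function name `build_article_skeleton`, the placeholder text, and the choice of `reason="illegible"` on `<gap/>` are assumptions, and the real transcription should still be typed and validated in oXygen. `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

def build_article_skeleton(pub_date, transcriber, note_date):
    """Return an XML string with the article/headlines/body/advertisement divs."""
    article = ET.Element("div", {"type": "article"})
    # Publication date recorded in a comment until a TEI header exists.
    article.append(ET.Comment(f" publication date: {pub_date} "))

    headlines = ET.SubElement(article, "div", {"type": "headlines"})
    headlines.append(ET.Comment(" headline text goes here "))

    body = ET.SubElement(article, "div", {"type": "body"})
    para = ET.SubElement(body, "p")
    para.text = "Legible text before "
    # Self-closing gap marking a word that could not be read (attribute is an assumption).
    gap = ET.SubElement(para, "gap", {"reason": "illegible"})
    gap.tail = " legible text after."
    # Signed, dated comment flagging an area of interest.
    body.append(ET.Comment(f" {transcriber}, {note_date}: area of interest "))

    advert = ET.SubElement(article, "div", {"type": "advertisement"})
    advert.append(ET.Comment(" notice for a future issue goes here "))

    ET.indent(article)  # pretty-print in place
    return ET.tostring(article, encoding="unicode")

print(build_article_skeleton("1888-08-06", "your-name", "YYYY-MM-DD"))
```

The printed output can be pasted into a new oXygen document and filled in paragraph by paragraph.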

Article Assignments:
@spadafour --- 8/6/1888
@rCarls --- 8/7/1888
@CodyKarch --- 8/8/1888
@KariWomack --- 8/9/1888

Please contact me via email or in this issue with any questions and concerns as they arise. Thank you!

RJP43 commented 8 years ago

I am going to make a new issue that better explains how to complete the tasks laid out in the above comment!