Open Jacksweeney1 opened 4 years ago
This week, we continued to learn more about GitHub, learning how to push and pull files from the server, as well as getting comfortable with the command line. At this point, we have all committed and pulled files from the server and are comfortable with our GitHub setup. We have also discussed our research more deeply, and decided that our research will focus not only on geographic comparisons, but temporal ones. This means we will be investigating changes in the song lyrics across the time period from the late 1970s through to 1991, as well as across the different Soviet republics and satellites. My group has collected a preliminary corpus of Russian songs that we might want to use for the project. I have also converted some of their song lyrics into xml files that we can use going forward for our first forays into mark-up. Speaking of markup, Misha has created some preliminary ideas for how we might quantify our idea of “dissident”-ness and what sorts of tags we will be wanting to use for this. We discussed further the subcategories we’ll use for this topic, and we will continue working on this in the upcoming week, since we will need to input these into the schema as elements/attributes. Our next goal is to come up with a preliminary structural schema for our texts. We are still in the early phases of our project, so it’s still rather slow going. But collecting some of our songs and creating a schema for them is an important step because it’s likely that we will have a lot of texts that we are working with, which will make a solid schema especially necessary. The schema will be our first step along the road to deciding what kind of reference tags we want to make and then marking our texts up with those tags.
This week, we began by discussing what would come next. James gave us a link on how to stay more organized. He also suggested that we take a two-pronged approach to dealing with the texts: perhaps a focus on both a large corpus and a short group of specific songs. Maybe. James assigned me (Jack) to investigate whether I could find a source of Mongolian or Tuvan texts that relate to our project. We then talked about what our research question should be. Looking at the dissidence file Misha whipped up last week, James spoke to us about metadata. The outlook portion of our research can be contained in metadata, while Aesopian language would have to involve closer reading. I think maybe we should write up a list of terms for 'distant reading'? Furthermore, over this next week we will be marking up an XML doc to get the ball rolling on our project. Misha brought up how he wanted to specifically research the motif of 'War' in Soviet rock songs. Misha then brought up topic modeling, and James said we'll talk about it more next time and gave us a link to help out.
Things are really heating up in the Soviet Rock Songs project, stay tuned.
This week, we (Misha & Crissy) got cracking on marking up some song lyrics in XML!! Yay! Misha based his mark-up off of the schema that Jack made last week and focused on close-reading elements (like Aesopian language and song outlook), while I focused more on collecting a lot of different thematic tags and reference attributes we might be able to use. We came up with totally different markups, but this gave us a chance to talk about what exactly we are looking for in the songs, as well as in the project more generally. We talked about what kind of references we could auto-tag (alcohol references, nature references, etc.) versus what was subjective and needed to be tagged by hand (outlook, Aesopian language, cultural references, etc.). We also talked about what other topics outside of just "political dissident-ness" we would like to look at. We reaffirmed that by putting together close reading (those subjective, hand-tag jobs) with distant reading (the auto-tagging that the computer can help us with), we will be able to produce some really interesting and impactful results. For example, an auto-tagged document that returns overwhelming amounts of content relating to militarism, paired with a close reading that returns a negative outlook value, would be very compelling for a song produced during the time of the Soviet war in Afghanistan.
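To make the auto-tagging side a bit more concrete, here is a minimal sketch of how keyword-based tagging could work. Everything in it is an assumption for illustration: the `THEME_LEXICON` word lists, the theme names, and the sample text are placeholders, not the project's actual tag vocabulary or real lyrics.

```python
import re
from collections import Counter

# Hypothetical theme lexicon: illustrative word lists only,
# not the project's actual tag vocabulary.
THEME_LEXICON = {
    "alcohol": {"водка", "вино", "пиво"},
    "nature": {"река", "лес", "солнце"},
    "militarism": {"война", "солдат", "танк"},
}

def tokenize(text):
    """Lowercase the text and split it into word tokens (works for Cyrillic)."""
    return re.findall(r"\w+", text.lower())

def autotag(text, lexicon=THEME_LEXICON):
    """Count how many tokens in the text appear in each theme's word list."""
    counts = Counter()
    for token in tokenize(text):
        for theme, words in lexicon.items():
            if token in words:
                counts[theme] += 1
    return counts

# Invented sample sentence, not a real lyric.
sample = "Война и водка, солдат идет через лес. Война!"
theme_counts = autotag(sample)
print(theme_counts)
```

Exact word matching like this will miss inflected forms (for example "войны" would not match "война"), so a real pass over Russian text would probably need stemming or lemmatization first. Subjective categories like outlook and Aesopian language would still be tagged by hand.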
We also decided we want to find ~5-10 overarching categories or themes to track in our corpus as elements, and then use attributes (where necessary, but not frivolously) to further refine the data we can collect on different themes, topics, and emotions in the song lyrics. In order to do this, we (Crissy and Misha) will be reconciling our XML documents and their different reference elements, and then each using that to come up with 10 major topics we'd like to study through this project, with the ultimate goal of each nixing 5 of them to leave us each with a Top 5, which will together become the project's Top 10 subjects/themes.
We will also be compiling a corpus of words that fall under each of these different subjects and subtopics to help move us towards auto-tagging. This will also help us to keep refining our reference elements and attributes. Misha will also be looking into creating a simple program to run on our corpus (which we need to finish compiling) in order to look at the frequency of words overall in our corpus before we even begin the task of auto-tagging. This will also contribute to our task of developing the subjects and subcategories we want to tag by showing us words or themes that we might not have even noticed with the naked eye. In contributing to this big task, Jack will continue to develop our schema and update it as we zero in on our elements & attributes!
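As a rough idea of what the word-frequency program could look like, here is a minimal sketch. It is not Misha's actual program: the `word_frequencies` function name and the two-song `corpus` are hypothetical, and the text in it is invented placeholder material rather than real lyrics.

```python
import re
from collections import Counter

def word_frequencies(lyrics_by_song):
    """Return overall word counts across every song in the corpus."""
    totals = Counter()
    for text in lyrics_by_song.values():
        # \w+ matches runs of letters and digits, including Cyrillic.
        totals.update(re.findall(r"\w+", text.lower()))
    return totals

# Placeholder corpus: invented stand-in text, not real lyrics.
corpus = {
    "song_a": "ночь город ночь",
    "song_b": "город война ночь",
}

freqs = word_frequencies(corpus)
for word, count in freqs.most_common(3):
    print(word, count)
```

Raw counts like these will be dominated by function words and will treat each inflected form as a separate word, so a stop-word list and some kind of lemmatization would likely be worth adding before drawing conclusions about themes.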
THINGS ARE GETTING EXTREMELY HOT IN THE SOVIET ROCK SONGS PROJECT, DEFINITELY STAY TUNED & ROCK ON!
Sounds like you guys have made some great progress! The Songs of Colonization project has some similarities to your project, as we are also tracking themes, topics, and emotions in songs. Though we had a plan for what we specifically wanted to track, we ran into a few roadblocks when we started to mark up our songs this past week. We still need to figure out how to name our elements and attributes most effectively, and what it means to track an emotion such as "romanticism" and how to go about it. I would love to hear about the nuts and bolts of your markup so that we can compare.
As a side note, do you have a separate thread for project comments? I wasn't sure where to post this. If so, I would be happy to transfer this comment to that thread so that I don't clog your project updates.
Well, we have a schema! Jack made a schema for us that is pretty all-encompassing, so we can get started checking our XML files. Misha & Crissy have also marked up two songs and passed them back and forth to check on and further develop the tags we will be using, as well as the kind of structural markup we would like to use. One of the biggest questions that still remains for us is exactly what kind of system or method we will be using to create our sample of songs, and how big that sample will be. We have played with a lot of different ideas, but nothing is set in stone yet. The next big steps for us will be to continue to mark up songs (obviously, the question of which songs to do is getting more and more vital) and to begin to design our website.
Well, it is somewhat difficult to continue marking up songs in XML when you are not sure exactly which songs or what kind of sample you will be using. In this vein, Misha found a cool book that gives us a simple selection methodology: we will only pick songs that appear in the book. Exactly which songs we will pick is still up in the air. The question of narrowing the project down (in terms of what we will be tagging, etc.) is one we are currently trying to navigate as well.
This week was busy and exciting! Misha designed a very sleek and fashionable splash page for our project site! The style that he came up with for that page should be extendable to the rest of the website with some alterations. Crissy came up with XSLT that we should be able to use for each song page on the website. We spent a good chunk of our project meeting last week figuring out how to embed YouTube links into HTML using XSLT, which was a buggier process than we anticipated! However, this was a good use of time because embedded YouTube videos of the songs we are analyzing will be the primary way we expect users to interact with the original song. We will also be including the text of the actual song lyrics in Russian, but displaying them on the page will be an option each user can toggle. Now that we have sorted out the issue of XSLT, the task of moving material from our XML markup into the website should be a lot easier and more exciting. Our biggest task is still to continue to do XML mark-up of the song lyrics. Aside from that, looking forward to next week, Misha is going to look into extending the website so we have clickable links for the stuff in the header, etc. Crissy is going to look into beginning to collect the data from our XML files, and think about how best to display it.
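For the data-collection step, here is a minimal sketch of how counts could be pulled out of a marked-up song. It uses Python's standard library rather than XSLT, and the markup itself is an assumption: the `song`, `line`, and `ref` element names and the `outlook` and `type` attributes are hypothetical stand-ins, not necessarily what our schema uses, and the sample text is placeholder material.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: element and attribute names are placeholders
# and may not match the project's real schema.
SAMPLE_SONG = """<song outlook="negative">
  <title>Placeholder title</title>
  <line>first line with a <ref type="militarism">reference</ref></line>
  <line>second line with a <ref type="nature">reference</ref> and
    another <ref type="militarism">reference</ref></line>
</song>"""

def summarize_song(xml_text):
    """Return the song's outlook value and a count of each reference type."""
    root = ET.fromstring(xml_text)
    # iter("ref") walks every <ref> element at any depth under the root.
    ref_counts = Counter(ref.get("type") for ref in root.iter("ref"))
    return {"outlook": root.get("outlook"), "refs": ref_counts}

summary = summarize_song(SAMPLE_SONG)
print(summary["outlook"], dict(summary["refs"]))
```

Running something like this over every song file would give one row per song (outlook plus reference counts), which could then feed whatever charts or tables end up on the website.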
Project Report 1
Our group has settled on the Soviet Rock Songs project topic, but we have made some adjustments to the original project proposal. While we are certain that sourcing Russian rock songs from the Internet will be the way to go and should avoid any copyright issues, we won’t be able to use any Estonian or otherwise non-Russian songs in existing translation due to copyright issues. For any non-Russian song, we will have to look for the lyrics online in their original language, and then request to have them translated by some friends or colleagues who are familiar with the language. If any of our translations don’t work out, we will just be working with the lyrics of Russian-language rock songs from any Soviet republic. Outside of Russia, the other regions we are looking into include Uzbekistan, Belarus, Ukraine, and Mongolia (Soviet satellite state). Another addition to our project is that we would like to integrate GIS and digital mapping into our project because one of our group members is familiar with this software. It may be possible to track the movement of different themes or else merely express the popularity of certain themes over different geographic areas using digital mapping, especially considering that many of the republics we are hoping to look at are spread out widely across the USSR. Additionally, in the past week we have further familiarized ourselves with the command line and GitHub, set up a project board to assign and complete tasks, and set a goal to compile a baseline corpus of songs for next week.