Sicheng2000 / lab-05

Lab 5: Harvesting research data
Creative Commons Attribution 4.0 International

Lab 05 feedback #2

Open Sicheng2000 opened 7 months ago

Sicheng2000 commented 7 months ago
  1. What did you learn? Data selection should align with the research question and take accessibility into account.
  2. What did you find most/ least challenging? The most challenging part was applying the untar() and download.file() functions, because I sometimes forget the order of the arguments and what each one should contain. In those cases, I find it useful to prefix the function name with "?" to open its documentation and check its usage. Another thing that confuses me is what to do when I don't have a CSV file, for instance when I only have a PDF or HTML file. Currently, I'm working with the Switchboard Dialog Act Corpus, which only provides UTT and HTML files.

  3. What resources did you consult? I referred to Recipe 5 (https://qtalr.github.io/qtalrkit/articles/recipe-5.html) for guidance, but it primarily uses the Gutenberg package, which already provides data frames. So I'm curious about how to convert a text file into a CSV table. From online research and with help from ChatGPT, it appears that the text first needs to be structured into categories, such as tokens and IDs, before it can be converted into a table. In other words, a corpus structure seems to be necessary before the conversion. image (from https://www.youtube.com/watch?v=BPjgwdqHM8g)

  4. What more would you like to know about acquiring data? I want to know more about how to handle the UTT file. When I attempted to read it, the output appeared like this:

    image

    Since I'm only displaying the first 5 lines, I'd like to know how to display a specific line, such as one containing the word "yeah". I've found methods that don't require converting the text into CSV format, but all of them seem a little complicated. https://www.reddit.com/r/gamemaker/comments/r8qrz8/searching_for_a_specific_line_in_text_file/
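One possible way to approach the last question in base R is to read the file as plain lines and search them with grep(). The sketch below is only an illustration: the file contents are made up and do not reflect the actual Switchboard UTT layout, and the names utt_path, utt_lines, hits, yeah_df, and csv_path are hypothetical.

```r
# Hypothetical stand-in for a .utt file; the real corpus lines will look different.
utt_path <- tempfile(fileext = ".utt")
writeLines(
  c("A.1 utt1: Okay, so what do you think?",
    "B.2 utt1: Yeah, I agree.",
    "A.3 utt1: Uh-huh.",
    "B.4 utt1: Oh yeah, that happens a lot."),
  utt_path
)

# Read the file as a character vector, one element per line
utt_lines <- readLines(utt_path)

# Display a single line by its position
print(utt_lines[2])

# Positions of the lines that contain "yeah", ignoring case
hits <- grep("yeah", utt_lines, ignore.case = TRUE)
print(utt_lines[hits])

# Collect the matches in a data frame and save them as a CSV file
yeah_df <- data.frame(line_number = hits, text = utt_lines[hits])
csv_path <- tempfile(fileext = ".csv")
write.csv(yeah_df, csv_path, row.names = FALSE)
```

With a real file, utt_path would point to the downloaded UTT file instead of a temporary one. Keeping the line number next to the text is one simple table structure; splitting each line into speaker, utterance ID, and text columns would depend on how the corpus actually formats its lines.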