UD3-Lab / mintEMU

mintEMU - The Legacy of the European Postgraduate Master in Urbanism at TU Delft: A Text Mining Approach

Text conversion not working after updating file list #15

Closed cforgaci closed 1 year ago

cforgaci commented 1 year ago

@alwil, I updated the setup code in the paper and now I am getting an error when I try to extract the text with the convert_pdf_text() function.

This is the code chunk:

#| label: setup

# Load packages ----
## Package for managing paths
library(here)

## Metapackage for data science
library(tidyverse)

## Packages for text processing and analysis
library(tidytext)
library(SnowballC)
library(tm)

# Load analysis functions ----
devtools::load_all(".")

# Read thesis metadata ----
data_path <- here("analysis", "data", "raw_data")
pdf_names <- dir(data_path, pattern = "\\.pdf$")

emu_theses <- 
  read_csv(here(data_path, "theses-metadata.csv")) %>%
  mutate(text = "") %>%
  filter(!is.na(pdf_via))

pdf_paths <- here(data_path, emu_theses$file_name)

And this is the error I get:

Error: unexpected symbol in "emu_theses$text <- convert_pdf_text(pdf_paths)."

I think this is an encoding issue caused by special characters in the PDF file names. I tried using the base R function iconv() to change the encoding of pdf_paths, but I get the same error. Any ideas?

alwil commented 1 year ago

Hi @cforgaci,

It seems there's a superfluous dot (.) at the end of the call emu_theses$text <- convert_pdf_text(pdf_paths). R reads that trailing dot as an extra symbol, which is why it reports "unexpected symbol". Please let me know if removing it solves the issue.
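
As a rough illustration of why this is a syntax problem rather than an encoding one, the sketch below asks R to parse the line with and without the trailing dot. The helper is_parsable() is a hypothetical name introduced here, not part of mintEMU. Because parse() only checks syntax and never evaluates the code, convert_pdf_text() and pdf_paths do not need to exist for this check.

```r
# The assignment from this thread, with and without the trailing dot.
bad_call  <- "emu_theses$text <- convert_pdf_text(pdf_paths)."
good_call <- "emu_theses$text <- convert_pdf_text(pdf_paths)"

# is_parsable() is a hypothetical helper for illustration only.
# It returns TRUE when R can parse the code and FALSE on a syntax error.
is_parsable <- function(code) {
  tryCatch(
    {
      parse(text = code)  # syntax check only; nothing is evaluated
      TRUE
    },
    error = function(e) FALSE
  )
}

is_parsable(bad_call)
is_parsable(good_call)
```

The first call should return FALSE and the second TRUE, since a lone dot is a valid symbol name in R and a symbol cannot directly follow a completed call.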

cforgaci commented 1 year ago

Wow, it did. 🤦‍♂️ Thanks! Closing this issue.