ChrisNygaard / Eksamensgruppe19


Eksamensprojekt #1

ChrisNygaard opened this issue 8 years ago

ChrisNygaard commented 8 years ago (2015-12-06 10:32 GMT+01:00)

title: "Eksamen"

output: html_document

Paragraph 20 (§ 20) questions in the Danish Parliament (Folketinget) are questions that an ordinary member of parliament puts to a government minister. They are used to obtain information directly from that minister, and they are usually rather critical of the law they concern.

First, we load the packages we need:

```r
# Libraries
library("readr")
library("plyr")
library("ggplot2")
library("dplyr")
library("rvest")
library("stringr")
library("lubridate")
```

Next, we acquire the data by scraping the website of the Danish Parliament. Much of what we need can be obtained simply by scraping the search-result pages, which list each question's headline, author, the ministry it is directed at, its date and so on. We do this with the following scraper function:

```r
scrape <- function(link) {
  # First we load the website
  ft_data <- read_html(link, encoding = "UTF-8")

  # Then we get the titles of the questions; the "Endeligt svar"
  # ("Final answer") status label is removed before trimming whitespace
  ft_titler <- ft_data %>%
    html_nodes("h3 a") %>%
    html_text() %>%
    str_replace_all("Endeligt svar", "") %>%
    str_trim(side = "both")

  # Then we acquire the remaining listing information
  # (presumably author, ministry, date, ...)
  ft_systematisk <- ft_data %>%
    html_nodes("div p") %>%
    html_text() %>%
    str_trim(side = "both")

  # Return titles and information together so neither is discarded
  list(titler = ft_titler, systematisk = ft_systematisk)
}
```
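The title clean-up inside `scrape()` can be checked without contacting the website. A minimal sketch in base R (`trimws()` standing in for `str_trim()`); the helper name `clean_titles` and the sample titles are hypothetical:

```r
# Hypothetical helper mirroring the title clean-up in scrape():
# drop the "Endeligt svar" ("Final answer") status label, then trim whitespace.
clean_titles <- function(titles) {
  trimws(gsub("Endeligt svar", "", titles, fixed = TRUE), which = "both")
}

# Made-up titles in the shape the search page is assumed to return
raw <- c("  Endeligt svar Om ventelister  ", "Om skattereformen ")
cleaned <- clean_titles(raw)
print(cleaned)
```

Removing the label before trimming matters: trimming first would leave a stray space wherever the label used to sit.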

Then we apply our function to the link of our choosing. Eventually the link will be set inside a loop so that we get all the data in one go; for now we start with page 1, showing 200 results per page.

```r
ft_link <- "http://www.ft.dk/Search.aspx?q=&SearchDepthFull=&Udvalg=&ParagrafTyveType=&StilletTil=&StilletAf=&PeriodenBesvaretFrom=&PeriodenBesvaretTo=&Status=&Emneord=&sf=ps&msf=ps&as=1&Samling=20111&PeriodenStilletFrom=2010-07-01&PeriodenStilletTo=2015-11-21&SortBy=SortDate&SortOrder=asc&pageSize=200&pageNr=1#search"

x1 <- scrape(ft_link)
print(x1)
```
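The loop over result pages is not written out yet. A minimal sketch, assuming only the `pageNr` query parameter changes from page to page; `page_links` is a hypothetical helper, the URL is a shortened stand-in for `ft_link`, and the page count is made up (in practice it depends on how many questions match the search):

```r
# Hypothetical helper: swap the pageNr value in a search URL
# to get one URL per result page.
page_links <- function(link, pages) {
  vapply(pages,
         function(p) sub("pageNr=[0-9]+", paste0("pageNr=", p), link),
         character(1))
}

# Shortened stand-in for ft_link
base_link <- "http://www.ft.dk/Search.aspx?q=&Samling=20111&pageSize=200&pageNr=1#search"
links <- page_links(base_link, 1:3)
print(links)

# In the analysis one could then run, for example:
# results <- lapply(links, function(l) { Sys.sleep(1); scrape(l) })
```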


Danielsloth commented 8 years ago (2015-12-07 8:32 GMT+01:00)

```r
library("rvest")
library("plyr")

# Alternative selectors tried:
# css.selector <- "td"
# css.selector <- "td:nth-child(6) , .filtered"   # including date
css.selector <- ".filtered"

link <- "http://www.ft.dk/dokumenter/vis_efter_type/spoergsmaal/spoergsmaal.aspx?questionSearchtype=1&startDate=20101006&endDate=20161001&session=&minister=-1&inquirer=-1&committee=-1&statusAnswer=-1&sortColumn=caseNumber&sortOrder=desc&startRecord=1&numberOfRecords=150000&totalNumberOfRecords=#dok"

ft_data <- read_html(link, encoding = "UTF-8") %>%
  html_nodes(css = css.selector) %>%
  html_text()
ft_data
df <- data.frame(ft_data, stringsAsFactors = FALSE)

# Build the link to each question's own page from its case number
fulddata <- matrix(ncol = 1)
for (i in seq_len(nrow(df))) {
  link_spm <- "http://www.ft.dk/samling/20111/spoergsmaal/"
  link1 <- paste(paste(link_spm, df[i, 1], sep = ""), "/index.htm", sep = "")
  link1 <- gsub(" ", "", link1)
  fulddata <- rbind(fulddata, link1)
  # Sys.sleep(1)
}
fulddata <- data.frame(link = fulddata[-1, ], stringsAsFactors = FALSE)

# read_html() needs a single URL string, not a data frame
first.link <- fulddata[1, 1]
first.link.text <- read_html(first.link, encoding = "UTF-8") %>%
  html_nodes(".tableTitle+ p , p a , h1") %>%
  html_text()
```
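The link-building loop can also be written without growing a matrix row by row, since `paste0()` is vectorised over the case numbers. A sketch assuming the case numbers arrive as strings such as "S 1234" (made-up values) and using the same URL pattern as above; `question_links` is a hypothetical name:

```r
# Hypothetical vectorised version of the link-building loop
question_links <- function(case_numbers,
                           link_spm = "http://www.ft.dk/samling/20111/spoergsmaal/") {
  # Strip spaces so e.g. "S 1234" becomes "S1234", then append the page suffix
  paste0(link_spm, gsub(" ", "", case_numbers), "/index.htm")
}

links <- question_links(c("S 1234", "S 1235"))
print(links)
```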


Danielsloth commented 8 years ago (2015-12-07 9:14 GMT+01:00)

```r
library("rvest")
library("plyr")

# Alternative selectors tried:
# css.selector <- "td"
# css.selector <- "td:nth-child(6) , .filtered"   # including date
css.selector <- ".filtered"

link <- "http://www.ft.dk/dokumenter/vis_efter_type/spoergsmaal/spoergsmaal.aspx?questionSearchtype=1&startDate=20101006&endDate=20161001&session=&minister=-1&inquirer=-1&committee=-1&statusAnswer=-1&sortColumn=caseNumber&sortOrder=desc&startRecord=1&numberOfRecords=150000&totalNumberOfRecords=#dok"

ft_data <- read_html(link, encoding = "UTF-8") %>%
  html_nodes(css = css.selector) %>%
  html_text()
ft_data
df <- data.frame(ft_data, stringsAsFactors = FALSE)

# Build the link to each question's own page from its case number
fulddata <- matrix(ncol = 1)
for (i in seq_len(nrow(df))) {
  link_spm <- "http://www.ft.dk/samling/20111/spoergsmaal/"
  link1 <- paste(paste(link_spm, df[i, 1], sep = ""), "/index.htm", sep = "")
  link1 <- gsub(" ", "", link1)
  fulddata <- rbind(fulddata, link1)
  # Sys.sleep(1)
}
fulddata <- data.frame(link = fulddata[-1, ], stringsAsFactors = FALSE)
first.link <- fulddata[1, 1]

# Fetch each question page and store its text fields, one row per question
link.text1 <- matrix(NA_character_, ncol = 7, nrow = nrow(fulddata))
for (i in seq_len(nrow(fulddata))) {
  first.link <- fulddata[i, 1]
  link.text <- read_html(first.link, encoding = "UTF-8") %>%
    html_nodes(".tableTitle+ p , p a , h1") %>%
    html_text()
  length(link.text) <- 7   # pad with NA or truncate so the row always has 7 fields
  link.text1[i, ] <- link.text
}
```
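Two things can break the fetching loop on a long run: a page where the selector matches more or fewer than seven nodes, and a request that fails partway through. A sketch that pads or truncates each result to a fixed width, records failed pages as `NA`, and pauses between requests. The page reader is passed in as an argument (`fetch`, a hypothetical parameter, as is the name `collect_rows`) so the logic can be tried without network access:

```r
# Hypothetical helper: fetch every link, one fixed-width row per link
collect_rows <- function(links, fetch, n_cols = 7, pause = 1) {
  out <- matrix(NA_character_, nrow = length(links), ncol = n_cols)
  for (i in seq_along(links)) {
    # A failed request yields NULL, leaving that row as NA
    texts <- tryCatch(fetch(links[i]), error = function(e) NULL)
    if (!is.null(texts)) {
      length(texts) <- n_cols  # pads with NA or truncates to n_cols
      out[i, ] <- texts
    }
    if (pause > 0) Sys.sleep(pause)  # be polite to the server
  }
  out
}

# Offline stand-in for the real page reader
fake_fetch <- function(link) {
  if (grepl("bad", link)) stop("HTTP error")
  c("Title", "Asked by X", "To minister Y")
}

rows <- collect_rows(c("page1", "bad", "page3"), fake_fetch, pause = 0)
print(rows)
```

In the analysis, `fetch` could be `function(l) read_html(l, encoding = "UTF-8") %>% html_nodes(".tableTitle+ p , p a , h1") %>% html_text()`. The default one-second `pause` keeps the request rate modest, at the cost of a long run when there are thousands of pages.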


ChrisNygaard commented 8 years ago (2015-12-07 10:04 GMT+01:00)

title: "Eksamen2"

output: html_document

```r
library("rvest")
library("plyr")
library("lubridate")

# Here we get the links
# css.selector <- "td"
css.selector <- "td:nth-child(6) , .filtered"   # including date
# css.selector <- ".filtered"
link <- "http://www.ft.dk/dokumenter/vis_efter_type/spoergsmaal/spoergsmaal.aspx?questionSearchtype=1&startDate=20101006&endDate=20161001&session=&minister=-1&inquirer=-1&committee=-1&statusAnswer=-1&sortColumn=caseNumber&sortOrder=desc&startRecord=1&numberOfRecords=15000&totalNumberOfRecords=#dok"
ft_data <- read_html(link, encoding = "UTF-8") %>%
  html_nodes(css = css.selector) %>%
  html_text()
ft_data

# We need both the case number and the date, so the numbers go in column 1
# and the dates in column 2, and then we extract the year
df <- data.frame(nummer = ft_data[seq(1, length(ft_data), 2)],
                 dato = ft_data[seq(2, length(ft_data), 2)],
                 stringsAsFactors = FALSE)
# Dates are assumed to be day-month-year with a four-digit year
df[, 3] <- as.Date(gsub(" .*$", "", df[, 2]), format = "%d-%m-%Y")

fulddata <- matrix(ncol = 1)
for (i in seq_len(nrow(df))) {
  link_spm <- "http://www.ft.dk/samling/20111/spoergsmaal/"
  link1 <- paste(paste(link_spm, df[i, 1], sep = ""), "/index.htm", sep = "")
  link1 <- gsub(" ", "", link1)
  fulddata <- rbind(fulddata, link1)
  # Sys.sleep(1)
}
fulddata <- data.frame(link = fulddata[-1, ], stringsAsFactors = FALSE)
first.link <- fulddata[1, 1]

link.text1 <- matrix(NA_character_, ncol = 7, nrow = nrow(fulddata))
for (i in seq_len(nrow(fulddata))) {
  first.link <- fulddata[i, 1]
  link.text <- read_html(first.link, encoding = "UTF-8") %>%
    html_nodes(".tableTitle+ p , p a , h1") %>%
    html_text()
  length(link.text) <- 7   # pad with NA or truncate so the row always has 7 fields
  link.text1[i, ] <- link.text
}
```
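The column split relies on the selector returning case numbers and dates strictly alternating. A small offline check of that step, assuming dates are written day-month-year (`%d-%m-%Y`, the format in the commented-out line of the later version) and possibly followed by a time after a space; the input vector is made up. Without an explicit `format`, `as.Date()` tries year-first formats and can silently misread a day-first string instead of failing:

```r
# Made-up scraper output: case number, date, case number, date, ...
eksempel <- c("S 1", "06-12-2015 10:32", "S 2", "07-12-2015")

spm <- data.frame(
  nummer = eksempel[seq(1, length(eksempel), 2)],
  dato   = eksempel[seq(2, length(eksempel), 2)],
  stringsAsFactors = FALSE
)
# Drop everything from the first space (e.g. a time), then parse with an explicit format
spm$dato <- as.Date(sub(" .*$", "", spm$dato), format = "%d-%m-%Y")
spm$aar  <- as.integer(format(spm$dato, "%Y"))
print(spm)
```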


ChrisNygaard commented 8 years ago

title: "Eksamen2"

output: html_document

```r
library("rvest")
library("plyr")
library("lubridate")

# Here we get the links
# css.selector <- "td"
css.selector <- "td:nth-child(6) , .filtered"   # including date
# css.selector <- ".filtered"
link <- "http://www.ft.dk/dokumenter/vis_efter_type/spoergsmaal/spoergsmaal.aspx?questionSearchtype=1&startDate=20101006&endDate=20161001&session=&minister=-1&inquirer=-1&committee=-1&statusAnswer=-1&sortColumn=caseNumber&sortOrder=desc&startRecord=1&numberOfRecords=15000&totalNumberOfRecords=#dok"
ft_data <- read_html(link, encoding = "UTF-8") %>%
  html_nodes(css = css.selector) %>%
  html_text()
ft_data

# We need both the case number and the date, so the numbers go in column 1
# and the dates in column 2, and then we extract the year
df <- data.frame(nummer = ft_data[seq(1, length(ft_data), 2)],
                 dato = ft_data[seq(2, length(ft_data), 2)],
                 stringsAsFactors = FALSE)
# Keep only the date itself (drop everything after the first space) and parse it
df[, 2] <- as.Date(gsub(" .*$", "", df[, 2]), format = "%d-%m-%Y")
df[, 3] <- year(df[, 2])

fulddata <- matrix(ncol = 1)
for (i in seq_len(nrow(df))) {
  link_p1 <- "http://www.ft.dk/samling/"
  # Session folder: the year followed by "1", e.g. ".../samling/20111/spoergsmaal/"
  link1 <- paste(paste(link_p1, df[i, 3], sep = ""), "1/spoergsmaal/", sep = "")
  link1 <- paste(paste(link1, df[i, 1], sep = ""), "/index.htm", sep = "")
  link1 <- gsub(" ", "", link1)
  fulddata <- rbind(fulddata, link1)
  # Sys.sleep(1)
}
fulddata <- data.frame(link = fulddata[-1, ], stringsAsFactors = FALSE)
first.link <- fulddata[1, 1]

link.text1 <- matrix(NA_character_, ncol = 7, nrow = nrow(fulddata))
for (i in seq_len(nrow(fulddata))) {
  first.link <- fulddata[i, 1]
  link.text <- read_html(first.link, encoding = "UTF-8") %>%
    html_nodes(".tableTitle+ p , p a , h1") %>%
    html_text()
  length(link.text) <- 7   # pad with NA or truncate so the row always has 7 fields
  link.text1[i, ] <- link.text
}
```
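Using the calendar year as the session number may file some questions under the wrong session: the Folketing's parliamentary year begins on the first Tuesday of October, so a question asked in, say, March 2012 belongs to the session that started in autumn 2011 (`20111` in the URLs above). A sketch of that mapping; `samling_id` is a hypothetical name, it approximates the session start as 1 October, and it assumes every session follows the `<start year>1` pattern, which may not hold for extra sessions held after a general election:

```r
# Hypothetical helper: map a question date to a session ID such as "20111"
samling_id <- function(dato) {
  aar    <- as.integer(format(dato, "%Y"))
  maaned <- as.integer(format(dato, "%m"))
  # January-September belongs to the session that began the previous October
  start_aar <- ifelse(maaned >= 10, aar, aar - 1L)
  paste0(start_aar, "1")
}

ids <- samling_id(as.Date(c("2012-03-15", "2011-10-20", "2011-09-30")))
print(ids)
```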
