Open ChrisNygaard opened 8 years ago
library("rvest")
library("plyr")

# Alternative selectors that were tried:
# css.selector <- "td"
# css.selector <- "td:nth-child(6) , .filtered"  # including date
css.selector <- ".filtered"
link <- "http://www.ft.dk/dokumenter/vis_efter_type/spoergsmaal/spoergsmaal.aspx?questionSearchtype=1&startDate=20101006&endDate=20161001&session=&minister=-1&inquirer=-1&committee=-1&statusAnswer=-1&sortColumn=caseNumber&sortOrder=desc&startRecord=1&numberOfRecords=150000&totalNumberOfRecords=#dok"
ft_data <- read_html(link, encoding = "UTF-8") %>%
  html_nodes(css = css.selector) %>%
  html_text()
ft_data
df <- data.frame(ft_data)

# Build the link to each question page
fulddata <- matrix(ncol = 1)
for (i in 1:length(df[,1])){
  link_spm <- "http://www.ft.dk/samling/20111/spoergsmaal/"
  link1 <- paste(paste(link_spm, df[i,1], sep = ""), "/index.htm", sep = "")
  link1 <- gsub(" ", "", link1)
  fulddata <- rbind(fulddata, link1)
  # Sys.sleep(1)
}
fulddata <- as.data.frame(fulddata[-1,])
fulddata <- data.frame(lapply(fulddata, as.character), stringsAsFactors = FALSE)
# read_html() needs a character URL, not a data frame
first.link <- fulddata[1,]

first.link.text <- read_html(first.link, encoding = "UTF-8") %>%
  html_nodes(".tableTitle+ p , p a , h1") %>%
  html_text()
2015-12-06 10:32 GMT+01:00 ChrisNygaard notifications@github.com:
title: "Eksamen"
output: html_document
The paragraph 20 questions asked in the Danish Parliament (Folketinget) are questions directed at a secretary of the government by a regular member of parliament. They are used to acquire information directly from the particular secretary, and are usually rather critical of the law they are directed at.
First we need all our packages
# Libraries
library("readr")
library("plyr")
library("ggplot2")
library("dplyr")
library("rvest")
library("stringr")
library("lubridate")
First we need to acquire the data. This is done by scraping the website of the Danish Parliament. Most of what we need can be obtained simply by scraping the main pages, which list the question headline, author, department it is directed at, date and so on. This is done by a scraper function that we define as follows.
scrape <- function(link){
  # First we load the website
  ft_data <- read_html(link, encoding = "UTF-8")

  # Then we get the titles of the questions
  ft_titler <- as.matrix(ft_data %>% html_nodes("h3 a") %>% html_text())
  ft_titler <- as.matrix(str_trim(ft_titler, "both"))
  ft_titler <- gsub("Endeligt svar", "", ft_titler)

  # Then we acquire the wanted information
  ft_systematisk <- matrix(ft_data %>% html_nodes("div p") %>% html_text())
  ft_systematisk <- as.matrix(str_trim(ft_systematisk, "both"))
  ft_systematisk <- gsub("Endeligt svar", "", ft_systematisk)

  # Note: ft_titler is computed but not returned yet
  return(ft_systematisk)
}
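The trimming and label-stripping steps in `scrape()` can be tried offline on plain strings. This is only a minimal sketch in base R: the helper name `clean_text` and the input strings are made up for illustration. Unlike the function above, it removes the label before trimming, so no stray leading space is left behind.

```r
# Hypothetical helper (not part of the original script): remove the
# "Endeligt svar" label, then trim surrounding whitespace.
clean_text <- function(x) {
  x <- gsub("Endeligt svar", "", x, fixed = TRUE)
  trimws(x, which = "both")
}

# Made-up strings standing in for scraped node text
raw <- c("  Endeligt svar Spm. om skat  ", "Spm. om trafik\n")
cleaned <- clean_text(raw)
print(cleaned)
```

The same helper could be applied to both the titles and the "div p" text instead of repeating the two steps for each.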
Then we need to use our function on the link of our choosing. The link will be set in a loop to ensure we get all the data right away, so we start with 200 results per page at page 1.
ft_link <- "http://www.ft.dk/Search.aspx?q=&SearchDepthFull=&Udvalg=&ParagrafTyveType=&StilletTil=&StilletAf=&PeriodenBesvaretFrom=&PeriodenBesvaretTo=&Status=&Emneord=&sf=ps&msf=ps&as=1&Samling=20111&PeriodenStilletFrom=2010-07-01&PeriodenStilletTo=2015-11-21&SortBy=SortDate&SortOrder=asc&pageSize=200&pageNr=1#search"
x1 <- scrape(ft_link)
print(x1)
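The loop over result pages mentioned above is not written out yet. One possible sketch, assuming that only the `pageNr` query parameter changes between pages: the helper name `page_url` and the page count of 3 are assumptions, since the real number of pages has to be read from the site.

```r
# Hypothetical helper: swap the pageNr value in the search link.
page_url <- function(link, page_nr) {
  sub("pageNr=[0-9]+", paste0("pageNr=", page_nr), link)
}

ft_link <- "http://www.ft.dk/Search.aspx?q=&SearchDepthFull=&Udvalg=&ParagrafTyveType=&StilletTil=&StilletAf=&PeriodenBesvaretFrom=&PeriodenBesvaretTo=&Status=&Emneord=&sf=ps&msf=ps&as=1&Samling=20111&PeriodenStilletFrom=2010-07-01&PeriodenStilletTo=2015-11-21&SortBy=SortDate&SortOrder=asc&pageSize=200&pageNr=1#search"

# Assumed page count, for illustration only
n_pages <- 3
page_links <- vapply(seq_len(n_pages),
                     function(p) page_url(ft_link, p),
                     character(1))

# With scrape() defined and network access available, each page could
# then be fetched with a pause between requests:
# results <- lapply(page_links, function(u) { Sys.sleep(1); scrape(u) })
```

Building all links first keeps the URL logic testable without touching the network.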
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
— Reply to this email directly or view it on GitHub https://github.com/ChrisNygaard/Eksamensgruppe19/issues/1.
library("rvest")
library("plyr")

# Alternative selectors that were tried:
# css.selector <- "td"
# css.selector <- "td:nth-child(6) , .filtered"  # including date
css.selector <- ".filtered"
link <- "http://www.ft.dk/dokumenter/vis_efter_type/spoergsmaal/spoergsmaal.aspx?questionSearchtype=1&startDate=20101006&endDate=20161001&session=&minister=-1&inquirer=-1&committee=-1&statusAnswer=-1&sortColumn=caseNumber&sortOrder=desc&startRecord=1&numberOfRecords=150000&totalNumberOfRecords=#dok"
ft_data <- read_html(link, encoding = "UTF-8") %>%
  html_nodes(css = css.selector) %>%
  html_text()
ft_data
df <- data.frame(ft_data)

# Build the link to each question page
fulddata <- matrix(ncol = 1)
for (i in 1:length(df[,1])){
  link_spm <- "http://www.ft.dk/samling/20111/spoergsmaal/"
  link1 <- paste(paste(link_spm, df[i,1], sep = ""), "/index.htm", sep = "")
  link1 <- gsub(" ", "", link1)
  fulddata <- rbind(fulddata, link1)
  # Sys.sleep(1)
}
fulddata <- as.data.frame(fulddata[-1,])
fulddata <- data.frame(lapply(fulddata, as.character), stringsAsFactors = FALSE)
first.link <- fulddata[1,]

# Visit every question page and store the extracted text, one row per page
link.text1 <- matrix(data = "NA", ncol = 7, nrow = length(fulddata[,1]))
for (i in 1:length(fulddata[,1])){
  first.link <- fulddata[i,]
  link.text <- as.matrix(read_html(first.link, encoding = "UTF-8") %>%
    html_nodes(".tableTitle+ p , p a , h1") %>%
    html_text())
  link.text1[i,] <- link.text
}
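The row assignment in the loop above only works when a page yields exactly 7 text nodes; any other count stops the loop with an error. A minimal sketch of one way around that, assuming it is acceptable to truncate extra nodes and fill missing ones with `NA`: the helper name `pad_to` and the example vectors are made up.

```r
# Hypothetical helper: force a character vector to exactly n elements,
# truncating extras and padding with NA when it is too short.
pad_to <- function(x, n) {
  x <- as.character(x)
  length(x) <- n
  x
}

# Made-up stand-ins for the text nodes scraped from two pages
pages <- list(
  c("a", "b", "c"),
  c("h1", "p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8")
)

link.text1 <- matrix(NA_character_, nrow = length(pages), ncol = 7)
for (i in seq_along(pages)) {
  link.text1[i, ] <- pad_to(pages[[i]], 7)
}
print(link.text1)
```

Whether truncation is appropriate depends on which nodes carry the fields of interest, so it is worth checking a few pages by hand first.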
2015-12-07 8:32 GMT+01:00 Daniel Sloth Olesen danielslotholesen@gmail.com:
title: "Eksamen2"
library("rvest")
library("plyr")
library("lubridate")
# Here we get the links
# css.selector <- "td"
css.selector <- "td:nth-child(6) , .filtered" # including date
# css.selector <- ".filtered"
link <- "http://www.ft.dk/dokumenter/vis_efter_type/spoergsmaal/spoergsmaal.aspx?questionSearchtype=1&startDate=20101006&endDate=20161001&session=&minister=-1&inquirer=-1&committee=-1&statusAnswer=-1&sortColumn=caseNumber&sortOrder=desc&startRecord=1&numberOfRecords=15000&totalNumberOfRecords=#dok"
ft_data <- read_html(link, encoding = "UTF-8") %>%
  html_nodes(css = css.selector) %>%
  html_text()
ft_data
# We need both the serial number and the date, so the numbers go in
# column 1 and the dates in column 2, and then we extract the year
df <- data.frame(matrix(NA, nrow = length(ft_data) / 2, ncol = 3)) # df was never created in this version
df[,1] <- ft_data[seq(1, length(ft_data), 2)]
df[,2] <- ft_data[seq(2, length(ft_data), 2)]
df[,3] <- as.Date(gsub(" .*$", "", df[,2])) #, format="%d-%m-%y")
fulddata <- matrix(ncol=1)
for (i in 1:length(df[,1])){
  link_spm <- "http://www.ft.dk/samling/20111/spoergsmaal/"
  link1 <- paste(paste(link_spm, df[i,1], sep = ""), "/index.htm", sep = "")
  link1 <- gsub(" ", "", link1)
  fulddata <- rbind(fulddata, link1)
  # Sys.sleep(1)
}
fulddata<-as.data.frame(fulddata[-1,])
fulddata<-data.frame(lapply(fulddata, as.character), stringsAsFactors = FALSE)
first.link <- fulddata[1,]
link.text1 <- matrix(data = "NA", ncol = 7, nrow = length(fulddata[,1]))
for (i in 1:length(fulddata[,1])){
  first.link <- fulddata[i,]
  link.text <- as.matrix(read_html(first.link, encoding = "UTF-8") %>%
    html_nodes(".tableTitle+ p , p a , h1") %>%
    html_text())
  link.text1[i,] <- link.text
}
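The link-building loop in this comment grows `fulddata` with `rbind()` one row at a time and then drops the placeholder first row. Since `paste0()` is vectorised, the same links can probably be built in one step. A sketch under that assumption: the helper name `build_links` and the example ids are hypothetical, and the base URL is the one used above.

```r
# Hypothetical helper: build one question link per id, without a loop.
build_links <- function(ids,
                        base = "http://www.ft.dk/samling/20111/spoergsmaal/") {
  ids <- gsub(" ", "", ids, fixed = TRUE)  # same space removal as the loop
  paste0(base, ids, "/index.htm")
}

# Made-up ids standing in for the scraped serial numbers
example_ids <- c("S 1", " S 2 ")
links <- build_links(example_ids)
print(links)
```

The result is a plain character vector, so each element can be passed straight to `read_html()` without converting through a data frame.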
2015-12-07 9:14 GMT+01:00 Danielsloth notifications@github.com:
— Reply to this email directly or view it on GitHub https://github.com/ChrisNygaard/Eksamensgruppe19/issues/1#issuecomment-162445036 .
title: "Eksamen2"
library("rvest")
library("plyr")
library("lubridate")
# Here we get the links
# css.selector <- "td"
css.selector <- "td:nth-child(6) , .filtered" # including date
# css.selector <- ".filtered"
link <- "http://www.ft.dk/dokumenter/vis_efter_type/spoergsmaal/spoergsmaal.aspx?questionSearchtype=1&startDate=20101006&endDate=20161001&session=&minister=-1&inquirer=-1&committee=-1&statusAnswer=-1&sortColumn=caseNumber&sortOrder=desc&startRecord=1&numberOfRecords=15000&totalNumberOfRecords=#dok"
ft_data <- read_html(link, encoding = "UTF-8") %>%
  html_nodes(css = css.selector) %>%
  html_text()
ft_data
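# We need both the serial number and the date, so the numbers go in
# column 1 and the dates in column 2, and then we extract the year
df <- data.frame(matrix(NA, nrow = length(ft_data) / 2, ncol = 3)) # df was never created in this version
df[,1] <- ft_data[seq(1, length(ft_data), 2)]
df[,2] <- ft_data[seq(2, length(ft_data), 2)]
df[,2] <- gsub(" .*:", "", ft_data[seq(2, length(ft_data), 2)]) # removed a stray closing parenthesis
# year() below needs a value it can read as a date; the as.Date() variant may be needed instead:
#df[,2] <- as.Date(as.character(gsub(" .*$", "", ft_data[seq(2, length(ft_data), 2)])), format="%d-%m-%Y")
df[,3] <- year(df[,2])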
fulddata <- matrix(ncol=1)
for (i in 1:length(df[,1])){
  link_spm <- "http://www.ft.dk/samling/20111/spoergsmaal/"
  link_p1 <- "http://www.ft.dk/samling/"
  link1 <- paste(paste(link_p1, df[i,3], sep = ""), "1/spoergsmaal/", sep = "")
  link1 <- paste(paste(link1, df[i,1], sep = ""), "/index.htm", sep = "")
  link1 <- gsub(" ", "", link1)
  fulddata <- rbind(fulddata, link1)
  # Sys.sleep(1)
}
fulddata<-as.data.frame(fulddata[-1,])
fulddata<-data.frame(lapply(fulddata, as.character), stringsAsFactors = FALSE)
first.link <- fulddata[1,]
link.text1 <- matrix(data = "NA", ncol = 7, nrow = length(fulddata[,1]))
for (i in 1:length(fulddata[,1])){
  first.link <- fulddata[i,]
  link.text <- as.matrix(read_html(first.link, encoding = "UTF-8") %>%
    html_nodes(".tableTitle+ p , p a , h1") %>%
    html_text())
  link.text1[i,] <- link.text
}
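The year-based links in this version depend on turning the scraped date text into a real date first. A small base R sketch of that step, assuming the site prints dates as day-month-year followed by a time: the sample strings are made up, and the format has not been checked against the live page.

```r
# Made-up date strings in the assumed "dd-mm-yyyy hh:mm" layout
raw_dates <- c("06-10-2010 12:30", "21-11-2015 09:00")

# Drop everything after the first space, then parse day-month-year
date_part <- sub(" .*$", "", raw_dates)
dates <- as.Date(date_part, format = "%d-%m-%Y")

# Extract the calendar year (base R equivalent of lubridate::year)
years <- as.integer(format(dates, "%Y"))

# Same pattern as the loop above: year followed by "1"
session_links <- paste0("http://www.ft.dk/samling/", years, "1/spoergsmaal/")
print(session_links)
```

Note that this uses the calendar year of the question. Whether that always matches the session id in the URL (such as 20111) is not confirmed here and would need checking for questions asked late in a parliamentary year.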
2015-12-07 10:04 GMT+01:00 Christoffer Nygaard cbnygaard@gmail.com: