beanumber / tidy-databases

materials for ASA webinar on using databases in the tidyverse

add description to beginning of slides? #12

Closed: nicholasjhorton closed this issue 6 years ago

nicholasjhorton commented 6 years ago

The dplyr package in R provides a flexible and powerful syntax for data wrangling. However, R data objects are typically held in memory, so performance problems can arise as data grow large. Database management systems that implement SQL (Structured Query Language) provide a ubiquitous architecture for storing and querying relational data. R has long supported data retrieval from relational databases such as MySQL, SQLite, and PostgreSQL, but recently added interfaces between R and SQL let users leverage database storage and retrieval mechanisms seamlessly without leaving R. In this webinar, we will review key idioms for data wrangling in dplyr, introduce the backend interfaces for common database systems, show examples of how the dplyr engine translates a data pipeline into SQL, and discuss common misconceptions and performance issues.
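
To make the "translates a data pipeline" point concrete, here is a minimal sketch of what such a translation might look like. It is not taken from the webinar slides. It assumes the dplyr, dbplyr, DBI, and RSQLite packages are installed, and it uses an in-memory SQLite database loaded with the built-in `mtcars` data purely for illustration; the names `con`, `cars_db`, `pipeline`, and `result` are hypothetical.

```r
library(dplyr)
library(dbplyr)

# Assumption: an in-memory SQLite database stands in for a real server.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

# tbl() creates a lazy reference to the remote table; no rows are pulled yet.
cars_db <- tbl(con, "mtcars")

# An ordinary dplyr pipeline, written exactly as it would be for a data frame.
pipeline <- cars_db %>%
  filter(cyl > 4) %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(gear)

# Print the SQL that dbplyr generates for the pipeline (nothing has run yet).
show_query(pipeline)

# collect() executes the query in the database and returns a local tibble.
result <- collect(pipeline)

DBI::dbDisconnect(con)
```

The pipeline stays lazy until `collect()` is called, so the filtering and aggregation happen inside the database and only the summarised rows are brought into R memory. The exact SQL text printed by `show_query()` can differ between dbplyr versions and backends.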