The dplyr package for R provides a flexible and powerful syntax for data wrangling operations. However, data objects in R are typically stored in memory, and performance issues may arise as data become large. Database management systems implementing SQL (Structured Query Language) provide a ubiquitous architecture for storing and querying relational data. R has long supported data retrieval from relational databases such as MySQL, SQLite, and PostgreSQL, and more recent interfaces between R and SQL let users leverage database storage and retrieval mechanisms without leaving R. In this webinar, we will review key idioms for data wrangling within dplyr, introduce the backend interfaces for common database systems, provide examples of how the dplyr engine translates a data pipeline into SQL, and discuss common misconceptions and performance issues.