DyfanJones / RAthena

Connect R to Athena using Boto3 SDK (DBI Interface)
https://dyfanjones.github.io/RAthena/

Add Rstudio Viewer to display tables in RStudio's connection tab #57

Closed DyfanJones closed 4 years ago

DyfanJones commented 4 years ago

It would be good to get RAthena to display tables from Athena in RStudio's Connections tab, as odbc does: odbc:View.R

OssiLehtinen commented 4 years ago

This would be great too!

Again a side note:

I was previously using the generic odbc interface (with the Simba ODBC driver).

With that, if one clicks on the preview icon of a large table in RStudio, crazy things happen: namely, a query "select * from table" is executed in Athena and the resulting table is truncated only afterwards.

The thing is, Athena always writes a CSV file of the query result to S3. Say I preview a large table (to the tune of 100e9 rows). Athena will first try to write all those rows to a temporary CSV before RStudio displays the first 1000 rows in the preview. Not very good...

Anyway, this can be avoided with the custom package, but wanted to share this specific hole I have found myself in :)

DyfanJones commented 4 years ago

PR #58 contains the initial RStudio Connections tab integration. It mainly uses AWS Glue to build the database hierarchy and return column types etc. (Note: this is limited to 1000 tables per database due to an API call limitation.) This is to try to prevent the previously mentioned issue. In addition, every table preview is limited to 1000 rows. However, this is only done with 'LIMIT 1000'. If users have partitioned files this should be OK and will prevent AWS Athena from reading entire tables.

Currently this implementation is working for me; however, if there are any issues, please raise them on PR #58.
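For readers who want to see the idea in code, here is a minimal sketch in Python against the Boto3-style Glue response shape (`TableList`, `NextToken`, `StorageDescriptor.Columns`). It is not the code from PR #58. The function names `iter_glue_tables`, `column_types` and `preview_query` are hypothetical, and following `NextToken` pages is an assumption about how one might list tables, not a statement of what RAthena does.

```python
from typing import Any, Dict, Iterator, Optional


def iter_glue_tables(glue_client: Any, database: str) -> Iterator[Dict[str, Any]]:
    """Yield table metadata for one Glue database, following NextToken pages.

    glue_client is assumed to expose get_tables(DatabaseName=..., NextToken=...)
    the way a Boto3 Glue client does.
    """
    token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"DatabaseName": database}
        if token:
            kwargs["NextToken"] = token
        page = glue_client.get_tables(**kwargs)
        yield from page.get("TableList", [])
        token = page.get("NextToken")
        if not token:
            break


def column_types(table: Dict[str, Any]) -> Dict[str, str]:
    """Map column name to Glue type using metadata only (no Athena query)."""
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    return {col["Name"]: col["Type"] for col in columns}


def preview_query(database: str, table: str, limit: int = 1000) -> str:
    """Build a preview query that carries the row limit inside the SQL itself."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")

    def quote(identifier: str) -> str:
        # Double any embedded double quotes so the identifier stays quoted.
        return '"' + identifier.replace('"', '""') + '"'

    return f"SELECT * FROM {quote(database)}.{quote(table)} LIMIT {int(limit)}"
```

The point of the sketch is that the hierarchy and column types come from Glue metadata, so no Athena query runs until a table is actually previewed, and the preview query then asks for 1000 rows up front rather than truncating a full result afterwards.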

DyfanJones commented 4 years ago

Fixed an issue with the SQL previewer (see PR #58 for details of the issue and the fix). I will package this up and push it to CRAN before I go on holiday, so users aren't waiting for me to come back for the latest features.

I have also added an information message showing the amount of data scanned by each AWS Athena query.
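As a rough illustration of such a message, the sketch below formats the `DataScannedInBytes` statistic from a Boto3-style Athena `get_query_execution` response. The helper names `human_bytes` and `data_scanned_message`, and the message wording, are hypothetical and not taken from RAthena.

```python
from typing import Any, Dict


def human_bytes(n: int) -> str:
    """Format a byte count with binary (1024-based) units."""
    size = float(n)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} TB"


def data_scanned_message(response: Dict[str, Any]) -> str:
    """Build an info message from a get_query_execution-style response dict."""
    stats = response.get("QueryExecution", {}).get("Statistics", {})
    scanned = stats.get("DataScannedInBytes")
    if scanned is None:
        # Statistics may be missing, e.g. while the query is still running.
        return "Info: data scanned is not available"
    return f"Info: {human_bytes(scanned)} scanned by Athena"


if __name__ == "__main__":
    example = {"QueryExecution": {"Statistics": {"DataScannedInBytes": 1536}}}
    print(data_scanned_message(example))
```

Since Athena charges by data scanned, a message like this makes the cost of a preview or ad hoc query visible right after it runs.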

OssiLehtinen commented 4 years ago

Works great also!