IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0
38 stars 103 forks source link

Importing from SQL databases #4620

Open maxwellfundi opened 6 years ago

maxwellfundi commented 6 years ago

It would be a good additional to have a new dialog that import data from tables in databases- Sql, Acess etc....

Patowhiz commented 5 years ago

I had a quick look at #4915, #5502 and in general the Climsoft Import dialog internal implementation, I must say we have already laid out the basics of connecting to the database.

Most SQL .Net drivers are quite powerful and can help us access the whole database schema, this leaves me to wonder why we didn't implement the connectivity to Climsoft database in a general way that doesn't need us to hardcode the table names and the columns?

@maxwellfundi @rdstern @dannyparsons so far in the several workshops you've been, have we not found users interested in connecting to databases apart from Climsoft users? Are our users mainly working with files and unstructured data?

dannyparsons commented 5 years ago

It's not been a hugely requested feature in my experience. If you think about the type of user who is comfortable using a database, they are less likely to be a typical R-Instat user and more likely to be comfortable in R, Python etc. Of course Climsoft is a special case because of the whole software around the database. There was an immediate need for a Climsoft connection by existing climatic users so that was the justification for the specific implementation.

Of course the generic implementation would be very beneficial to widen the data import possibilities, (which is why we have this issue for it). It's very doable but also non trivial so without a big request from users it hasn't been a high priority.

There is one database connection that is more requested and would be very impressive to some of our users for its wide use for surveys and data collection, which is importing from ODK forms https://github.com/africanmathsinitiative/R-Instat/issues/3754, not that this must be done before this issue though.

Patowhiz commented 2 years ago

@dannyparsons @lloyddewit @rdstern It's now 2022 and @lilyclements has written issue #7828 which relates to importing from Postgress sql database. I strongly suggest that dialog to be generally designed in a way that we can add support for importing from other RDBMS as well. All relational database management systems (RDBMS) have standard connection requirements which let's us easily design a general dialog. We can get some of the design and implementation concepts requested by @lilyclements in the climsoft import dialog. @dannyparsons I've seen it in the field, competent database managers wanting to do a graphical exploration of data in their database. A feature that RDBMS lacks. And these managers had no experience in R or Python, just SQL, Excel and Access.

rdstern commented 2 years ago

@lilyclements and @dannyparsons just to add to @Patowhiz comments, in the recent review of R GUIs there are the following topics: Import Data File Types (how many) - R-Instat does quite well here (31), together with BlueSky (30) and RKWard (31).
The next item in the review list is: Import Database (how many) - here BlueSky has 6, while Rattle and R-Instat have 1 each. The others have zero. The R-Instat database is from @Patowhiz importing from ClimSoft. It would be good to improve our score!

lilyclements commented 2 years ago

Sounds interesting! I guess the next step is knowing what is in common between these different data bases that relate to sql (eg climsoft dlg, and the suggested Postgres dialog), and to design a dialog. @Patowhiz do you know which dialogs in R Instat import from an sql database?

Patowhiz commented 2 years ago

@lilyclements currently we only have the climsoft one. The connection sub dialog has an implementation that can give you a hint of the standard requirements of a typical RDBMS. I would suggest using it in designing a general import dialog.

lilyclements commented 2 years ago

@Patowhiz what sort of input fields would we have on a general import dialog? Should we work on designing this?

Patowhiz commented 2 years ago

@lilyclements happy to help with the design.
The first step is of course to connect to a database. This design will probably be the same as the sub dialog for connecting to Climsoft database. Inputs required are; climsoft_connect

The disabled "Connect" button, asks for a password which we don't log in R-Instat.

After connecting, as user I would want to;

  1. Connect to a specific table. That means allow me type in the table name or select from a list of tables from the database.
  2. Select specific columns that I want to get data from. This is after specifying my table.
  3. Specify a filter condition. How we design this requires some though. In the background it just defines a where clause. This is an important step though because a table could potentially have billions if not millions of rows.

Thanks.

lilyclements commented 2 years ago

@Patowhiz great thanks for explaining all of this.

So presumably the sub-dialog you have displayed there will run:

plh_con <- DBI::dbConnect(RPostgres::Postgres(), user = "USERNAME", dbname = "DATABASE NAME", host = "HOST", port = PORTNUMBER, pass = "VALUE IN NEW DIALOG")

Then to your point:

Connect to a specific table. That means allow me type in the table name or select from a list of tables from the database.

I have been viewing the possible tables by DBI::dbListTables(plh_con). Is this the case here still?

Presumably that then loads the table. We need to load the table to view the column names to then select specific columns. Is this the case?

lilyclements commented 2 years ago

@derekagorhom has started work on the postgresr dialog, but I think we should hold off until we have discussed this dialog design.

@Patowhiz are you thinking this will go in the general file menu?

Patowhiz commented 2 years ago

@lilyclements yes to the above. Please note that we already have a command that will ask for the password. So the sub dialog simply sends the following script to R. data_book$database_connect(dbname="mariadb_climsoft_test_r_db_v4", host="localhost", port=3306, user="root") I would suggest we use the same connection prompt sub-dialog used by climsoft dialog. We can easily generalise it too.

In regards to the menu, @rdstern what's you suggestion? I presume the general file menu is fine?

lilyclements commented 2 years ago

Okay great. @Patowhiz where on this design do you then get to specify the specific table you want to import (that is, run the DBI::dbListTables(plh_con))? Are these the options in that drop down by "Selected Stations"?

image
Patowhiz commented 2 years ago

@lilyclements In climsoft we have hardcoded the tables, because we know them, and the dialog is specific to the database and tables it is importing them from. The options in the "Selected Stations" drop down just dictate the sql "select" statement that shows the stations. For instance, as a user I may want to view stations by station id or station name.

In this general case, I think you will have a single receiver (for tables) that expects the user to select the table from the selector. The selector will have the list of possible tables. Then from the selected table, you will have a multiple receiver (for columns from the selected table) that expects the columns you want to get data from. The 2 receivers will be linked to the same selector. So the selector sql query will be dictated by user actions when selecting respective receivers.

And all of the above is after connecting to the database (which could be a sub-dialog just like in your screenshot).

I hope I understood you question. And @rdstern as the product manager may object my design proposals, so would be good to counter check with him.

I have now updated the climsoft dialog in PR #7902, it now has similar functionalities as above but specific to climsoft database. I'm happy to take you through it if it saves you time. Thanks.