Closed v0dro closed 7 years ago
Probably related: https://github.com/SciRuby/daru/issues/138
@v0dro @zverok - Can we use ActiveRecord
gem here to work with SQL database and collect query reposonses
from the table as an object
/ list
? After receiving an object
of a table, we can probably then tweak the from_sql
function of Daru::DataFrame
module.
@athityakumar Well, I am definitely against adding ActiveRecord as a dependency for a Daru core. In attempt to support any DB possible, with direct dependency on some abstraction layer/ORM, probably companion gem like daru-db
could be created. @v0dro, WDYT?..
(And, BTW, if to decide on what to depend for database access, I'd prefer much more lightweight Sequel library.)
@zverok is correct. We should not add extra dependencies.
For this particular issue I think you can get away with just checking if the extension of the file name is .db
. If that is the case, you can a library for loading the database from the file.
Daru currently uses dbi
and sqlite3
as development dependencies. dbi
can use different database adapters for loading data. If you can use dbi
for loading from file that would be great so that the user only needs to install the appropriate adapter.
I am going to see what can be done with DBI. If that does not work, I think between sqlite3 and this dbf gem we can have a large portion of database file types covered.
In regards to a companion gem, how big would daru-db
have to be? Would it need to support relational, no sql, and file based db's?
EDIT: Pull Request
As I've already wrote in PR,
you should first clarify its scope (by investigating "which formats of binary DB files are available on popular scientfic/governmental/business datasources), and only then the coding attempt.
E.g., it is a necessary step to check:
We'd be really happy if you could spend some (not much, yet decent) effort to answer and dicsuss those questions.
@zverok, I just wanted to make sure the PR was referenced here.
Other database file formats exist, namely: DBF, however they do not support SQL in the traditional sense. If we wanted to support them it does not make sense to add the functionality from within DataFrame.from_sql
. There are utilities that allow querying of these formats with SQL like syntax, however it is just syntactic sugar. I took a look at how pandas handles these formats, and it does not. Instead there are utilities for converting these formats into CSV files.
If we want to add support for the DBF format for instance, I think it would make more sense to do so from a new class, not SqlDataSource
.
A second item to note regarding pandas. It does not support loading a SQLite file by path. It's support is similar to what Daru already has implemented.
Now, @v0dro, it is probably a question to you: what use cases you've had in mind when creating this GitHub issue?
Basically, I had a SQLite file database.db
which I found hard to read into a DataFrame without a lot work and reading documentation.
An SQL database file is still just a database, and opening it in a dataframe should not be much of a hassle.
@dshvimer I saw the pandas example of opening an SQLite database, and found it to be quite satisfactory. However, it is common practice to share test databases (mostly SQLite) on web in the form of .db
files and opening at least SQLite files should be a trivial thing to do in daru, hence this issue.
@zverok I think we can get away by implementing read-from-file only for simple SQL databases, given the complexity of implementing a one-size-fits-all solution.
Basically, I had a SQLite file database.db which I found hard to read into a DataFrame without a lot work and reading documentation.
But was it SQLite? Or something else? That's what I am trying to understand -- is there case for data distributed in format of "sqlite db"?
Yes. It was SQLite.
This one specifically: https://github.com/pratapvardhan/pycon-2015/blob/master/data/aapdonation.db
OK, let's say that SQLite is our choice. I'll re-review everything from this point of view tomorrow.
If we need to load this directly from the web, I may need to make some changes.
May need to consider this issue as well
No you don't need to load from the web. Its sufficient to expect the user to have downloaded the file onto their machine.
Can someone explain how the sql data source is tested? I don't see the sql database in the test suite but the db_name
is used in the spec.
@ananyo2012 SQLite database is created dynamically on the fly: https://github.com/SciRuby/daru/blob/master/spec/support/database_helper.rb#L11
BTW, this issue should be considered closed :)
Currently the
.from_sql
method requires aDBI::DatabaseHandle
object for loading data from an SQL database. In case you have adatabase.db
file in your system, it becomes cumbersome to load this data.There should be support for loading a database from such a file with the
Daru::DataFrame.from_sql
method.Example: