RNAer closed this 8 years ago.
Commenting here as @mortonjt asked for input.
The specific question @mortonjt asked me is whether there would be any problem using SQLite with Qiita, i.e., would there be a problem adding micronota to Qiita if micronota uses SQLite?
The short answer is no. Plugins in Qiita can use whatever they want (that is the advantage)! They can even be written in whatever programming language we want.
Now for the long answer, on which I would also like input from @antgonza, @rob-knight, @ElDeveloper and others. There are some aspects to take into account when making this decision:
The argument @mortonjt gave me in favor of SQLite is that it is easier to install, so there are "fewer dependencies". From our experience with PostgreSQL, it isn't hard to install (it is supported by default on most Linux systems, and it is really easy to download the .app for Mac OS X). And the number of dependencies is the same; we would just be swapping SQLite for PostgreSQL.
That being said, I would recommend against going down the SQLite route. However, if @rob-knight, @antgonza and others do not see any issue with it, I'm fine with it.
I favor consolidating around one RDBMS unless there is a specific technical reason to proliferate. We have gained immensely from standardizing our code base around other specific technologies, compared with labs that let everyone do their own random thing and then find the parts don't work together...
(In reply by email to Jose Navas's comment above, posted Jan 22, 2016, at 8:54 AM.)
Completely agree. I'd say the best approach would be to default to pgsql and then have an option to use SQLite (obviously, this should be extremely low priority; basically, tag it won't-fix :stuck_out_tongue_winking_eye:).
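Antonio's suggestion (PostgreSQL by default, SQLite as an opt-in) could look roughly like the minimal sketch below. It is an illustration, not micronota code: the `connect` helper, its `backend` argument, and the `path` parameter are hypothetical names. The SQLite branch uses the standard library's `sqlite3` module; the PostgreSQL branch assumes the third-party psycopg2 driver, imported lazily so that SQLite-only installs don't need it.

```python
import sqlite3


def connect(backend="postgresql", **params):
    """Open a DB-API connection for the chosen backend (hypothetical helper).

    PostgreSQL is the default; SQLite is the opt-in fallback.
    """
    if backend == "sqlite":
        # SQLite is file-based: only a path is needed (":memory:" gives a throwaway DB).
        return sqlite3.connect(params.get("path", ":memory:"))
    if backend == "postgresql":
        # Third-party driver, imported only when actually requested.
        import psycopg2
        # Keyword arguments such as dbname, user, host are passed straight through.
        return psycopg2.connect(**params)
    raise ValueError(f"unsupported backend: {backend!r}")
```

For example, `connect("sqlite", path="micronota.db")` opens (or creates) a local file with no server involved, while `connect(dbname="micronota")` would go through psycopg2 and needs a running PostgreSQL server.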
From what I know, using pgsql is tremendous overkill for this application, as it is limited to a single table that, for the most part, is ephemeral or can be regenerated without too much trouble (IIRC). While I do agree that maintaining different systems written with different technologies is troublesome, in this case the SQL should be minimal and, for the most part, not all that different from pg's SQL. Just my two cents.
(Yoshiki Vázquez Baeza, Jan 22, 2016, 10:32 AM, replying by email to Antonio Gonzalez's comment above, posted Jan 22, 2016, at 9:00.)
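Yoshiki's point that the SQL is minimal and close to PostgreSQL's can be made concrete with a single-table sketch. The table and column names are hypothetical. It runs on the standard library's sqlite3, and the statements themselves (CREATE TABLE IF NOT EXISTS, INSERT, SELECT) are also valid PostgreSQL. The one driver-level difference it isolates is the parameter placeholder: sqlite3 uses `?` (qmark style), while psycopg2 uses `%s`.

```python
import sqlite3

# Placeholder style differs by driver: sqlite3 uses "?", psycopg2 uses "%s".
PLACEHOLDER = "?"

# Hypothetical single-table schema using SQL that SQLite and PostgreSQL both accept.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS gene_family ("
    " accession TEXT PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " description TEXT)"
)


def build(conn, rows):
    """Create the table and bulk-insert (accession, name, description) tuples."""
    cur = conn.cursor()  # plain DB-API cursor, used the same way by both drivers
    cur.execute(SCHEMA)
    sql = ("INSERT INTO gene_family (accession, name, description) "
           "VALUES ({0}, {0}, {0})").format(PLACEHOLDER)
    cur.executemany(sql, rows)
    conn.commit()


def lookup(conn, accession):
    """Return (name, description) for one accession, or None if it is absent."""
    cur = conn.cursor()
    cur.execute(
        "SELECT name, description FROM gene_family WHERE accession = {0}".format(PLACEHOLDER),
        (accession,),
    )
    return cur.fetchone()
```

Under this layout, switching drivers would mostly mean changing `PLACEHOLDER` and the connection call, which is in line with the "minimal SQL" argument.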
Very true, agree with both points.
On Jan 22, 2016, at 10:34, Rob Knight wrote:
- I bet it will have more tables in the future.
- The same argument is used for reimplementing all kinds of functionality rather than using a robust library, which you yourself have talked many people out of in code reviews in the past...
:+1: on using psql. I think it makes sense to have a single database standard.
Thanks for all the input! Really appreciate it!
I agree with the point of using a single technology in the lab. However, I am not convinced that PostgreSQL is better in this case. It would be overkill, and it would also add installation and usage burdens that might discourage users and developers. My argument is:
1. The SQL usage in micronota will be very light for the foreseeable future. It is only used to compile the metadata of each gene family. Once compilation is done, we only need to query it, and DB modification is minimal. In fact, it would be perfectly fine to leave it as text tables, except that the queries would be much more cumbersome. We can't think of a single example that sqlite3 can't do, or does worse than PostgreSQL.
   Yes, we will feed the (meta)genomic annotation we have accumulated back to improve the annotation databases, and this will benefit from the power of PostgreSQL, but I think that should be decoupled from micronota as a standalone module/package.
2. sqlite3 syntax is so similar to PostgreSQL's, and the SQL code is so minimal, that the code could be changed in 10 minutes. So it would be painless to move if we unexpectedly do need PostgreSQL's power in the future.
And the unnecessary (in this case) burdens of using PostgreSQL include:
1. Much more hassle to install and configure. You need to install and configure it differently on different OS platforms, and you have to set up a user account as well. It would probably take micronota users a couple of hours to get it running, while with sqlite3 a single installation command is sufficient.
2. You have to keep the PostgreSQL server running constantly in the background.
Zech just noted that SQLite comes with the base Python install; accordingly, I think it's fine to use, and it will make it easier for developers whom Postgres might scare off to get involved (although we acknowledge Postgres is a lot easier than it used to be). Yoshiki and Jose agree, so we should consider this discussion closed. Thanks, everyone!
(In reply by email to Zech Xu's comment above, posted Jan 22, 2016, at 1:56 PM.)
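Rob's note that SQLite ships with the base Python install (as the standard library's `sqlite3` module, with no server to run) and Zech's compile-once, query-many workflow can be sketched together. The tab-separated layout (a header row with `accession`, `category`, `description`) and the function names are assumptions for illustration, not micronota's actual format or API.

```python
import csv
import io
import sqlite3


def compile_metadata(tsv_text, db_path=":memory:"):
    """Load a tab-separated metadata table into SQLite (hypothetical layout)."""
    conn = sqlite3.connect(db_path)  # no server process: just a file, or memory
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata "
        "(accession TEXT PRIMARY KEY, category TEXT, description TEXT)"
    )
    # DictReader yields one mapping per row, matching the named placeholders.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    conn.executemany(
        "INSERT INTO metadata VALUES (:accession, :category, :description)",
        reader,
    )
    conn.commit()
    return conn


def families_per_category(conn):
    """Count entries per category: one line of SQL, tedious over raw text."""
    return conn.execute(
        "SELECT category, COUNT(*) FROM metadata GROUP BY category ORDER BY category"
    ).fetchall()
```

Passing a file path instead of the default `":memory:"` keeps the compiled table on disk, so it only needs rebuilding when the source metadata changes.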
As for testing, what do you think about hosting some small test files on microbe.me?
What do you think about adding a test case for create_db in database.py? Looks like the coverage dropped.
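One way to cover create_db is a unittest case that builds the database in a throwaway temporary directory and then inspects `sqlite_master`. The real signature of micronota's create_db isn't shown in this thread, so the `create_db(path)` here is a hypothetical stand-in; the test structure is the point.

```python
import os
import sqlite3
import tempfile
import unittest


def create_db(path):
    """Hypothetical stand-in for database.py's create_db: creates the schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata "
        "(accession TEXT PRIMARY KEY, description TEXT)"
    )
    conn.commit()
    conn.close()


class CreateDbTests(unittest.TestCase):
    def setUp(self):
        # Fresh temporary directory per test, so runs never touch real data.
        self.tmp = tempfile.TemporaryDirectory()
        self.path = os.path.join(self.tmp.name, "test.db")

    def tearDown(self):
        self.tmp.cleanup()

    def test_creates_file_and_table(self):
        create_db(self.path)
        self.assertTrue(os.path.exists(self.path))
        conn = sqlite3.connect(self.path)
        try:
            tables = [row[0] for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'")]
        finally:
            conn.close()  # close before cleanup so the file can be removed
        self.assertIn("metadata", tables)

    def test_is_idempotent(self):
        # A second call must not raise, thanks to IF NOT EXISTS.
        create_db(self.path)
        create_db(self.path)
```

Saved in a module such as test_database.py, it runs with `python -m unittest`.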
:+1:
@mortonjt, more code still needs to be written, but it's OK for review. Any ideas on the best way to write the test code for tigrfam.py?
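For tigrfam.py, a lightweight option (instead of, or alongside, hosting test files on microbe.me) is to keep a few lines of fixture text inline and feed them to the parser through `io.StringIO`. The `parse_info` function and the two-column `KEY  value` layout are assumptions for illustration, not the actual tigrfam.py code or the exact TIGRFAM file format.

```python
import io
import unittest


def parse_info(handle):
    """Hypothetical parser: reads 'KEY  value' lines into a dict.

    The two-column layout is an assumption for illustration only.
    """
    record = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(" ")
        record[key] = value.strip()
    return record


class ParseInfoTests(unittest.TestCase):
    # Tiny inline fixture: no download or large file needed to run the test.
    FIXTURE = "AC  FAM0001\nID  famA\nDE  first example family\n"

    def test_fields(self):
        rec = parse_info(io.StringIO(self.FIXTURE))
        self.assertEqual(rec["AC"], "FAM0001")
        self.assertEqual(rec["DE"], "first example family")

    def test_blank_lines_ignored(self):
        rec = parse_info(io.StringIO("\nAC  FAM0002\n\n"))
        self.assertEqual(rec, {"AC": "FAM0002"})
```

Because the parser takes any file-like object, the same tests work whether the fixture is inline text or a small file opened from a test data directory.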