biocore / micronota

annotation pipeline for microbial genomes and metagenomes
BSD 3-Clause "New" or "Revised" License

database framework and tigrfam #40

Closed RNAer closed 8 years ago

RNAer commented 8 years ago

@mortonjt, more code needs to be written, but this is OK for review. Any idea for the best way to write the test code for tigrfam.py?

josenavas commented 8 years ago

Commenting here as @mortonjt asked for input.

The specific question that @mortonjt asked me is whether using sqlite will cause any problem with Qiita, i.e. will it be a problem to add micronota to Qiita if micronota uses sqlite?

The short answer is no. Plugins in qiita can use whatever they want (that is the advantage)! They can even use whatever programming language we want.

Now the long answer, on which I would also like input from @antgonza @rob-knight @ElDeveloper and others. There are some aspects to take into account when making this decision:

  1. I'm assuming that if micronota is using its own database, we want to have that database backed up. On this end, I would recommend discussing with Jeff before deciding on sqlite, to find out how hard it is to have that backup system in place, set up sqlite, etc.
  2. IMO, the most important one. Thinking about the future, it is possible that none of us will be in the lab to keep maintaining all the code that we have. Reducing the number of technologies that a new developer in the lab needs to learn is desirable. At this point, the lab is already using PostgreSQL as a DB technology, so I'm unsure why we want to add another one, like SQLite.

The argument that @mortonjt gave me in favor of SQLite is that it is easier to install, so "less dependencies". In our experience, PostgreSQL doesn't seem hard to install (it is supported by default on most Linux systems, and it is really easy to download the .app for Mac OS X). And the number of dependencies is the same, just replacing SQLite with PostgreSQL.

That being said, I would recommend against going the SQLite route. However, if @rob-knight, @antgonza and others do not see any issue with it, I'm fine with it.

rob-knight commented 8 years ago

I favor consolidation around one RDBMS unless there is a specific technical reason to proliferate. We have gained immensely from standardizing our code base around other specific technologies, compared with labs that let everyone do their own random thing and then find that the parts don't work together...

antgonza commented 8 years ago

Completely agree. I would say the best approach is to default to pgsql and then have an option to use sqlite (obviously, this should be extremely low priority - basically, add a tag to won't-fix :stuck_out_tongue_winking_eye:).

ElDeveloper commented 8 years ago

From what I know, using pgsql is tremendous overkill for this application as it is limited to a single table, that for the most part is ephemeral or can be regenerated without too much trouble (IIRC). While I do agree that maintaining different systems written with different technologies is troublesome, in this case the SQL should be minimal, and for the most part not all that different than pg's SQL. Just my two cents.

rob-knight commented 8 years ago

  1. I bet it will have more tables in future.
  2. The same argument is used for reimplementing all kinds of functionality rather than using a robust library, which you yourself have talked many people out of in code reviews in the past...

ElDeveloper commented 8 years ago

Very true, agree with both points.

mortonjt commented 8 years ago

:+1: on using PostgreSQL. I think it makes sense to have a single database standard.

RNAer commented 8 years ago

Thanks for all the inputs! Really appreciate it!

I agree with the point of using a single technology in the lab. However, I am not convinced that postgresql is better in this case. It would be overkill and would at the same time add burdens on installation and usage, which might discourage users and developers. My argument is that:

  1. The sql usage in micronota will be very light for the foreseeable future. It is only used to compile the metadata of each gene family. Once compilation is done, we only need to query it, and DB modification is minimal. In fact, it would be perfectly fine to leave it as text tables, except that the queries would be much more cumbersome. We can't think of a single thing here that sqlite3 can't do, or does worse than postgresql.

    Yes, we will feed the (meta)genomic annotations we have accumulated back to improve the annotation databases, and this will benefit from the power of postgresql, but I think it should be decoupled from micronota and be a standalone module or package.

  2. sqlite3 syntax is so similar to postgresql and the sql code is so minimal that the code can be changed in 10 min. So it is painless to move if we unexpectedly do need postgresql’s power in the future.
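
A minimal sketch of the usage described above, assuming a hypothetical one-table schema. The table name `family`, its columns, the function names, and the sample rows are all invented for illustration and are not taken from micronota. It uses only Python's standard-library `sqlite3` module: compile the metadata once, then query it.

```python
import sqlite3


def build_metadata_db(path, records):
    """Compile gene family metadata into a single table.

    Hypothetical schema for illustration only; not micronota's actual one.
    `records` is an iterable of (id, name, description) tuples.
    """
    conn = sqlite3.connect(path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE family ("
            " id TEXT PRIMARY KEY,"
            " name TEXT NOT NULL,"
            " description TEXT)"
        )
        conn.executemany(
            "INSERT INTO family (id, name, description) VALUES (?, ?, ?)",
            records,
        )
    return conn


def lookup(conn, family_id):
    """Return (name, description) for one family, or None if it is absent."""
    cursor = conn.execute(
        "SELECT name, description FROM family WHERE id = ?",
        (family_id,),
    )
    return cursor.fetchone()


# Placeholder rows, not real database entries.
conn = build_metadata_db(
    ":memory:",
    [
        ("FAM0001", "example_a", "placeholder description A"),
        ("FAM0002", "example_b", "placeholder description B"),
    ],
)
print(lookup(conn, "FAM0002"))
```

Apart from the `?` placeholders (psycopg2, for example, uses `%s`), the SQL in this sketch is plain enough that it should carry over to PostgreSQL with little change, which is the portability point made in item 2.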

And the unnecessary (in this case) burdens of using postgresql include:

  1. Much more hassle to install and configure. You need to install and configure it differently on different OS platforms. You have to set up a user account as well. It will probably cost micronota users a couple of hours to get it set up and running, while with sqlite3 a single install command is sufficient.
  2. You have to run the postgresql server constantly in the background.

rob-knight commented 8 years ago

Zech just noted that SQLite comes with the base Python install; accordingly I think it's fine to use and will facilitate developers getting involved who Postgres may scare off (although we acknowledge Postgres is a lot easier than it used to be). Yoshiki and Jose agree. So we should consider this discussion closed. Thanks everyone!

mortonjt commented 8 years ago

As for testing, what do you think about hosting some small test files on microbe.me?

mortonjt commented 8 years ago

What do you think about adding a test case for create_db in database.py? Looks like the coverage dropped.
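
One way such a test could look, as a sketch only: the `create_db` below is a hypothetical stand-in written for this example (the real function in database.py may take different arguments and create different tables), and the `metadata` table name is an assumption. The test writes the database into a temporary directory, so it needs no server and no hosted test files.

```python
import os
import sqlite3
import tempfile
import unittest


def create_db(path):
    """Hypothetical stand-in, NOT micronota's create_db: make a one-table db."""
    conn = sqlite3.connect(path)
    try:
        with conn:
            conn.execute(
                "CREATE TABLE metadata (id TEXT PRIMARY KEY, description TEXT)"
            )
    finally:
        conn.close()


class CreateDbTests(unittest.TestCase):
    def setUp(self):
        # Each test gets its own scratch directory, removed in tearDown.
        self.tmpdir = tempfile.TemporaryDirectory()
        self.db_path = os.path.join(self.tmpdir.name, "test.db")

    def tearDown(self):
        self.tmpdir.cleanup()

    def test_creates_expected_table(self):
        create_db(self.db_path)
        self.assertTrue(os.path.exists(self.db_path))
        conn = sqlite3.connect(self.db_path)
        try:
            # sqlite_master lists the schema objects of the database file.
            rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            ).fetchall()
        finally:
            conn.close()
        self.assertEqual([r[0] for r in rows], ["metadata"])
```

Saved in a test module, this can be run with `python -m unittest`.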

mortonjt commented 8 years ago

:+1: