Closed IanSudbery closed 6 years ago
Hi Ian,
It looks like Database.py works with both for forward compatibility. It uses sqlite3
for path
and sqlalchemy
for url
.
Most of the code in cgat-flow
connect with sqlite3
directly:
$ grep -n 'database_name' -r CGATPipelines | egrep -v 'Binary'
CGATPipelines/PipelineIntervalAnnotation.py:124: dbhandle = sqlite3.connect(PARAMS["database_name"])
CGATPipelines/PipelineIntervals.py:2611: dbhandle = sqlite3.connect(P.get_params()["database_name"])
CGATPipelines/pipeline_exome.py:1483: database = PARAMS["database_name"]
CGATPipelines/pipeline_rnaseqdiffexpression.py:417: dbh = sqlite3.connect(PARAMS["database_name"])
CGATPipelines/pipeline_bamstats.py:202: dbh = sqlite3.connect(PARAMS["database_name"])
CGATPipelines/PipelineGO.py:69: database_name
CGATPipelines/PipelineGO.py:98: dbh = sqlite3.connect(PARAMS["database_name"])
CGATPipelines/PipelineGO.py:446: dbhandle = sqlite3.connect(PARAMS["database_name"])
CGATPipelines/PipelineEnrichment.py:306: dbhandle = sqlite3.connect(P.get_params()["database_name"])
CGATPipelines/PipelineEnrichment.py:544: dbhandle = sqlite3.connect(P.get_params()["database_name"])
CGATPipelines/pipeline_splicing.py:174: dbh = sqlite3.connect(PARAMS["database_name"])
CGATPipelines/pipeline_peakcalling.py:1014: PARAMS["database_name"])
CGATPipelines/pipeline_peakcalling.py:1085: PARAMS["database_name"])
CGATPipelines/pipeline_mapping.py:274: dbh = sqlite3.connect(PARAMS["database_name"])
CGATPipelines/pipeline_genesets.py:372: dbh = sqlite3.connect(PARAMS["database_name"])
CGATPipelines/pipeline_motifs.py:254: dbh = sqlite3.connect(PARAMS["database_name"])
CGATPipelines/pipeline_exome_cancer.py:175: dbh = sqlite3.connect(PARAMS["database_name"])
CGATPipelines/PipelineGeneset.py:688: dbhandle = sqlite3.connect(PARAMS["database_name"])
Just found a couple that uses Database.py
in cgat-core
:
$ grep Database -r CGATPipelines | egrep -v '__pycache__' | grep import
CGATPipelines/PipelineGtfsubset.py:import CGATCore.Database as Database
CGATPipelines/PipelineRnaseq.py:import CGATCore.Database as Database
CGATPipelines/pipeline_peakcalling.py:import CGATCore.Database as DB
$ grep Database.connect -r CGATPipelines
CGATPipelines/PipelineGtfsubset.py: dbhandle = Database.connect(url="mysql://{user}@{host}/{database}".format(**locals()))
CGATPipelines/configuration/pipeline.ini:# Database connection parameters
CGATPipelines/configuration/pipeline.yml:# Database connection parameters
$ grep 'DB\.' -r CGATPipelines
CGATPipelines/PipelineReadqc.py: parser = CSV2DB.buildParser()
CGATPipelines/PipelineReadqc.py: CSV2DB.run(inf, options)
CGATPipelines/PipelineReadqc.py: CSV2DB.run(inf, options)
CGATPipelines/pipeline_peakcalling.py: insert_size = DB.fetch_DataFrame("SELECT * FROM insert_sizes",
CGATPipelines/pipeline_peakcalling.py: insert_size = DB.fetch_DataFrame("SELECT * FROM insert_sizes",
I think we should encourage new pipelines built on top of cgat-core
to use the Database.py
module with url
, but I am inclined to save time and not refactor the existing code.
Please let me know your thoughts.
Best regards, Sebastian
The most common use of the database is CGATCore.Pipeline.Database.load
, which uses database_url
.
I was expecting the changes in https://github.com/cgat-developers/cgat-flow/pull/37 to fail Jenkins tests but it does not. I am confused. Are these changes the solution?
I think so? The problem is that at the moment there are hard coded defaults for these parameters of database_name=./csvdb
and database_url=sqlite:///./csvdb
, so the problem only manifests if you need to use some value other than these.
I agree. If you can provide a reproducible example where it fails, I am happy to help and fix it.
Please re-open if you find an example to reproduce the issue.
Pipelines
mapping
,bamstats
andgenesets
still have settingwhen the code is now looking for
database_url
and this should be a URL rather than a path.