Open reece opened 9 years ago
Original comment by Maximilian Haeussler (Bitbucket: maximilianh, GitHub: maximilianh):
I am on OSX/Homebrew:
brew install postgresql
Original comment by Maximilian Haeussler (Bitbucket: maximilianh, GitHub: maximilianh):
For me at least, it would be good if you document the psql dependency. The dependency itself is not a problem, it's normal that one needs some db layer access.
Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):
Update: Requiring postgresql native libs for all clients/users is clearly non-ideal. On the other hand, a significant portion of the utility of the package is lost without having access to gene, transcript, and sequence data.
We will push to implement a REST-based data source for UTA soon, thus eliminating the libpq dependency entirely.
Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):
We have considered this several times, but believe that this is not the right time to remove this dependency.
The rationale was that mapping requires a data provider and that we expect mapping to be an oft-used feature. Thus, this decision is a balance between A) the cost of installing psycopg2 for people who don't use mapping or validation and B) the difficulty created for people who discover this dependency at runtime.
We are considering providing a REST interface to UTA, which would obviate psycopg2 for public UTA access.
Please put alternative views in comments. If the majority of people really are using only the parsing and formatting component -- that is, without mapping or validation -- then that would certainly sway this decision.
We will continue to consider this suggestion.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
I realize this is stale, but I wanted to add another "voice" to the discussion. I have been working with the hgvs library for the better part of a year now, mostly for the ability to have a consistent framework and language to describe sequence variants. The use case I am working on is more or less completely distinct from using any datasource (at this time). We are using it to track variations in synthetic sequences and proteins. We've actually released an open source package to assist with this kind of thing. The idea is to give it 2 sequences, and get back the variants between them in HGVS spec, based on a global alignment. The package is here: https://github.com/mammothbio-os/palamedes. It is mostly just a translation layer between a biopython Alignment
and the various SequenceVariant
sub-types in existence. This could not have been made possible without the hard work of the hgsv folks, @reece especially. So I wanted to start with a huge thank you for all of the work.
Focusing in on this postgresql issue, we've already had a few folks (and the CI on our web app) mention that the postgres system dependency was a bit "big" for what our library provides directly (it depends on hgvs
under the hood since it returns SequenceVariant
objects). I know that myself and likely a few other folks would be able to commit some time to coming up with a path forward here, if that is something that the maintainers desire here. I played around with it a little bit, the psycopg2
dependencies are well isolated within the datasources
repo so it seems like not a huge lift to make it an extra
dependency and put something like this in datasources/__init__.py
:
try:
import psycopg2
except ModuleNotFoundError:
print(
"ModuleNotFoundError raised when trying to import psycopg2, if you need dataproviders support please "
"install system dependencies and pip install with [dataproviders] extra!"
)
raise
Getting around this would require the user to install the system deps as well as pip install hgvs[datasources]
, assuming we make the following changes to the pypackage.toml
:
dependencies=[
"attrs >= 17.4.0", # https://github.com/biocommons/hgvs/issues/473
"biocommons.seqrepo >= 0.6.5",
"bioutils >= 0.4.0,<1.0",
"configparser >= 3.3.0",
"ipython",
"parsley",
"six",
]
[project.optional-dependencies]
dataproviders = ['psycopg2']
This would obviously be a breaking change, which is a huge downside. The other option I can see would be a "core" package with the models and parser, which is installed into the hgvs
package where the data sources and any other "extra" things are. These more esoteric use cases could install just the core package and not need the postgres dependencies.
tl;dr Myself and likely a few other folks can contribute time to addressing this postgres issue, in whatever way the maintainers feel is appropriate, if they do feel it is at all appropriate.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)
Reviewer 3 suggested: