biocommons / hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
https://hgvs.readthedocs.io/
Apache License 2.0
234 stars 93 forks source link

make data provider dependencies optional #199

Open reece opened 9 years ago

reece commented 9 years ago

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


Reviewer 3 suggested:

UTA looks incredibly useful, so this only a small point. Integration with it does increase the install requirements, specifically psycopg2 and a postgresql-client. It might be worth making this an optional dependency to improve installation for users on more minimal systems.


reece commented 9 years ago

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


Added Maximilian's instructions in 6b9445ac6b5c. Keeping open until resolved.

reece commented 9 years ago

Original comment by Maximilian Haeussler (Bitbucket: maximilianh, GitHub: maximilianh):


I am on OSX/Homebrew:

brew install postgresql

reece commented 9 years ago

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


I agree. Which platform are you on and what did you need to do to install PostgreSQL? I'll put that in the instructions.

Thanks for suggesting this.

reece commented 9 years ago

Original comment by Maximilian Haeussler (Bitbucket: maximilianh, GitHub: maximilianh):


For me at least, it would be good if you document the psql dependency. The dependency itself is not a problem, it's normal that one needs some db layer access.

reece commented 9 years ago

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


Update: Requiring postgresql native libs for all clients/users is clearly non-ideal. On the other hand, a significant portion of the utility of the package is lost without having access to gene, transcript, and sequence data.

We will push to implement a REST-based data source for UTA soon, thus eliminating the libpq dependency entirely.

reece commented 9 years ago

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


Issue #229 was marked as a duplicate of this issue.

reece commented 9 years ago

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


Issue #211 was marked as a duplicate of this issue.

reece commented 9 years ago

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


See http://pythonhosted.org/setuptools/setuptools.html#declaring-extras-optional-features-with-their-own-dependencies

reece commented 9 years ago

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


We have considered this several times, but believe that this is not the right time to remove this dependency.

The rationale was that mapping requires a data provider and that we expect mapping to be an oft-used feature. Thus, this decision is a balance between A) the cost of installing psycopg2 for people who don't use mapping or validation and B) the difficulty created for people who discover this dependency at runtime.

We are considering providing a REST interface to UTA, which would obviate psycopg2 for public UTA access.

Please put alternative views in comments. If the majority of people really are using only the parsing and formatting component -- that is, without mapping or validation -- then that would certainly sway this decision.

We will continue to consider this suggestion.

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 4 months ago

This issue was closed because it has been stalled for 7 days with no activity.

mbiokyle29 commented 3 months ago

I realize this is stale, but I wanted to add another "voice" to the discussion. I have been working with the hgvs library for the better part of a year now, mostly for the ability to have a consistent framework and language to describe sequence variants. The use case I am working on is more or less completely distinct from using any datasource (at this time). We are using it to track variations in synthetic sequences and proteins. We've actually released an open source package to assist with this kind of thing. The idea is to give it 2 sequences, and get back the variants between them in HGVS spec, based on a global alignment. The package is here: https://github.com/mammothbio-os/palamedes. It is mostly just a translation layer between a biopython Alignment and the various SequenceVariant sub-types in existence. This could not have been made possible without the hard work of the hgsv folks, @reece especially. So I wanted to start with a huge thank you for all of the work.

Focusing in on this postgresql issue, we've already had a few folks (and the CI on our web app) mention that the postgres system dependency was a bit "big" for what our library provides directly (it depends on hgvs under the hood since it returns SequenceVariant objects). I know that myself and likely a few other folks would be able to commit some time to coming up with a path forward here, if that is something that the maintainers desire here. I played around with it a little bit, the psycopg2 dependencies are well isolated within the datasources repo so it seems like not a huge lift to make it an extra dependency and put something like this in datasources/__init__.py:

try:
    import psycopg2
except ModuleNotFoundError:
    print(
        "ModuleNotFoundError raised when trying to import psycopg2, if you need dataproviders support please "
        "install system dependencies and pip install with [dataproviders] extra!"
    )
    raise

Getting around this would require the user to install the system deps as well as pip install hgvs[datasources], assuming we make the following changes to the pypackage.toml:

dependencies=[
    "attrs >= 17.4.0",  # https://github.com/biocommons/hgvs/issues/473
    "biocommons.seqrepo >= 0.6.5",
    "bioutils >= 0.4.0,<1.0",
    "configparser >= 3.3.0",
    "ipython",
    "parsley",
    "six",
]

[project.optional-dependencies]
dataproviders = ['psycopg2']

This would obviously be a breaking change, which is a huge downside. The other option I can see would be a "core" package with the models and parser, which is installed into the hgvs package where the data sources and any other "extra" things are. These more esoteric use cases could install just the core package and not need the postgres dependencies.

tl;dr Myself and likely a few other folks can contribute time to addressing this postgres issue, in whatever way the maintainers feel is appropriate, if they do feel it is at all appropriate.

github-actions[bot] commented 1 week ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.