GenomicMedLab / dgipy

Python client for fast access to the Drug-Gene Interaction Database (DGIDb)
MIT License

Output data structures (Pandas, etc) #16

Closed jsstevenson closed 1 month ago

jsstevenson commented 6 months ago

Currently, pandas has to be included as a dependency because it's imported in dgipy.dgidb. However, some users might not want to use pandas at all. Setting use_pandas to False avoids pandas output, but pandas still has to be installed because the import happens regardless.

It might be more elegant to refactor pandas output (the _process_output method) into a separate module that imports pandas in a protected way (i.e. wraps import pandas in a try/except block), and then make pandas an optional dependency. We could also provide other kinds of data output in the same way (e.g. Polars, Dask, or maybe dumping to SQLite).
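A minimal sketch of the protected import described above. The function name `records_to_dataframe` and the record layout are hypothetical and are not taken from dgipy; the point is only that the module can be imported with or without pandas installed.

```python
"""Hypothetical output module: pandas is optional and imported defensively."""

try:
    import pandas as pd
except ImportError:  # pandas is not installed; the module still imports cleanly
    pd = None


def records_to_dataframe(records):
    """Convert a list of dicts to a pandas DataFrame, if pandas is available.

    `records` is an assumed intermediate format (one dict per row), not the
    actual structure returned by dgipy.
    """
    if pd is None:
        raise ImportError(
            "pandas is required for DataFrame output; install it separately"
        )
    return pd.DataFrame(records)
```

With this layout, only callers who actually request a DataFrame hit the ImportError; everything else in the package works without pandas.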

jsstevenson commented 6 months ago

I guess there could be a world where you really just want the raw GraphQL, but the pandas export methods perform a conversion that is probably useful even if you don't want pandas itself. I'd say we should only perform the transform to pandas/other DataFrame formats as a final step. Would subsume #17.

jsstevenson commented 3 months ago

My thought on this right now is to have some kind of exporter or converter module with to_pandas methods that constrain the import so it only happens inside the method, and then put pandas into an optional dependency group.
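Roughly what that could look like, as a sketch only. `InteractionTable` and its `columns` attribute are hypothetical names for illustration, not existing dgipy classes; the results stay in plain Python structures and pandas is imported only when `to_pandas` is called.

```python
from dataclasses import dataclass, field


@dataclass
class InteractionTable:
    """Hypothetical container holding results as a column-oriented dict."""

    columns: dict = field(default_factory=dict)

    def to_pandas(self):
        """Return a pandas DataFrame; pandas is imported only on this call."""
        try:
            import pandas as pd  # deferred import: optional dependency
        except ImportError as e:
            raise ImportError(
                "to_pandas() requires pandas, which is an optional dependency"
            ) from e
        return pd.DataFrame(self.columns)
```

Importing the module and building an `InteractionTable` never touches pandas, so users who skip the optional dependency group are unaffected unless they call `to_pandas()`.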

jsstevenson commented 1 month ago

Actually, we should just target the Python dataframe standard.