OHDSI / dbt-synthea

[Under development] A dbt ETL project to convert a Synthea synthetic data set into the OMOP CDM
https://ohdsi.github.io/dbt-synthea/
Apache License 2.0
16 stars 6 forks source link

chore: cleanup and refactor requirements #66

Closed lawrenceadams closed 1 month ago

lawrenceadams commented 1 month ago

At present requirements.txt has 63 dependencies of unknown origin (i.e. some are from dbt-core / dbt-postgres / dbt-duckdb / pre-commit / black etc...); likely from repeated pip freeze > requirements.txt runs.

Although this is not too many in the grand scheme of things, I think this will cause increasing issues for multiple reasons:


To resolve this I propose we split out our requirements into multiple requirement.txt files.

For example:

dbt-synthea/
└── requirements/
    ├── dev-tools.in
    ├── duckdb.in
    ├── postgres.in
    ├── snowflake.in
    ├── duckdb.txt
    ├── postgres.txt
    └── snowflake.txt

The downsides of this is more complexity, but greater ease of maintainability, and it becomes obvious what is responsible for each dependency.

There are multiple tools available for this; for example: pip-tools or (the more modern/faster) uv

adambouras commented 1 month ago

Hi @lawrenceadams ,

How about poetry? Could you share some pros and cons of using poetry?

Thank you, Adam

lawrenceadams commented 1 month ago

Hey @adambouras - I considered this, however in my view:

Pros:

Cons:

I'm a big fan of poetry and using proper project setups, but I don't think it really fits in this instance. Either way - I think this needs to be cleaned up at some point! Not that fussy on one approach/tool over the other, but I think we can simplify how we approach this.

Arguably, we don't need these files at all - we only really need dbt-duckdb or dbt-postgres to be installed with some linting/precommit/etc tools, but this makes it more approachable for a beginner and allows CI/Devcontainers/etc to be setup

Do you have any strong views? Happy to discuss!

adambouras commented 1 month ago

Hi Lawrence, I totally agree with you, and I am looking forward to testing the new changes Adam

On Sun, Sep 29, 2024, 4:45 PM Lawrence Adams @.***> wrote:

Hey @adambouras https://github.com/adambouras - I considered this, however in my view:

Pros:

  • lock file generation (essentially what generating xyz.txt files is doing - other tools also have this feature)

Cons:

  • poetry is focused around managing python projects
    • Do we want to setup a pyproject.toml file? This is not a python project that we're going to release on PyPi - as far as I can understand the user experience will be cloning the project and then running pip install xyz -- not running pip install dbt-synthea. This is a project that utilizes a lot of python packages, but is not a python project itself.
  • Poetry is quite complicated - especially for what we want

I'm a big fan of poetry and using proper project setups, but I don't think it really fits in this instance. Either way - I think this needs to be cleaned up at some point! Not that fussy on one approach/tool over the other, but I think we can simplify how we approach this.

Arguably, we don't need these files at all - we only really need dbt-duckdb or dbt-postgres to be installed with some linting/precommit/etc tools, but this makes it more approachable for a beginner and allows CI/Devcontainers/etc to be setup

Do you have any strong views? Happy to discuss!

— Reply to this email directly, view it on GitHub https://github.com/OHDSI/dbt-synthea/issues/66#issuecomment-2381596551, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACQELOSBW2BLMHRHTORMVLLZZBRHBAVCNFSM6AAAAABPB726WKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGOBRGU4TMNJVGE . You are receiving this because you were mentioned.Message ID: @.***>