cal-itp / calitp-py

Tools for accessing and analyzing cal-itp data
https://docs.calitp.org/calitp-py
GNU General Public License v3.0

Start pulling in schedule refactor entities! #65

Closed atvaccaro closed 2 years ago

atvaccaro commented 2 years ago

One consequence of the schedule refactor is new code that we will want to share across Airflow as well as pod operators. This PR is the first step in that process; we'll continue to iterate and improve on this library so that less work is repeated across the pipeline.

Things implemented in this PR:

  1. Adopts Poetry for dependency management and package building/publishing
  2. Adjusts CI/CD to match the above
  3. Brings in some initial Pydantic types for interacting with GCS artifacts
  4. Temporarily disables Docker image building
  5. Moves to calver since calitp-py will be heavily edited over the next few weeks/months as our refactor continues
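
For readers unfamiliar with calver, here is a minimal sketch of what a calendar-based version string looks like. The `YYYY.M.D` layout, the optional same-day suffix, and the `calver` helper name are all assumptions for illustration; the PR does not state which calver scheme calitp-py uses.

```python
from datetime import date


def calver(release_date: date, micro: int = 0) -> str:
    """Build a calendar version string (assumed YYYY.M.D scheme)."""
    base = f"{release_date.year}.{release_date.month}.{release_date.day}"
    # Hypothetical suffix to tell apart several releases cut on the same day.
    return base if micro == 0 else f"{base}.{micro}"


# Arbitrary example date, not an actual calitp-py release.
print(calver(date(2022, 8, 15)))  # prints 2022.8.15
```

The appeal for a fast-moving library is that the version says when a release was cut without implying any compatibility promise.
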

atvaccaro commented 2 years ago

  • After this PR, are we ready to use calitp-py in the existing locations where we use the utils versions of these classes? If not, what specific steps are needed for us to get there? I mostly just want to make sure I understand, because for the schedule parsing stuff I need to contribute new related classes. What goes in here vs. in Airflow utils? Is our philosophy that stuff only goes in here if it's specifically needed by a PodOperator? I want to make sure that we're not maintaining parallel versions for any longer than we absolutely have to.

Yes, as soon as a good version of calitp-py is published to PyPI, we can open a PR in data-infra to add it as a dependency in Airflow and change the imports. I'd like us to generally default to putting things in here as long as they aren't actually importing Airflow; I think it'll be easier to develop/test, and it proactively makes things available to pod operators.

We can always reference specific git hashes while developing on the Airflow side if needed.
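
As a rough sketch of what referencing a git hash could look like, the helper below builds a PEP 508 direct reference that pip can install from a single commit. The `git_requirement` helper, the distribution name, and the commit placeholder are assumptions for illustration; substitute the name the package is actually published under and a real commit hash.

```python
def git_requirement(name: str, repo_url: str, commit: str) -> str:
    """Return a PEP 508 direct reference pinned to one git commit."""
    return f"{name} @ git+{repo_url}@{commit}"


# Distribution name and commit are placeholders, not real values.
print(
    git_requirement(
        "calitp-py",
        "https://github.com/cal-itp/calitp-py.git",
        "<commit-sha>",
    )
)
```

The resulting string can go in a requirements file while developing on the Airflow side, then be swapped for a normal version pin once a release is published.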

  • Can you just call out in the PR description that we're moving to Poetry for dependency management specifically? (The PR title and description are kind of disparate right now; wondering if we can clarify that a bit for posterity, especially because working across the repos is already kinda confusing.)

yep!