ImagingDataCommons / idc-index-data

Python package providing the index to query and download data hosted by the NCI Imaging Data Commons
MIT License
1 stars 4 forks source link

feat: Add support for generating and bundling index files during wheel build #9

Closed jcfr closed 5 months ago

jcfr commented 5 months ago

This pull request was created directly from a branch published into the upstream repo to workaround the fact secret can not be used (by default) from pull request created from fork.

It is based on:

fedorov commented 5 months ago

@jcfr is this PR ready for review/merge?

jcfr commented 5 months ago

This is ready for final review, support for bumping the version using nox -s bump will be added in a follow-up pull request.

It should be integrated using Squash & Merge specifying the following commit message:

feat: Add support for generating and bundling index files during wheel build

This commit introduces a helper Python script to facilitate the generation of
index files, namely `idc_index.csv.zip` and `idc_index.parquet`. Additionally,
it updates the build system to incorporate these index files into the generated
wheel package.

Update the build system to include new CMake options:
`IDC_INDEX_DATA_GENERATE_CSV_ARCHIVE` and `IDC_INDEX_DATA_GENERATE_PARQUET`,
allowing to specify whether to generate CSV archives and Parquet files respectively.
Currently, only the compressed CSV index file is generated and bundled.

Enhance the GitHub workflows by adding a step for authenticating to the Google
Cloud Platform (GCP) using the secret variable `SERVICE_ACCOUNT_KEY` associated
with the repository.

Remove support for `pypi-3.10` due to unresponsive runners while running
the command `python -m pip install .[test]`.

Adjust the `setuptools_scm` version scheme to maintain the development package version
consistent with the table version. Instead of setting the development package version
to the next one, set it by appending `.post1.devN`, where `N` represents the number of
commits following the most recent tag.

Co-authored-by: Andrey Fedorov <andrey.fedorov@gmail.com>
Co-authored-by: Jean-Christophe Fillion-Robin <jchris.fillionr@kitware.com>

cc: @fedorov @vkt1414

vkt1414 commented 5 months ago

Thank you very much @jcfr ! I learned a lot as this PR evolved.

Only thing is I do not know if the boolean value for parquet is passed along to the index manager properly, as it seem to generate both csv and parquet although the latter is disabled in cmakelists.

jcfr commented 5 months ago

This last commit should fix the following ...

image

jcfr commented 5 months ago

There is one more tweak I would like to add and will take care of this later .. have to head out now :rocket:

jcfr commented 5 months ago

Moving forward with the integration, additional improvements and fixes will be integrated in follow-up pull requests.

jcfr commented 5 months ago

Unclear why I get the message The base branch does not allow updates, I updated a GitHub discussion^1 where this was mentioned and will move forward with the integration bypassing the checks.

Status of the pull request before integration

image