ImperialCollegeLondon / SWMManywhere

SWMManywhere is used to derive and simulate a sewer network anywhere in the world
https://imperialcollegelondon.github.io/SWMManywhere/
BSD 3-Clause "New" or "Revised" License

:green_heart: Update publish workflow to deploy in PyPI #282

Closed dalonsoa closed 5 days ago

dalonsoa commented 1 week ago

Description

Set up the publish workflow to deploy SWMManywhere to PyPI whenever there is a new release - with a new tag, important!

Essentially, this:

Obviously, this cannot be tested until we merge it and try to publish it, but it is exactly the same workflow used in WSIMOD, which does work.

Fixes #261

Type of change

Key checklist

N/A

Further checks

N/A

cheginit commented 1 week ago

A couple of suggestions:

For my suggestions regarding Hatch you can refer to this file and for the publishing workflow you can refer to this file

dalonsoa commented 1 week ago
  • Since the repo will be public, it's a good idea to add artifact attestation.

Never heard of it, but sounds like a good idea.
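
For context, GitHub's artifact attestation is enabled with the `actions/attest-build-provenance` action. A minimal sketch of how it might slot into a publish job (step names and the build command are illustrative, not taken from this repo's workflow):

```yaml
# Sketch: attest built distributions before uploading to PyPI.
# The job needs these permissions for attestation to work.
permissions:
  id-token: write
  attestations: write
  contents: read

steps:
  - name: Build distributions
    run: python -m build

  - name: Attest build provenance
    uses: actions/attest-build-provenance@v1
    with:
      subject-path: "dist/*"
```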

  • Switch to using the modern successor of setuptools, hatchling (for why this is a good idea, I refer to their docs). The current pyproject.toml, most probably, requires some changes to work correctly. If needed, I can give the extra options that should be added for setuptools. Changing to hatchling requires just changing a few lines.

Switching to Hatch might be a good idea or not - I have not read the docs yet - but that is beside the point of this PR. If we switch to Hatch, it should be its own separate PR to modernize the tooling. To be clear, I'm not against switching, but that's a separate issue.

  • Dependencies require extra attention and adding some min versions, e.g., pywbt>=0.2, geopandas>=1, and shapely>=2. For example, I am not sure if cdsapi, fiona (pyogrio will be installed with geopandas), fastparquet (you already have pyarrow), GitPython, and pyflwdir, are needed.

No idea about these. @barneydobson ?

Also, although netcomp is owned by Barnaby, it's generally not a good idea to have a not-published-on-PyPI dependency. Since its license is MIT, the parts of netcomp that are needed could just be copied over here as a new module, with credits to the original developer.

I hadn't realised that netcomp is not on PyPI. PyPI does not allow GitHub dependencies, so the deployment would have failed anyway. Thanks for pointing this out! So, what's the best approach with that one?

  • Add a URL section.

OK

Cool! Didn't know about this one.

  • It's a good idea to have a separate publish workflow that only runs on tagged pushes, so tags on the repo match the Python versions. This way you don't need a dependency on the CI action, since you won't release a tag, unless it's tested thoroughly.

I disagree with this one. Creating a Release that also creates a tag and runs the full suite of tests before publishing, to catch last-minute errors, is preferable to just pushing tags. It has fewer manual steps, is foolproof and keeps releases, tags and PyPI in sync.
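
The approach described - a release-triggered workflow that reuses the CI workflow as a gate before publishing - might look roughly like this. File names and the reusable-workflow path are assumptions based on the discussion, not the repo's actual workflow:

```yaml
# .github/workflows/publish.yml (sketch)
name: Publish

on:
  release:
    types: [published]

jobs:
  test:
    # Re-run the full test suite; requires `workflow_call:` among the
    # triggers in ci.yml so it can be called from this workflow.
    uses: ./.github/workflows/ci.yml

  publish:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Build distributions
        run: |
          python -m pip install build
          python -m build
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
```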

  • For documentation, it seems that you're planning to use GitHub pages, but there's no versioning, e.g. using mike. Although not necessary, a versioned documentation is very useful.

I've never found it useful to keep old versions of the documentation alive, but fine. It's not a big effort. Having said that, @barneydobson, if you prefer to publish things on Read the Docs, that's also fine, but since WSIMOD is on GitHub Pages, I thought of doing the same.

Also, I think you need to add site_url to mkdocs.yaml.

Ok
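
Taken together, the documentation suggestions amount to something like the following additions to `mkdocs.yaml` (the `extra.version` block is how mike's version selector is wired up when the Material theme is used; this is a sketch, not the repo's actual config):

```yaml
# Required for sitemap generation and canonical URLs.
site_url: https://imperialcollegelondon.github.io/SWMManywhere/

# Versioned docs via mike (mkdocs-material reads the provider from here).
extra:
  version:
    provider: mike
```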

dalonsoa commented 1 week ago
  • Switch to using the modern successor of setuptools, hatchling (for why this is a good idea, I refer to their docs). The current pyproject.toml, most probably, requires some changes to work correctly. If needed, I can give the extra options that should be added for setuptools. Changing to hatchling requires just changing a few lines.

Sorry, you were talking about hatchling, not adopting Hatch in full. Yes, it seems like a good move and part of this PR.
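
Switching only the build backend from setuptools to hatchling is indeed a few lines in `pyproject.toml`; a sketch of the change:

```toml
# Before: requires = ["setuptools"], build-backend = "setuptools.build_meta"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```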

dalonsoa commented 1 week ago

Trying to use hatchling to build the wheel has also flagged netcomp as an issue. It can be bypassed with the right flag, but PyPI will not accept that, so we are back to the beginning: netcomp cannot be there.

cheginit commented 1 week ago

So, what's the best approach with that one?

@dalonsoa Since netcomp's license is MIT, the parts of it that are needed could just be copied over here as a new module, with credits to the original developer.

dalonsoa commented 1 week ago

It feels a bit hacky... but I guess it would do, if that tool is absolutely necessary.

cheginit commented 1 week ago

@dalonsoa netcomp is not maintained (the last commit was 7 years ago), so I don't think there's any other option.

dalonsoa commented 1 week ago

OK, I've pushed a few changes following @cheginit's suggestions, but there are a few things pending. I feel some of these should be done in a separate PR:

cheginit commented 1 week ago

@dalonsoa I don't think your current setup of adding ci.yml to release will work, unless you add workflow_call: to ci.yml to make it callable from other workflows.

barneydobson commented 1 week ago

sorry guys I'm away this week but it seems you are ironing out all the issues in here.

RE netcomp I think @cheginit 's suggestion of copying it over is fine - it is not core to the software so I could also add it to the paper repository if you think that is tidier.

dalonsoa commented 1 week ago

@dalonsoa I don't think your current setup of adding ci.yml to release will work, unless you add workflow_call: to ci.yml to make it callable from other workflows.

It already had workflow_call. See https://github.com/ImperialCollegeLondon/SWMManywhere/blob/f3f1fcbcbb94f42125cfefe721e0944da1ae0560/.github/workflows/ci.yml#L3

dalonsoa commented 1 week ago

I've checked all the dependencies and only fiona, fastparquet and GitPython are not required.

dalonsoa commented 1 week ago

OK, one step closer.

I've added netcomp to swmmanywhere itself, hopefully in the right way for it to be available when deployed to PyPI. I've disabled running pre-commit on it because it was failing on everything. That will need to be fixed at some point... but not today. And it is not necessary before release.

I've also removed unused dependencies. Only three in the end.

I had to pin pywbt<=0.1.1 because the newest version, from a few hours ago, was breaking things. @cheginit , I believe you maintain pywbt, so it would be good if you could have a look at what needs changing.

Still pending adding a lower version limit for some dependencies.

dalonsoa commented 1 week ago

Seems like the pre-commit keeps trying to run in netcomp. Well, let's see how we can avoid that.

dalonsoa commented 1 week ago

I had to pin pywbt<=0.1.1 because the newest version, from a few hours ago, was breaking things. @cheginit , I believe you maintain pywbt, so it would be good if you could have a look at what needs changing.

Ok, I see there's a whole PR about this, so I'll leave that one to be merged first and then I'll remove the pinning in this one.

dalonsoa commented 1 week ago

OK, now there's an integer overflow error. I've no idea where that is coming from or why, so I'll leave this PR on hold until @barneydobson is back.

Things pending:

cheginit commented 1 week ago

@dalonsoa The issue with the overflow is due to using fiona as the engine. I'm guessing the new version of fiona which was released recently made some changes that cannot parse long integers. The overflow issue is due to the id column of the subs_rc which is an integer.

However, if you change the engine to pyogrio, it won't fail. More importantly, I am not sure why geopandas is pinned at 0.14; they released v1 back in June, so I highly recommend pinning it to '>=1'. The default engine of geopandas is now pyogrio.

I just updated geopandas to their latest version 1.0.1 and all tests passed without changing anything.

cheginit commented 1 week ago

Give lower limit to dependencies. I don't feel it is necessary unless we know already some lower versions that will not work.

@dalonsoa a good reason for pinning lower limits is maintenance: geospatial-related packages especially can be a hassle to debug (the overflow issue being an example), so I think some important packages such as geopandas, rasterio, and xarray should be pinned to a stable release, just to avoid hard-to-debug issues once this repo becomes public.

Regarding the pywbt issue, I will take a look and suggest a solution.

dalonsoa commented 1 week ago

Yeah, I totally agree with you. My point is that I do not know what to pin or to what minimum version. You spotted the issue with geopandas, so now we can pin that one. But I cannot arbitrarily pin things without more information.

dalonsoa commented 1 week ago

@dalonsoa The issue with the overflow is due to using fiona as the engine. I'm guessing the new version of fiona which was released recently made some changes that cannot parse long integers. The overflow issue is due to the id column of the subs_rc which is an integer.

However, if you change the engine to pyogrio, it won't fail. More importantly, I am not sure why geopandas is pinned at 0.14; they released v1 back in June, so I highly recommend pinning it to '>=1'. The default engine of geopandas is now pyogrio.

I just updated geopandas to their latest version 1.0.1 and all tests passed without changing anything.

It seems we cannot use geopandas>=1 because the last stable version of osmnx, 1.9.4, requires an older one. See https://github.com/gboeing/osmnx/blob/715d704b4b93bd29a4d6782923194f4f734658b2/pyproject.toml#L28

cheginit commented 1 week ago

@dalonsoa Ah, ok. Yeah, makes sense.

For the pywbt issue, you just need to replace the code at https://github.com/ImperialCollegeLondon/SWMManywhere/blob/45f9bd1804d8cf654d6ed74de2366bce0b0167fa/swmmanywhere/geospatial_utilities.py#L644-L650 with this:

        whitebox_tools(
            temp_path,
            wbt_args,
            save_dir=temp_path,
            verbose=verbose(),
            wbt_root=temp_path / "WBT",
            zip_path=fid.parent / "whiteboxtools_binaries.zip",
            max_procs=1,
        )

You can also add whiteboxtools_binaries.zip to .gitignore.

You also need to replace all instances of np.Inf with np.inf, since np.Inf has been deprecated in numpy. They were the same anyway.
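
For reference, `np.Inf` was just an alias for `np.inf` (the alias has since been removed entirely in NumPy 2.0), so the replacement is a pure rename; a quick sanity check:

```python
import numpy as np

# np.inf is the canonical IEEE 754 positive infinity; the old np.Inf
# alias pointed at the same value before it was deprecated and removed.
print(np.inf == float("inf"))  # True
print(np.isinf(np.inf))        # True
```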

I noticed that there's a deprecation warning in tests. This is due to using fillna on a full dataframe, which is not a good idea anyway, so the code at https://github.com/ImperialCollegeLondon/SWMManywhere/blob/45f9bd1804d8cf654d6ed74de2366bce0b0167fa/swmmanywhere/post_processing.py#L436 should be replaced with:

        numeric_cols = shp.select_dtypes(include=[np.number]).columns
        shp[numeric_cols] = shp[numeric_cols].fillna(0)
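The column-wise replacement sidesteps the deprecated behaviour of running `fillna` over a whole mixed-dtype frame; a self-contained illustration (the frame here is a toy stand-in for the real `shp` GeoDataFrame):

```python
import numpy as np
import pandas as pd

# Toy mixed-dtype frame standing in for `shp`.
shp = pd.DataFrame({"id": [1.0, np.nan], "name": ["a", None]})

# Fill NaNs only in numeric columns, leaving object columns untouched.
numeric_cols = shp.select_dtypes(include=[np.number]).columns
shp[numeric_cols] = shp[numeric_cols].fillna(0)

print(shp["id"].tolist())  # [1.0, 0.0]
```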
cheginit commented 1 week ago

It seems we cannot use geopandas>=1 because the last stable version of osmnx, 1.9.4, requires an older one. See gboeing/osmnx@715d704/pyproject.toml#L28

Ah, I see. It seems to be in preparation for the next release since in the main branch, geopandas is pinned at >=1. It seems that 1.9.3 works fine though, since I pinned geopandas>=1 and created a new env by removing the req file (which resulted in installing osmnx=1.9.3) and all tests passed without any issue. I guess we can just pin geopandas>=1 and osmnx>=1.9.3, just to be safe.

dalonsoa commented 1 week ago

If we do that, pip-compile, used to create the various requirements.txt files, will complain, since there will be conflicting requirements. geopandas>=1 and osmnx>=1.9.3 might work perfectly fine together, but from the point of view of their declared dependencies, they are incompatible.

cheginit commented 1 week ago

@dalonsoa I see what you mean. Although in my opinion the trade-off is fine, a workaround is to add pyogrio as a dependency and then, throughout the code base, add the engine="pyogrio" argument wherever there is gpd.to_file. If there are many such instances, you can just create a partial function once and use it across the codebase.
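
The `functools.partial` idea can be sketched like so; the `to_file` function below is a hypothetical stand-in for the geopandas method, used only to make the example self-contained:

```python
from functools import partial

# Hypothetical stand-in for GeoDataFrame.to_file; in the real codebase
# the partial would wrap the geopandas writer instead.
def to_file(path, engine="fiona"):
    return f"wrote {path} with {engine}"

# Fix the engine in one place, then reuse everywhere files are written.
write_file = partial(to_file, engine="pyogrio")

print(write_file("subs_rc.geojson"))  # wrote subs_rc.geojson with pyogrio
```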

dalonsoa commented 1 week ago

Ok, this is now working. I had to ditch the requirements files because they could not be produced having incompatible dependency requirements, and update the CI files where relevant.

Anything else you think should be pinned? I think I've fixed all your comments in https://github.com/ImperialCollegeLondon/SWMManywhere/pull/282#issuecomment-2361210862

cheginit commented 1 week ago

@dalonsoa Awesome! No, I think the rest of deps are fine. You can also remove [tool.setuptools.packages.find] and maybe add more classifiers, for example:

classifiers = [
  "Development Status :: 4 - Beta",
  "Intended Audience :: Science/Research",
  "License :: OSI Approved :: MIT License",
  "Operating System :: OS Independent",
  "Programming Language :: Python",
  "Programming Language :: Python :: 3 :: Only",
  "Programming Language :: Python :: 3.9",
  "Programming Language :: Python :: 3.10",
  "Programming Language :: Python :: 3.11",
  "Programming Language :: Python :: 3.12",
  "Topic :: Scientific/Engineering",
  "Topic :: Scientific/Engineering :: GIS",
  "Typing :: Typed",
]

Regarding the versioning, if you're planning to release tags, you can use hatch-vcs. Then, you just need to remove the version field and add:

requires = [
  "hatch-vcs",
  "hatchling",
]

dynamic = [
  "version",
]

Then, in __init__.py add:

from importlib.metadata import PackageNotFoundError, version

try:
    __version__ = version("swmmanywhere")
except PackageNotFoundError:
    __version__ = "999"
dalonsoa commented 1 week ago

Done. I've removed Python 3.9 from the classifiers as I don't think we are supporting it. I haven't checked, though.

Ideally, before a Release we should probably run the tests in a full matrix with all the OS and Python versions that are meant to be supported, but maybe we can leave that for a future improvement.

dalonsoa commented 1 week ago

I've updated the workflows such that:

This should give us more confidence that we are deploying something actually tested on the platforms and with the Python versions we claim to support, without any burden on normal development.

cheginit commented 1 week ago

I've removed Python 3.9 from the classifiers as I don't think we are supporting it.

@dalonsoa Right, it makes sense to set the min to 3.10. Many major packages have already dropped support for 3.9 since 3.13 is on the horizon.

The workflow setup is neat! I like it.

cheginit commented 1 week ago

@dalonsoa Another thing that I noticed is that pytest is set to use the default prepend mode for imports. According to the best practices in pytest's documentation, new projects should set "--import-mode=importlib". If for some reason you prefer the prepend mode, the src layout should be used as opposed to the flat layout.

Also, I noticed that -p no:warnings is there too. It's generally not a good idea to suppress all warnings if they're not being handled by some other mechanism. I'd just suppress specific warnings that are beyond the scope of the project, after seeing them in the pytest output, using:

[pytest]
filterwarnings =
    ignore:.*message.*
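
Since the project configures its tooling through `pyproject.toml`, the equivalent of the ini snippet above - together with the import-mode suggestion - would look roughly like this (the ignore pattern is a placeholder):

```toml
[tool.pytest.ini_options]
addopts = "--import-mode=importlib"
filterwarnings = [
    "ignore:.*message.*",
]
```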
barneydobson commented 1 week ago

@cheginit @dalonsoa are you both fine to resolve these comments between yourselves and then I can review? I can't say I am following everything in this thread but all seems to be going Ok!

cheginit commented 1 week ago

The PR looks good to me, and I don't have any other comments.

dalonsoa commented 1 week ago

I'll tackle the last two remaining comments as soon as possible and then we can call it a day.

dalonsoa commented 6 days ago

OK, all done!!

Now that we are not suppressing the warnings, as we should have been doing from the beginning, we get 812 warnings of all kinds when running the tests. We should have a look at some point in case there's some low-hanging fruit or something really worrying, but that is beyond the scope of this PR.

From my point of view, we are ready to move forward.

cheginit commented 6 days ago

@dalonsoa well done!

I skimmed the warnings; the only one that could be in the scope of this PR is adding a downloads mark in the pytest settings, and there's no need to rerun the checks (e.g., add [skip ci] to the commit message). There are a couple of low-hanging fruits too, but I agree that they should be addressed in another PR.

dalonsoa commented 6 days ago

Done!

barneydobson commented 5 days ago

@dalonsoa should I set the repo visibility to public and then merge and this does the rest? or is there something else needed

dalonsoa commented 5 days ago

Both things are necessary... and then create a Release on the repo's main page. We can do that together at 10 am or so, if you want to see how it goes and be able to do it in the future. And cross your fingers!

Having said that, are we waiting to create a release until we have the conda-forge equivalent workflow? @cheginit

barneydobson commented 5 days ago

I think @cheginit suggested we do this first - iron out any issues with the documentation etc. - and then do it on conda-forge.

barneydobson commented 5 days ago

also @dalonsoa I'm in a meeting until 10 - but I can call you on Teams when I'm back at my desk?

dalonsoa commented 5 days ago

Sounds good!