NOAA-OWP / lstm

Other
10 stars 14 forks source link

GH Actions Fail for py v3.7 #40

Closed madMatchstick closed 3 months ago

madMatchstick commented 9 months ago

Manually ran unit testing workflow and python version 3.7 fails for both mac and linux os. See job details here.

Reminder to look into.

SnowHydrology commented 7 months ago

I ran the workflow again and it still failed. https://github.com/NOAA-OWP/lstm/actions/runs/7830614382

@madMatchstick, It looks like the Python 3.7 workflow is accessing a different file for netCDF4 versus 3.8 and 3.9, even though it's using the same version of the package.

3.7 file: Downloading netCDF4-1.6.5.tar.gz

3.8 file: Downloading netCDF4-1.6.5-cp38-cp38-macosx_10_9_x86_64.whl.metadata

Any idea why 3.7 would download a tarball versus 3.8 and 3.9 accessing a whl.metadata file? I don't know enough about Python to answer the question. Also tagging @jmframe, @peckhams, and @ajkhattak

madMatchstick commented 6 months ago

@SnowHydrology - A few things,

  1. I too have no idea why python v3.7 is pulling a tarball vs building wheel, nor do I know enough about python to answer this question :). It's yelling about No package 'hdf5' found which I suppose we could try installing directly.
  2. After some stack overflow trolling, this can happen with the use of actions/setup-python@v<*> which is where the ${{ matrix.python-version }} currently lives.
  3. Know that the environment.yml specifically calls out python v3.7. I believe this has been the case from our lstm_bmi genesis.
  4. When moving the version matrix to the conda-incubator/setup-miniconda@v<*> block the workflow runs for v3.7 but fails for all other because of dependency python=3.7 mentioned in (3). Removing this version dependency from the env file allows for all python versions (3.7 - 3.12) to run okay. Where as before (via actions/setup-python@v<*>), versions 3.9+ would break, as noted in comment.
  5. I updated some of the "reusable workflow" versions (e.g. uses: actions/checkout@v4, etc.) but don't think it's relevant to this issue. Although we should change it to point to the latest, regardless.

See actions in my debug branch.

So, I suppose the question is, do we want to keep python pinned to v3.7? For standalone local builds, gh workflows, ngen integration (the dependancies doc list python at version > 3.6.8), etc.? Tagging @jmframe & @hellkite500.

Thanks!

stcui007 commented 6 months ago

The Python version info probably needs to be changed. see this page: https://github.com/NOAA-OWP/ngen/wiki/Building. On UCS machines, we use 3.8.18. t-route requires 3.9. That probably is going to become a standard soon. Nels can answer this more authoritatively.

On Thu, Feb 29, 2024 at 1:19 PM JessicaGarrett-NOAA < @.***> wrote:

@SnowHydrology https://github.com/SnowHydrology - A few things,

  1. I too have no idea why python v3.7 is pulling a tarball vs building wheel, nor do I know enough about python to answer this question :). It's yelling about No package 'hdf5' found which I suppose we could try installing directly.
  2. After some stack overflow trolling, this can happen with the use of @.**<> which is where the ${{ matrix.python-version }} currently lives.
  3. Know that the environment.yml https://github.com/NOAA-OWP/lstm/blob/8d33cc9d0900984833ab0f311e932c9cc595e510/environment.yml#L15 specifically calls out python v3.7. I believe this has been the case from our lstm_bmi genesis.
  4. When moving the version matrix to the @.<> block the workflow runs for v3.7 but fails for all other because of dependency python=3.7 mentioned in (3). Removing this version dependency from the env file allows for all python versions (3.7 - 3.12) to run okay. Where as before (via **@.**<>), versions 3.9+ would break, as noted in comment https://github.com/NOAA-OWP/lstm/blob/8d33cc9d0900984833ab0f311e932c9cc595e510/.github/workflows/test_unit_standalone.yml#L26 .
  5. I updated some of the "reusable workflow" versions (e.g. uses: @.***, etc.) but don't think it's relevant to this issue. Although we should change it to point to the latest, regardless.

See actions https://github.com/NOAA-OWP/lstm/actions/runs/8100995217 in my debug branch.

So, I suppose the question is, do we want to keep python pinned to v3.7? For standalone local builds, gh workflows, ngen integration (the dependancies doc list python at version > 3.6.8), etc.? Tagging @jmframe https://github.com/jmframe & @hellkite500 https://github.com/hellkite500.

Thanks!

— Reply to this email directly, view it on GitHub https://github.com/NOAA-OWP/lstm/issues/40#issuecomment-1971804138, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACA4SRO35JAIP7MMYLZUDRLYV57KHAVCNFSM6AAAAAA74H2B7GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZRHAYDIMJTHA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

hellkite500 commented 6 months ago

Python 3.7 is officially EoL (https://devguide.python.org/versions/). I wouldn't invest energy into maintaining tests with it, just remove it from the matrix and start testing on a newer version instead.

madMatchstick commented 6 months ago

I am clearly a bit behind the times. That settles it then. Thanks @hellkite500 & @stcui007!

peckhams commented 6 months ago

I think the issue of pinning to Python 3.7 may have been because LSTM was using some PKL (pickle) files, and these aren't guaranteed to work with future versions of Python. Perhaps that use of PKL files should be reconsidered. Best, Scott

On Fri, Mar 1, 2024 at 9:11 AM JessicaGarrett-NOAA @.***> wrote:

I am clearly a bit behind the times. That settles it then. Thanks @hellkite500 https://github.com/hellkite500 & @stcui007 https://github.com/stcui007!

— Reply to this email directly, view it on GitHub https://github.com/NOAA-OWP/lstm/issues/40#issuecomment-1973464289, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2T4L6H5B2F3OFG6DY2DTLYWCSBHAVCNFSM6AAAAAA74H2B7GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZTGQ3DIMRYHE . You are receiving this because you were mentioned.Message ID: @.***>

madMatchstick commented 6 months ago

I think the issue of pinning to Python 3.7 may have been because LSTM was using some PKL (pickle) files, and these aren't guaranteed to work with future versions of Python. Perhaps that use of PKL files should be reconsidered. Best, Scott

@peckhams - Thanks for bringing this up. I know the serialization work was heavily based on pickle. I'll have to test the lstm_serialization_test.py on later versions of python. For now, this workflow only considers the "standalone" version and the bmi unit testing script. But to your point we may have to revisit the use pickle for loading and saving model state, etc. ... -thanks!

peckhams commented 6 months ago

@@. @.> - I wasn't thinking about the serialization work, but that is also the case. Jonathan had saved some training data or other required data in a PKL file as well, which imposed some version dependencies. Best, Scott

On Tue, Mar 5, 2024 at 9:23 AM JessicaGarrett-NOAA @.***> wrote:

I think the issue of pinning to Python 3.7 may have been because LSTM was using some PKL (pickle) files, and these aren't guaranteed to work with future versions of Python. Perhaps that use of PKL files should be reconsidered. Best, Scott

@peckhams https://github.com/peckhams - Thanks for bringing this up. I know the serialization work was heavily based on pickle. I'll have to test the lstm_serialization_test.py https://github.com/NOAA-OWP/lstm/blob/master/lstm/lstm_serialization_test.py on later versions of python. For now, this workflow https://github.com/NOAA-OWP/lstm/blob/master/.github/workflows/test_unit_standalone.yml only considers the "standalone" version and the bmi unit testing script. But to your point we may have to revisit the use pickle for loading and saving model state, etc. ... -thanks!

— Reply to this email directly, view it on GitHub https://github.com/NOAA-OWP/lstm/issues/40#issuecomment-1979154174, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2T4LYBOD2EVEUT7ABDC63YWXWOTAVCNFSM6AAAAAA74H2B7GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZZGE2TIMJXGQ . You are receiving this because you were mentioned.Message ID: @.***>

madMatchstick commented 6 months ago

Also - I just realized that neuralhydrology requires python >= v3.10 and relies strongly on pickle to save and load files (e.g. here). Looking at the latest 3.12.2 docs too. Thinking we should be okay?

jmframe commented 6 months ago

But the training done in NeuralHydrology shouldn't impact how the model is run in NextGen, right? So even if there are dependency issues and formatting issues from the NeuralHydrology training, the files that are used for the LSTM module in NextGen don't use them, right?

madMatchstick commented 6 months ago

But the training done in NeuralHydrology shouldn't impact how the model is run in NextGen, right? So even if there are dependency issues and formatting issues from the NeuralHydrology training(, the files that are used for the LSTM module in NextGen don't use them, right?

Correct, good clarification. Perhaps a poorly placed reference on my part. All I was getting at was that pickle should work fine with later (3.7+) python versions :)
-thanks!

SnowHydrology commented 6 months ago

@madMatchstick, to confirm, we can remove Python 3.7 tests?

madMatchstick commented 6 months ago

@madMatchstick, to confirm, we can remove Python 3.7 tests?

Correct. And update the env.yml to something like python=3.10. I'll cleanup my debug branch and submit pr. -thanks