ProjectPythia / xbatcher-ML-1-cookbook

https://projectpythia.org/xbatcher-ML-1-cookbook/
Apache License 2.0
3 stars 5 forks source link

Cookbook currently not building #5

Open r-ford opened 1 year ago

r-ford commented 1 year ago

There are a few issues keeping this cookbook from building properly in the gallery, which can be seen in the build logs. To summarize, there is an error running the following cell in surface_currents.ipynb:

%run ./surface_currents_prep.ipynb

Exception: File `'./surface_currents_prep.ipynb.py'` not found.

And in surface_currents_model.ipynb:

NameError                                 Traceback (most recent call last)
/tmp/ipykernel_159/99668342.py in <module>
----> 1 class LossHistory(tf.keras.callbacks.Callback):
      2     def __init__(self, frequency=30):
      3         self.frequency = frequency
      4         self.batch_counter = 0
      5 

NameError: name 'tf' is not defined

I noticed three copies of the main surface_currents.ipynb notebook, one in the root of the repo, one in /notebooks, and one in /notebooks/images. The Cookbook Actions are trying to run all of them, so that might be part of the issue.

There might be a problem with binderbot and %run, but I'm not familiar enough with either. There is some more discussion at #4 about data access.

cmdupuis3 commented 1 year ago

I noticed three copies of the main surface_currents.ipynb notebook, one in the root of the repo, one in /notebooks, and one in /notebooks/images. The Cookbook Actions are trying to run all of them, so that might be part of the issue.

There definitely shouldn't be multiple copies, that's probably a mistake on my end. The correct versions are in /notebooks.

There might be a problem with binderbot and %run, but I'm not familiar enough with either. There is some more discussion at #4 about data access.

I think the rest of it has something to do with %run. Basically, the other two notebooks are just there so the functionality doesn't need to be inlined. It's quite a lot of code that's not really relevant to this Pythia cookbook, but I wanted to include them as notebooks for annotation purposes if anyone was curious.

I think the problem is that the Pythia workflow is trying to run each notebook independently, but they're not designed for that. I could kludge it by duplicating my Tensorflow imports in the prep notebook, but I think that's glossing over the bigger issue.

cmdupuis3 commented 1 year ago

I've also been getting this thing, which is a mystery to me:

Exception: File `'./surface_currents_prep.ipynb.py'` not found.
mgrover1 commented 1 year ago

It's running it from the repo root directory, so you will need to add notebooks/ to that path

cmdupuis3 commented 1 year ago

You mean %run ./notebooks/surface_currents_prep.ipynb right? I removed the one in the root directory though, it wasn't supposed to be there.

Also, why is .py being appended here?

r-ford commented 1 year ago

I just opened a PR that runs the notebook on GitHub instead of BinderHub. You can see the preview here. It looks like it was able to run the other notebook fine, but it just got caught on the data access (#4).

cmdupuis3 commented 1 year ago

Sounds good. Is there a standard place to put data for cookbooks? I can add a subset of the original, I think it comes out to about a gig total.

r-ford commented 1 year ago

I think 1 GB would probably be a problem, but at least 2 cookbooks do use some data stored in a folder in the repo (Gridding and Landsat ML, <50 MB). Is it possible that the CMIP6 or CESM LENS data could be used to show your workflow? Another option is to just display the pre-executed notebook while the data access is being figured out.

brian-rose commented 1 year ago

A related discussion about data storage for cookbooks is here: https://github.com/ProjectPythia/kerchunk-cookbook/issues/20

cmdupuis3 commented 1 year ago

I asked Ryan about this and for this particular case, he said we could make it host-pays, but I'm not sure what the current status is.