Consider removing support for stateful layers/models

Dobiasd commented 3 months ago

With recent advances (transformer models, attention layers, etc.), to me, it seems like the NLP community is moving away from LSTM, GRU (and Bidirectional).

The latest TensorFlow update (from 2.15 to 2.16) introduced many changes to these things regarding the handling of states, and it's not trivial to make these parts of frugally-deep work again with the latest TensorFlow version.

Some things look like TF bugs:

But other things are just changes, e.g.:

LSTM output values when using an initial_state call argument
model.reset_states no longer exists

So, this GitHub issue mainly intends to collect contributors' thoughts.

@hunyadi and @keithchugg: You contributed many amazing things in this area to frugally-deep. What's your opinion on the matter? Is it worth investing the effort to maintain these things, or did they go out of fashion and we could drop support?

hunyadi commented 3 months ago

I don't have hard feelings for or against. As the owner of the project, you can make the best judgment whether you have the capacity to keep up with these changes in TensorFlow, or drop support for these features in favor of the more popular models. I don't think supporting everything in TensorFlow was ever a goal for this project, and in my opinion it's OK to focus on wherever the project is delivering most value.

Dobiasd commented 3 months ago

Thanks! Yeah, it's hard for me to keep up with TensorFlow. Now especially on the recurrent and stateful features. So I see these options:

A: Stop supporting new TensorFlow versions.
B: Stop supporting recurrent/stateful layers/models.
C: Get help from somebody to regularly make sure the recurrent/stateful things work with the new TensorFlow versions.

A would mean to sunset frugally-deep, which I would like to avoid.

C seems very hard, and I assume you're also not interested in this kind of maintenance work, right? :D

So I currently lean towards option B.

Dobiasd commented 3 months ago

@n-Guard Do you have any thoughts on this?

n-Guard commented 3 months ago

Hi Tobias,

first of all I would like to thank you again for this amazing project that you are maintaining with such persistence! Considering my field and use case, there is still need for these recurrent layers so I would appreciate if fdeep would support them in the future.

I can offer some help but I have to first dive deeper into the new Keras v3 API, that is used by Tensorflow 2.16. I also think there is a greater value if we can make the bigger switch to Keras 3, since it also supports PyTorch and JAX as a backend but the Keras model itself should be backend-agnostic.

My hope would be that once we make the changes to support the new API, future maintenance should not be constantly that high (at least for the forseeable future).

Regarding the Bidirectional layer problem: I hope that it will be fixed soon, this issue also seems to be related: https://github.com/keras-team/keras/issues/18913

Dobiasd commented 3 months ago

Thanks a lot (for the heartwarming feedback and the offer to help :heart:)!

Do you think the following approach could work for us?

Update frugally-deep to TF 2.16, i.e., adjust to the new internal model architecture (node connections, etc.), removing the recurrent layers, so things work again. (my part)
Then later, re-introduce the previously removed recurrent functionality. (your part, whenever is a good time for you)

This would mean, we'd temporarily have a frugally-deep version with reduced layer support in the main branch, and you'd have to checkout an older commit (for example the current tag) to get the recurrent code to adjust. But it would unblock us, i.e., make our two jobs independent of each other.

n-Guard commented 3 months ago

Yes, sounds good to me! As soon as I can allocate some time, I'll let you know!

Dobiasd commented 3 months ago

Support for Bidirectional, GRU, LSTM, and stateful models have been removed in the latest version.

In case I can help in some with the re-introduction, just let me know. :slightly_smiling_face:

stellarpower commented 2 months ago

@Dobiasd I guess the decision is already made, but if you're interested in feedback - NLP may have basically dropped RNNs in favour of Transformers, but, AFAIK these don't work so well on time series, which is what I'm working on myself. For now obviously I can keep my checkout at the same version it's always been on, but without support for retaining state, my model would be of limited use (I am filtering signals in realtime, so I feed in ideally one, or at most, eight samples at a time, and it is very important for the model to keep its state or it won't remember anything it has seen), and without LSTM support at all then it would make Frugally-Deep essentially useless for me, as I don't think any of the other architectures applies itself so well to the problem. Which is a shame as I've generally found it good to use and am not sure what else I could replace it with beyond trying to include all of Tensorflow or writing something from scratch.

Dobiasd commented 2 months ago

Ah, thanks for the feedback/motivation. Good to know! I'll keep it on my to-maybe-do list. Perhaps with the next TensorFlow/Keras version, these things will no longer that crazy and buggy. :crossed_fingers:

stellarpower commented 2 months ago

Yeah, I definitely think it's a fair assessment - after six months my models aren't really learning, and when I started out I wouldn't have assumed there to be significant bugs in Keras or TF affecting the core training. But now, I'm not so sure. At the least marking the layers as unstable and not entirely supported pending upstream fixes I think would be reasonable.

Dobiasd / frugally-deep

Consider removing support for stateful layers/models #417