arvoelke / cosyne2018

Recurrently compressing a rolling time window using spiking neurons

Rejected Cosyne Reviews #1

Closed: arvoelke closed this issue 4 years ago

arvoelke commented 6 years ago

Our submission was rejected from Cosyne this year. For the sake of transparency, rebuttal, and future reference, I have included our review notice below:


The review process for Cosyne 2018 is now complete. We regret to inform you that your submission #218:

"Recurrently compressing a rolling time window using spiking neurons"

could not be accepted for presentation. This year, we received a record number of submissions (704). Due to space constraints in the conference site, only a subset of submissions could be accepted. Unfortunately, because of these constraints and the high volume of submissions, we were unable to accept a large number of extremely competitive abstracts.

You will find the reviews at the bottom of this email. Please keep in mind that the very high threshold for acceptance means that many high quality submissions have had to be rejected due to space limitations, and not due to specific criticisms of the reviewers. However we hope that you will find the feedback useful, particularly for preparing future Cosyne submissions.

For full transparency, the selection process was as follows: Three reviewers (from a pool of 197) independently scored each of the 704 submitted abstracts. The reviewer pool was selected by the Cosyne 2018 program committee, and submissions were assigned to specific reviewers based on an automated process that matched abstracts to reviewer expertise (software by Daniel Acuna and Konrad Kording). Submissions were then ranked according to their average reviewer score. The top scoring 56 % of submissions were accepted, which resulted in acceptance of submissions scoring above 5.67 (out of 10).

We sincerely regret that we could not accept your submission, and we fully recognize that this process is inherently noisy and imperfect. We wish you luck with future Cosyne submissions.

======================= Reviewer Comments =======================

arvoelke commented 6 years ago

Our rebuttals may be found below. Overall, we strongly believe that reviewers 1 and 3 did not make a sufficient effort to read the abstract or its references in depth. Likewise, all three reviewers appeared to be unfamiliar with the state of the field regarding the general (in)ability of neural networks to efficiently perform the same computations as our delay network.


Reviewer 1

This method can be used as an alternative to Echo-State machines, in order to reduce dimensionality over time...

No. We are not reducing dimensionality over time. We are representing a rolling window of the input, over time, in a space whose dimensionality is reduced relative to the number of time-steps in the window, yet larger than the number of input dimensions (q dimensions for each input dimension).
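To make this concrete, the compression can be written down directly: the rolling window of length θ is represented by the q-dimensional state of a small linear system whose matrices depend only on θ and q. Below is a minimal sketch (not the code used for the abstract), following the state-space form given in the LMU paper linked at the end of this thread; the choices of q = 6, the time-step, and the test signal are illustrative.

```python
import numpy as np
from scipy.signal import cont2discrete
from numpy.polynomial.legendre import Legendre

theta = 0.1   # length of the rolling window in seconds (the delay used in the abstract)
q = 6         # dimensions representing the window, per input dimension (illustrative)
dt = 0.001    # simulation time-step in seconds (illustrative)

# State-space matrices of the delay system, theta * dm/dt = A m + B u;
# they depend only on theta and q: no data and no fine-tuning involved.
i = np.arange(q)[:, None]
j = np.arange(q)[None, :]
A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1)) * (2 * i + 1) / theta
B = ((-1.0) ** i) * (2 * i + 1) / theta                  # shape (q, 1)

# Discretize (zero-order hold) and drive the system with a 3 Hz test signal.
Ad, Bd, *_ = cont2discrete((A, B, np.zeros((1, q)), np.zeros((1, 1))), dt)
t = np.arange(2000) * dt
u = np.sin(2 * np.pi * 3 * t)
M = np.zeros((len(u), q))                                # compressed window over time
m = np.zeros(q)
for k, u_k in enumerate(u):
    m = Ad @ m + Bd[:, 0] * u_k
    M[k] = m

# Any point theta' = r * theta into the past can be decoded from the q-dimensional
# state using shifted Legendre polynomials evaluated at r (r = 1 is the full delay).
r = 1.0
w = np.array([Legendre([0] * n + [1])(2 * r - 1) for n in range(q)])
u_delayed_estimate = M @ w                               # approximates u(t - theta)
```

In the abstract itself this linear system is not simulated directly as above, but mapped onto a recurrent population of spiking neurons using the NEF.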

as it sees to outperform Echo-States, at least in low frequency regime

No. This is actually the opposite of what we found, as stated explicitly in the last paragraph of our abstract: "Reducing the input frequency to 15 Hz improves the ESN’s accuracy to be on par with the non-spiking DN, and thus we attribute this difference to the inherent difficulty of autocorrelating a high-frequency signal (relative to θ) using random feedback weights, as opposed to using optimally derived weights as in the DN."

In other words, we outperform ESNs at higher frequencies. When the input frequency is low, both networks perform about the same, since the input does not vary greatly across the window. Our spiking delay network also greatly outperforms the spiking version of ESNs, known as Liquid State Machines (LSMs), at all frequencies; however, this latter result was not included in the abstract.

However, it has more of assumptions and fine tuning compared to the random network

This is simply false. In fact, we found that the ESN required far more fine-tuning: it took hyperopt and the 1,000 simulations it ran to fine-tune the ESN's hyperparameters (tau, input gain, recurrent gain, regularization) before it reached an adequate level of performance on the more difficult (high-frequency) task. This hyperparameter optimization was performed purely for the benefit of the ESN. Our network does not require any fine-tuning of its hyperparameters; we simply use the defaults in Nengo.

The DN only requires knowledge of the desired delay (θ), which was fixed at 0.1 s and left untouched for all experiments/simulations. This is the only additional assumption made by our network. ESNs essentially roll this assumption into the fine-tuning of tau, while we decouple these two time constants (our approach works with any tau).
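To give a sense of what this ESN fine-tuning involved, here is a hedged sketch of a hyperopt search over the four hyperparameters named above. The search ranges and the dummy objective are placeholders rather than the ones used in our experiments (the real objective trained and tested an ESN and returned its error); only the overall structure reflects what was run.

```python
import numpy as np
from hyperopt import fmin, tpe, hp

# Illustrative search space over the four ESN hyperparameters named above;
# the ranges here are assumptions, not the ones used in our experiments.
space = {
    'tau': hp.loguniform('tau', np.log(1e-3), np.log(1e0)),
    'input_gain': hp.loguniform('input_gain', np.log(1e-2), np.log(1e1)),
    'recurrent_gain': hp.uniform('recurrent_gain', 0.5, 1.5),
    'regularization': hp.loguniform('regularization', np.log(1e-6), np.log(1e-1)),
}

def objective(params):
    # Placeholder so the sketch runs end-to-end. In the actual experiments this
    # built an ESN with the sampled hyperparameters, fit its readout, and
    # returned its test error on the window-computation task.
    return float((params['tau'] - 0.05) ** 2 + params['regularization'])

# 1,000 evaluations, matching the number of simulations mentioned above.
best = fmin(objective, space, algo=tpe.suggest, max_evals=1000)
print(best)
```

No comparable search was needed for the DN, which uses the fixed θ above and the Nengo defaults.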

which makes it more of a signal processing tool, than a biological model of neuronal dynamics.

Again, this is false. Our network is both a model and a tool. For comparison, consider a neural integrator modelling working memory. It is a signal processing tool, because you can use it to perform a well-defined dynamical computation within a neural network, yet it is also a model, because there are numerous articles using this to explain/predict neural/behavioural data from working memory experiments.

In our case, the DN is a model because we are modelling how time may be represented in neural systems; see our referenced paper published in Neural Computation for a continuation of this point, including supporting evidence that the activity of our network is similar to the responses of "time cells" recorded in rodents during a delay task. Yet, it is also a tool because we can analyze the class of functions that are accurately supported by the nonlinear temporal representation (see Figures 1 and 3 and equations 2 and 3 from abstract), and then use this understanding to construct larger networks that efficiently perform useful dynamical computations.


Reviewer 2

The authors design a neural network to represent a rolling time window of input in a low-dimensional space: they use the Neural Engineering Framework, which involves (this is what I am getting from the abstract) deriving a filter that will get the desired function, and then simulate a network of spiking neurons that approximates that filter.

This is true.

Their network performs favorably compared to an echo state network which has to be trained using input signals. (This doesn't seem surprising to me, because they get to derive their weights offline)

Note that ESNs also train their readout weights offline (using least squares). That said, for both our approach and for ESNs, the readout weights may be trained either online (using recursive least squares) or offline (using least squares). However, this is not an important distinction here, since both training methods can achieve the same results.
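For readers unfamiliar with how such readouts are obtained, here is a generic sketch of the offline fit via regularized least squares; the data shapes and regularization value are placeholders rather than ours. The online alternative mentioned above, recursive least squares, converges to the same solution by updating the weights one time-step at a time.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(5000, 100)   # recorded network states, one row per time-step (placeholder)
Y = rng.randn(5000, 1)     # target signal, e.g. a function of the rolling window (placeholder)
reg = 1e-3                 # ridge regularization strength (assumed value)

# Offline least-squares readout: solve (X^T X + reg * I) W = X^T Y for W.
W = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ Y)
Y_hat = X @ W              # linear readout of the network states
```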

One major point is that ESNs don't know how to choose recurrent weights that are any better than random, while we show how to do so. This is a powerful way to improve the ESN! This particular choice has been absent from both the ESN and NEF literature, due to the difficulty of deriving the optimal filter (see the appendix of the referenced paper for the derivation). The fact that we can determine this filter and fit nonlinear tuning curves to it, without even simulating the network on any data, should be surprising to anyone who has attempted to build similar networks. Furthermore, equations 1, 2, and 3 are novel contributions towards harnessing this filter that we could not find elsewhere.

Another major point is that the ESN is actually given an extra advantage over our method: its readout weights are fit to the data, while our approach doesn't even adjust its weights to the actual data (it doesn't need to)!


Reviewer 3

The authors utilize the Neural Engineering Framework to compress a one dimensional continuous time series over a finite time interval into the high-dimensional state space of a linear dynamical system driven by the one dimensional time series itself.

We say low-dimensional (not high-dimensional) in the very first sentence.

While this seems like a problem that might be interesting to solve in general, this contribution is written in a way that makes it rather arduous to appreciate. In particular, the abstract of the paper is bizarrely unclear and imprecise regarding the problem statement and the solution proposed by the authors,

The abstract begins with a clear definition of the rolling window, the filter being applied to the input, and a complete mathematical connection between the window and state-space, together with a reference for the derivation. In order to say this is unclear/imprecise, it seems prudent to give clear/precise examples of what was unclear/imprecise. The reviewer attempted this below, and in both cases we find their concerns to be of trivial importance/relevance at best.

and the importance for the field.

We state this right in the very first paragraph: "We show that this permits the computation of arbitrary nonlinear computations across the history of an input stimulus, while outperforming Echo State Networks in accuracy and training/testing time, without explicitly simulating any training examples." This fundamental computation has not been performed before using a neural network in any efficient manner. The referenced paper includes further discussion and comparison with related work. ESNs are the closest relative, and so we include a comparison showing that we outperform them even after giving them several advantages (see above rebuttals).

From the abstract it is not even clear that the signal being compressed in the state of the recurrent system is a one dimensional signal.

It is a universal convention to denote scalars by italic lower-case variables and vectors by bold lower-case variables. We follow this convention consistently throughout the abstract and its references. In other words, the notation u(t) automatically implies that it is a scalar signal.

It is also not clear how the proposed system differs from other more standard techniques to achieve arguably similar results (modulo the spiking component), such as harmonic analysis

This suggestion is nonsensical. Harmonic analysis is not a mechanism or computation that can be employed by a neural network; it is a general branch of mathematics concerned with the representation of functions. This is analogous to looking at a neural integrator model of working memory and saying "it is not clear how this is different from calculus".

and how this method fares against spiking neural network models that aim to achieve similar goals, such as spiking tightly balanced networks.

Again, the suggestion that this is in any way relevant is nonsensical. Balanced spiking networks do not aim at the similar goal of representing the history of an input signal. One cannot deploy such a network to solve this task without either reproducing our solution or inventing some other new method. This misses the point that we did develop such a method. On the other hand, ESNs are networks that have similar goals and have already been used to solve such tasks, and thus we included an in-depth comparison to these networks.

arvoelke commented 4 years ago

These same methods were extended and benchmarked against LSTMs and other state-of-the-art RNNs at NeurIPS this year (as a spotlight talk), achieving a new record on a deep learning benchmark: https://papers.nips.cc/paper/9689-legendre-memory-units-continuous-time-representation-in-recurrent-neural-networks.pdf