churchlab / UniRep

UniRep model, usage, and examples.

Q: get_rep() difference between babbler1900 and babbler256/64 #3

Closed spark157 closed 5 years ago

spark157 commented 5 years ago

Hi,

I'm looking at the get_rep() function (in unirep.py) and noticed that the code is essentially repeated in babbler1900 and babbler256, but with a small difference, and I wasn't clear on why.

Specifically, for babbler1900:

        final_cell = final_cell[0]
        final_hidden = final_hidden[0]
        hs = hs[0]
        avg_hidden = np.mean(hs, axis=0)
        return avg_hidden, final_hidden, final_cell

and equivalently for babbler256 (and so babbler64):

        final_cell = final_cell[-1]
        final_hidden = final_hidden[-1]
        hs = hs[0]
        avg_hidden = np.mean(hs, axis=0)
        return avg_hidden, final_hidden[0], final_cell[0]

So avg_hidden is computed the same way, but final_hidden and final_cell are indexed differently and have different dimensions.

There is a comment in babbler256 that may explain the thinking, but I'm not quite sure of the implications (I don't know the code well enough at this point):

        get_rep needs to be minorly adjusted to accomadate the different state size of the 
        stack.

Could you verify that the code is correct given the difference between the two?

Thanks.

Scott

spark157 commented 5 years ago

Ahh - I think I understand now: since the 1900-dim mLSTM is a single layer (as opposed to 2 layers for the 256-dim model and 4 for the 64-dim model), it is handled differently.

However, wouldn't the same code work for the 1900 as for the 256/64? If so, could the code all be folded into the 1900?

The issue with the difference in dimension is still applicable.

Scott

sandias42 commented 5 years ago

Hi Scott,

The difference is due, as you point out, to the 64 and 256 dimensional models having multiple layers, while the 1900 model is a single layer. You can see the shape difference clearly by looking at self._initial_state_placeholder on line 595, or at the various state-related properties of mLSTMCellStackNPY, e.g. zero_state on line 278. Compare this to the same function in mLSTMCell1900 on line 70.

The same code won't work for the 1900 model because final_hidden = final_hidden[-1] followed by final_hidden[0] would return a single float value instead of a vector. For the 256 and 64 dimensional babblers there is an extra dimension, the layer index, which the [-1] selects before [0] picks out the vector.
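To make the shape argument concrete, here is a minimal NumPy sketch. The array shapes and the layer count are assumptions chosen for illustration (batch size 1, a leading layer axis for the stacked models), not values read from unirep.py, so check them against zero_state before relying on them.

```python
import numpy as np

# Assumed shapes, for illustration only (batch size 1):
#   single-layer 1900 state:  (batch, 1900)
#   stacked 256 state:        (num_layers, batch, 256)
NUM_LAYERS = 4  # hypothetical layer count, not taken from unirep.py

final_hidden_1900 = np.zeros((1, 1900))
final_hidden_256 = np.zeros((NUM_LAYERS, 1, 256))

# babbler1900-style indexing: drop the batch dimension.
rep_1900 = final_hidden_1900[0]

# babbler256-style indexing: take the last layer, then drop the batch dimension.
rep_256 = final_hidden_256[-1][0]

# Applying the babbler256-style indexing to the single-layer state
# removes one dimension too many and leaves a scalar.
wrong = final_hidden_1900[-1][0]

print(rep_1900.shape, rep_256.shape, np.ndim(wrong))
```

Under these assumed shapes the first two results are vectors of length 1900 and 256, while the last one has zero dimensions, which is the single float described above.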

I can confirm that, as far as I know, both of these snippets are correct in their respective classes and CANNOT be swapped for one another.

Thanks for your question. Ethan