Closed leondgarse closed 4 years ago
Hi @leondgarse , thank you so much for your contribution.
The build error you see in CI may not be due to your code change; I will double-check and get back to you. We are having some problems with the CI system.
In the meantime, could you add a unit test here? https://github.com/awslabs/keras-apache-mxnet/tree/master/tests/keras/backend
You can use your reproduction code as a test and add it in a new file, something like mxnet_tf_model_test.py
Thanks!
Hi @roywei, I added a unit test file tests/keras/backend/mxnet_tf_model_test.py, and my local test passes:
import mxnet_tf_model_test
aa = mxnet_tf_model_test.TestMXNetTfModel()
aa.test_batchnorm_layer_reload()
# Using MXNet backend
# axis = [1]
# (1, 10) 1 10
# axis = -1
# (1, 10) 1 10
# axis = 1
# (1, 10) 1 10
It tests loading a model saved with the TensorFlow backend and then loading one saved with the MXNet backend, to make sure everything works.
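For reference, here is a minimal sketch of the shape of such a test. The class and method names match the snippet above, but the body is only illustrative and is not the actual contents of mxnet_tf_model_test.py (which additionally loads an h5 file that was saved under the TensorFlow backend); the model layout is an assumption.

# Illustrative sketch only, not the real test file.
import os
import tempfile

import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Dense, BatchNormalization


class TestMXNetTfModel(object):

    def test_batchnorm_layer_reload(self):
        # Build a tiny model containing BatchNormalization, save it to h5,
        # reload it, and check the reloaded model still predicts.
        model = Sequential([Dense(10, input_shape=(10,)),
                            BatchNormalization(axis=-1)])
        model.compile(optimizer='sgd', loss='mse')

        path = os.path.join(tempfile.mkdtemp(), 'bn_model.h5')
        model.save(path)

        reloaded = load_model(path)
        out = reloaded.predict(np.random.random((1, 10)))
        assert out.shape == (1, 10)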
Thanks for your work. Is the way I triggered it the right one? What's the failure this time?
@leondgarse could you try pushing an empty commit to re-trigger CI? git commit --allow-empty -m "trigger ci"
It seems a multi-threaded test failed in our test environment (Docker), and the failure looks random, so re-triggering should work. Our nightly tests have been passing for a few days.
=================================== FAILURES ===================================
________________________________ test_warnings _________________________________
[gw1] linux -- Python 3.7.6 /root/.pyenv/versions/3.7.6/bin/python3.7

    @pytest.mark.skipif(sys.version_info < (3,),
                        reason='Cannot catch warnings in python 2')
    def test_warnings():
        a = Input(shape=(3,), name='input_a')
        b = Input(shape=(3,), name='input_b')

        a_2 = Dense(4, name='dense_1')(a)
        dp = Dropout(0.5, name='dropout')
        b_2 = dp(b)

        model = Model([a, b], [a_2, b_2])

        optimizer = 'rmsprop'
        loss = 'mse'
        loss_weights = [1., 0.5]
        model.compile(optimizer, loss, metrics=[], loss_weights=loss_weights,
                      sample_weight_mode=None)

        @threadsafe_generator
        def gen_data(batch_sz):
            while True:
                yield ([np.random.random((batch_sz, 3)),
                        np.random.random((batch_sz, 3))],
                       [np.random.random((batch_sz, 4)),
                        np.random.random((batch_sz, 3))])

        with pytest.warns(Warning) as w:
            out = model.fit_generator(gen_data(4),
                                      steps_per_epoch=10,
                                      use_multiprocessing=True,
>                                     workers=2)

tests/keras/engine/test_training.py:607:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
keras/legacy/interfaces.py:91: in wrapper
    return func(*args, **kwargs)
keras/engine/training.py:1433: in fit_generator
    initial_epoch=initial_epoch)
keras/engine/training_generator.py:181: in fit_generator
    generator_output = next(output_generator)
keras/utils/data_utils.py:695: in get
    inputs = self.queue.get(block=True).get()
/root/.pyenv/versions/3.7.6/lib/python3.7/multiprocessing/pool.py:651: in get
    self.wait(timeout)
/root/.pyenv/versions/3.7.6/lib/python3.7/multiprocessing/pool.py:648: in wait
    self._event.wait(timeout)
/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py:552: in wait
    signaled = self._cond.wait(timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Condition(<unlocked _thread.lock object at 0x7fba4402c990>, 0)>
timeout = None

    def wait(self, timeout=None):
        """Wait until notified or until a timeout occurs.

        If the calling thread has not acquired the lock when this method is
        called, a RuntimeError is raised.

        This method releases the underlying lock, and then blocks until it is
        awakened by a notify() or notify_all() call for the same condition
        variable in another thread, or until the optional timeout occurs. Once
        awakened or timed out, it re-acquires the lock and returns.

        When the timeout argument is present and not None, it should be a
        floating point number specifying a timeout for the operation in seconds
        (or fractions thereof).

        When the underlying lock is an RLock, it is not released using its
        release() method, since this may not actually unlock the lock when it
        was acquired multiple times recursively. Instead, an internal interface
        of the RLock class is used, which really unlocks it even when it has
        been recursively acquired several times. Another internal interface is
        then used to restore the recursion level when the lock is reacquired.

        """
        if not self._is_owned():
            raise RuntimeError("cannot wait on un-acquired lock")
        waiter = _allocate_lock()
        waiter.acquire()
        self._waiters.append(waiter)
        saved_state = self._release_save()
        gotit = False
        try:    # restore state no matter what (e.g., KeyboardInterrupt)
            if timeout is None:
>               waiter.acquire()
E               Failed: Timeout >1200.0s

/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py:296: Failed
----------------------------- Captured stdout call -----------------------------
Epoch 1/1
----------------------------- Captured stderr call -----------------------------

+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++

~~~~~~~~~~~~~~~~~~~~ Stack of Thread-272 (140435986855680) ~~~~~~~~~~~~~~~~~~~~~
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/multiprocessing/pool.py", line 470, in _handle_results
    task = get()
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)

~~~~~~~~~~~~~~~~~~~~ Stack of Thread-271 (140436114863872) ~~~~~~~~~~~~~~~~~~~~~
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/multiprocessing/pool.py", line 422, in _handle_tasks
    for taskseq, set_length in iter(taskqueue.get, None):

~~~~~~~~~~~~~~~~~~~~ Stack of Thread-270 (140438099777280) ~~~~~~~~~~~~~~~~~~~~~
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/multiprocessing/pool.py", line 413, in _handle_workers
    time.sleep(0.1)

~~~~~~~~~~~~~~~~~~~~ Stack of Thread-269 (140435410818816) ~~~~~~~~~~~~~~~~~~~~~
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/codebuild/output/src242250753/src/github.com/awslabs/keras-apache-mxnet/keras/utils/data_utils.py", line 681, in _run
    executor.apply_async(next_sample, (self.uid,)), block=True)
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/queue.py", line 139, in put
    self.not_full.wait()
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/threading.py", line 296, in wait
    waiter.acquire()

~~~~~~~~~~~~~~~~~~~~~ Stack of <unknown> (140440431216384) ~~~~~~~~~~~~~~~~~~~~~
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/site-packages/execnet/gateway_base.py", line 285, in _perform_spawn
    reply.run()
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/site-packages/execnet/gateway_base.py", line 220, in run
    self._result = func(*args, **kwargs)
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/site-packages/execnet/gateway_base.py", line 967, in _thread_receiver
    msg = Message.from_io(io)
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/site-packages/execnet/gateway_base.py", line 432, in from_io
    header = io.read(9)  # type 1, channel 4, payload 4
  File "/root/.pyenv/versions/3.7.6/lib/python3.7/site-packages/execnet/gateway_base.py", line 400, in read
    data = self._read(numbytes - len(buf))

+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
@leondgarse It seems your PR is constantly failing with the same multi-threaded test above. I'm not sure why it's failing; the nightly tests all passed, and your PR does not affect that test.
Could you try resetting to commit 6e230a9 and doing a git pull --rebase instead of a merge? Ideally your PR should not contain my changes.
I will also double-check why this test constantly fails in your case.
Sorry for the inconvenience caused.
Ya! This is much more like it. Here are my commands:
git reset --hard 6e230a9
git pull upstream master --rebase
git push --force
@leondgarse Awesome! Merging now. Thanks for your contribution!
Fix axis error in normalization layer when loading model from tf backend saved h5
Summary
When loading a Keras h5 model file saved with the TensorFlow backend, I met an error:
I wrote a little demo to reproduce it:
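The original demo is not included in this thread; below is a hypothetical sketch of the kind of reproduction described, assuming a file bn_model.h5 that contains a BatchNormalization layer and was saved while running Keras with KERAS_BACKEND=tensorflow (the file name, input shape, and model layout are illustrative).

# Hypothetical reproduction sketch, not the original demo. Loading an h5 file
# produced under the TensorFlow backend with KERAS_BACKEND=mxnet failed before
# this patch, because the TF backend stores `axis` as a list such as [1] in the
# BatchNormalization layer config.
import numpy as np
from keras.models import load_model

reloaded = load_model('bn_model.h5')   # raised the axis error without the fix
print(reloaded.predict(np.random.random((1, 10))).shape)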
Related Issues
None
PR Overview
It seems that in TensorFlow-backend Keras, axis in BatchNormalization is a list, so I added an isinstance test to the self.axis init. With that change, the load_model function passes.
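A minimal sketch of the kind of isinstance guard described above, assuming the list-valued axis always holds a single element; this is an illustrative helper, not the actual diff in this PR.

def _normalize_bn_axis(axis):
    # TF-backend h5 files can store BatchNormalization's `axis` as a list such
    # as [1]; unwrap it to a plain int before it is used in the axis handling.
    if isinstance(axis, (list, tuple)):
        if len(axis) != 1:
            raise ValueError('Expected a single normalization axis, got %r' % (axis,))
        return int(axis[0])
    return int(axis)


assert _normalize_bn_axis([1]) == 1   # list form written by the TF backend
assert _normalize_bn_axis(-1) == -1   # plain int form stays unchanged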