Why can't the model.fit method take images as x and labels as y? What is the reason for passing an array of zeros to y?

In this line of code in the CRNN Model.ipynb notebook:

```python
batch_size = 256
epochs = 10
model.fit(x=[training_img, train_padded_txt, train_input_length, train_label_length],
          y=np.zeros(len(training_img)), batch_size=batch_size, epochs=epochs,
          validation_data=([valid_img, valid_padded_txt, valid_input_length, valid_label_length], [np.zeros(len(valid_img))]),
          verbose=1, callbacks=callbacks_list)
```
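A likely explanation, assuming the notebook follows the common Keras CTC pattern (as in the Keras `image_ocr` example): the CTC loss is computed inside the model by a `Lambda` layer, so the model's output already is the per-sample loss, and the model is compiled with a pass-through loss such as `lambda y_true, y_pred: y_pred`. Keras's `fit` still expects a target array when a loss is compiled, but under that pattern its values are never read. The numpy sketch below only mimics this bookkeeping; `fake_model_output` and `passthrough_loss` are hypothetical names, not Keras APIs.

```python
import numpy as np

# Hypothetical stand-in for the model: in the assumed CTC pattern the final
# layer returns one CTC loss value per sample, so the "predictions" are losses.
def fake_model_output(batch_size, seed=0):
    rng = np.random.default_rng(seed)
    return rng.uniform(1.0, 5.0, size=(batch_size, 1))

# Pass-through loss mirroring `lambda y_true, y_pred: y_pred`:
# y_true is accepted only because Keras calls a loss with two arguments.
def passthrough_loss(y_true, y_pred):
    return y_pred

y_pred = fake_model_output(batch_size=4)
loss_with_zeros = passthrough_loss(np.zeros(4), y_pred)
loss_with_garbage = passthrough_loss(np.full(4, 123.0), y_pred)

# The dummy targets have no effect on the value being minimized.
print(np.array_equal(loss_with_zeros, loss_with_garbage))  # True
```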

From what I understand, the x argument of the fit method is assigned a list of four arrays: the images, the indexed text padded with zeros, an array of 31s with the same length as the input array, and the length of each target word. The y argument is assigned an array of zeros.
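To make the four `x` arrays concrete, here is a small numpy sketch of how they could be built. The character set, the words, the padding value 0, and treating 31 as the number of time steps the network emits per image (the value mentioned in the question) are assumptions for illustration; the notebook's actual preprocessing may differ. Note that `'a'` encodes to index 0, the same value as the padding, so only the label-length array tells the loss where each real label ends.

```python
import numpy as np

# Hypothetical character set; each character maps to its index.
char_list = "abcdefghijklmnopqrstuvwxyz"
words = ["cat", "house", "a"]

# Encode each word as a list of character indices.
encoded = [[char_list.index(c) for c in w] for w in words]

# Zero-padded label matrix, shape (n_samples, max_label_len).
max_label_len = max(len(e) for e in encoded)
padded_txt = np.zeros((len(encoded), max_label_len), dtype=np.int64)
for i, e in enumerate(encoded):
    padded_txt[i, :len(e)] = e

# True length of each label, so the padding is not scored by the loss.
label_length = np.array([[len(e)] for e in encoded])

# Number of time steps the recurrent part outputs per image (31 per the question).
time_steps = 31
input_length = np.full((len(encoded), 1), time_steps)

print(padded_txt)
print(label_length.ravel())  # [3 5 1]
print(input_length.ravel())  # [31 31 31]
```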

My questions:

1. Why are zeros passed to y instead of train_padded_txt, since that is the array containing the text, i.e. the target labels?
2. Why is x assigned all of those arrays? What is the reason? Why can't it just be: model.fit(x=training_img, y=train_padded_txt, ...)
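The extra inputs make more sense once you look at what the CTC loss itself needs. The sketch below is a plain numpy implementation of the standard CTC forward algorithm (not the Keras implementation). For each sample it needs the per-time-step softmax output, the label, the label's true length, and the number of valid time steps. A standard Keras loss is called as `loss(y_true, y_pred)`, so it cannot directly receive the two length arrays; a common workaround is to feed the labels and both lengths as extra model inputs, compute CTC inside the model, and pass dummy zeros as `y`. The function names and taking the blank as the last class are assumptions of this sketch.

```python
import itertools
import numpy as np

def ctc_loss(probs, label, blank):
    """CTC negative log-likelihood of `label` for ONE sample.

    probs: (T, C) softmax outputs, already cut to the valid length (input_length).
    label: class indices, already cut to the true length (label_length).
    """
    # Extended label with blanks around every symbol: b l1 b l2 ... b
    ext = np.full(2 * len(label) + 1, blank, dtype=np.int64)
    ext[1::2] = label
    S, T = len(ext), probs.shape[0]

    # Forward variables at t = 0: a path starts with a blank or the first symbol.
    alpha = np.zeros(S)
    alpha[0] = probs[0, blank]
    if S > 1:
        alpha[1] = probs[0, ext[1]]
    log_p = 0.0
    for t in range(T):
        if t > 0:
            prev = alpha
            alpha = prev.copy()           # stay on the same position
            alpha[1:] += prev[:-1]        # advance by one position
            for s in range(2, S):         # skip a blank between distinct symbols
                if ext[s] != blank and ext[s] != ext[s - 2]:
                    alpha[s] += prev[s - 2]
            alpha *= probs[t, ext]
        c = alpha.sum()
        if c == 0.0:                      # no valid alignment remains
            return np.inf
        alpha /= c                        # rescale to avoid underflow
        log_p += np.log(c)
    # Valid paths end on the last symbol or on the trailing blank.
    end = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    if end == 0.0:                        # label too long for T steps
        return np.inf
    return -(log_p + np.log(end))

def ctc_batch_loss(y_pred, labels, input_length, label_length, blank):
    # The padded labels AND both length arrays are required per sample.
    input_length = np.ravel(input_length)
    label_length = np.ravel(label_length)
    return np.array([
        ctc_loss(y_pred[i, :input_length[i]], labels[i, :label_length[i]], blank)
        for i in range(len(y_pred))
    ])

def brute_force_loss(probs, label, blank):
    # Reference check: sum the probability of every path that collapses to label.
    T, C = probs.shape
    total = 0.0
    for path in itertools.product(range(C), repeat=T):
        collapsed = [k for i, k in enumerate(path)
                     if k != blank and (i == 0 or k != path[i - 1])]
        if collapsed == list(label):
            total += np.prod([probs[t, k] for t, k in enumerate(path)])
    return -np.log(total)

rng = np.random.default_rng(0)
num_classes = 4                        # 3 characters + blank (last index, assumed)
blank = num_classes - 1
logits = rng.normal(size=(2, 5, num_classes))          # batch of 2, 5 time steps
y_pred = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
labels = np.array([[0, 1, 1], [2, 0, 0]])              # second row zero-padded
label_length = np.array([[3], [1]])
input_length = np.array([[5], [4]])

losses = ctc_batch_loss(y_pred, labels, input_length, label_length, blank)
print(losses)
print(np.isclose(losses[0], brute_force_loss(y_pred[0], labels[0], blank)))  # True
```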

Please help me understand these points. Thanks in advance; this is very nice work, and it's kind of you to open-source it.

TheAILearner / A-CRNN-model-for-Text-Recognition-in-Keras, issue #15