gher-uliege / DINCAE

DINCAE (Data-Interpolating Convolutional Auto-Encoder) is a neural network to reconstruct missing data in satellite observations.
GNU General Public License v3.0

How can I adapt the trained model (.ckpt.meta) for the test dataset? #3

Closed jsihunh closed 3 years ago

jsihunh commented 3 years ago

### Before I ask about my problem, I would like to thank the author in advance for a kind response.

I continuously fail to apply the trained DINCAE model to the test dataset: when I run the pre-trained model on the test dataset, an error message I cannot identify comes from Python. May I ask how I can apply the pre-trained model? Has anyone succeeded with this?

model_call_path = "/share/ocean/SSS/SMAP_SSS_JPL_L2/G_2_stack_map_EA_3X3window/result/window5_RF_only/"
os.chdir(model_call_path)
sess = tf.compat.v1.Session()
saver = tf.train.import_meta_graph('./best_model.ckpt.meta')
saver.restore(sess, tf.train.latest_checkpoint(model_call_path))

# test dataset without added clouds
# must be reinitializable
test_dataset = tf.data.Dataset.from_generator(
    test_datagen, (tf.float32,tf.float32),
    (tf.TensorShape([jmax,imax,nvar]),tf.TensorShape([jmax,imax,2]))).batch(batch_size)

if nprefetch > 0:
    # train_dataset = train_dataset.prefetch(nprefetch)
    test_dataset = test_dataset.prefetch(nprefetch)

test_iterator = tf.compat.v1.data.Iterator.from_structure(test_dataset.output_types,
                                                test_dataset.output_shapes)
test_iterator_init_op = test_iterator.make_initializer(test_dataset)

test_iterator_handle = sess.run(test_iterator.string_handle())
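# the string handle obtained above is fed into the placeholder below to select
# which dataset the shared iterator reads from at run time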

handle = tf.compat.v1.placeholder(tf.string, shape=[], name = "handle_name_iterator")

iterator = tf.compat.v1.data.Iterator.from_string_handle(
        handle, test_iterator.output_types, output_shapes = test_iterator.output_shapes)

inputs_,xtrue = iterator.get_next()

(skip build structure code)

        timestr = datetime.now().strftime("%Y-%m-%dT%H%M%S")
        fname = os.path.join(outdir,"data-{}.nc".format(timestr))

        # reset test iterator, so that we start from the beginning
        sess.run(test_iterator_init_op)

        for ii in range(ceil(test_len / batch_size)):
            summary, batch_cost,batch_RMS,batch_m_rec,batch_σ2_rec = sess.run(
                [merged, cost,RMS,m_rec,σ2_rec],
                feed_dict = { handle: test_iterator_handle,
                              mask_issea: mask })

            # time instances already written
            offset = ii*batch_size
            savesample(fname,batch_m_rec,batch_σ2_rec,meandata,lon,lat,e,ii,
                       offset, transfun = transfun)

Error message:

FailedPreconditionError: 2 root error(s) found.
  (0) Failed precondition: Attempting to use uninitialized value conv2d_3/kernel_1
      [[node conv2d_3/kernel_1/read (defined at :506) ]]
      [[Sqrt_2/_131]]
  (1) Failed precondition: Attempting to use uninitialized value conv2d_3/kernel_1
      [[node conv2d_3/kernel_1/read (defined at :506) ]]
0 successful operations. 0 derived errors ignored.

I really want to apply it to a separate cross-validation dataset (the attached figure is from the paper). The session "sess" was created as shown in the snippet above, and I run the trained model using that session, as I mentioned first.
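A minimal diagnostic sketch under these assumptions (the session and graph set up as above; tensor names are guesses, not taken from the saved DINCAE graph): an "Attempting to use uninitialized value" error usually means that the layers rebuilt in Python created fresh variables (such as conv2d_3/kernel_1) that the checkpoint never restored, whereas the tensors already present in the imported meta-graph could be looked up by name instead.

# Hypothetical diagnostic: which variables does the restored session still miss?
missing = sess.run(tf.compat.v1.report_uninitialized_variables())
print(missing)   # e.g. [b'conv2d_3/kernel_1', ...] would match the error above

# Instead of rebuilding the network, fetch tensors of the *imported* graph.
# The names below are assumptions; inspect graph.get_operations() to find the real ones.
graph = tf.compat.v1.get_default_graph()
handle_restored = graph.get_tensor_by_name("handle_name_iterator:0")
# m_rec_restored = graph.get_tensor_by_name("...:0")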

Alexander-Barth commented 3 years ago

What I did in the paper is that I added clouds to the last 50 images (the test data), let DINCAE reconstruct these missing data, and then validated them outside DINCAE (training and inference on the test data happen in the same step). Would that be an option for you?

jsihunh commented 3 years ago

I'm still confused about the 50 images used for validation.

" I added clouds to the last 50 images (test data) " = 1) you put 50 images on the last sequence of the training dataset(25yr SST)? or 2) put only separated 50 images(test data) run the DINCAE model(both training and test phase).

" validated them outside DINCAE (training and inference of test data is the same step)." = Can I understand this phrase to run a model using only 50 images as input data and model conducted training phase

I'm really wondering how to validate using the 50 cloud-masked images: (1) what should the input data shape be, (2) the question I mentioned above, i.e. whether the input is only the 50 images or a dataset containing both the training and the test set, and (3) which parts of the code in the test-session run need to be revised.

The first case) as I understand it, the input dataset is separated, e.g. input dataset = training + test dataset; training session = 25 yr of SST, only calculate the RMSE; test session = 50 images, and save the samples.

The second case) as I understand it, the input dataset contains only the 50 images, e.g. input dataset = test dataset only; the training session (25 yr SST, RMSE calculation) is deleted, so no RMSE is calculated; only the test session is run = 50 images (for reconstruction), and the samples are saved.

Alexander-Barth commented 3 years ago

My input file has 5266 time instances. For the time instances from 5217 to 5266 (50 images), I removed some data for validation by adding even more clouds before giving this data to DINCAE. (Inside DINCAE, even more data are masked as missing to prevent overfitting, but this happens at another level and is independent of this question.)
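A hedged sketch of that preprocessing step (the file name "input.nc", the variable name "SST" and the 20% withholding ratio are assumptions for illustration, not DINCAE's actual input format). It marks extra pixels as missing in the last 50 time instances; keeping a copy of the original file allows the withheld values to be compared with the reconstruction afterwards.

import numpy as np
from netCDF4 import Dataset

with Dataset("input.nc", "a") as ds:          # work on a *copy* of the original file
    sst = ds.variables["SST"]                 # hypothetical variable, shape (time, lat, lon)
    ntime = sst.shape[0]                      # 5266 in the example above
    rng = np.random.default_rng(42)

    for n in range(ntime - 50, ntime):        # the last 50 time instances
        data = sst[n, :, :]                   # masked array (existing clouds already masked)
        # withhold ~20% of the currently valid pixels as additional "clouds"
        withhold = (~np.ma.getmaskarray(data)) & (rng.random(data.shape) < 0.2)
        data[withhold] = np.ma.masked
        sst[n, :, :] = data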

train_datagen and test_datagen both represent the full time series (train_len = test_len = 5266 time instances). train_datagen adds noise to prevent overfitting; this is disabled for test_datagen:

            # randomly add missing data during training
            # (note: these added "clouds" are *different* from the data masked in the last 50 images)
            if train:
                #imask = random.randrange(0,missing.shape[0])
                imask = random.randrange(0,ntime)

                for j in range(ndata):
                    selmask = missing[j][imask,:,:]
                    xin[:,:,2*j][selmask] = 0
                    xin[:,:,2*j+1][selmask] = 0

                # add jitter
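                # (the offsets 2*ndata + 4 and 4*ndata + 4 below presumably select the
                # same variable at the neighbouring time instances)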
                for j in range(ndata):
                    xin[:,:,2*j] += jitter_std[j] * np.random.randn(sz[1],sz[2])
                    xin[:,:,2*j + 2*ndata + 4] += jitter_std[j] * np.random.randn(sz[1],sz[2])
                    xin[:,:,2*j + 4*ndata + 4] += jitter_std[j] * np.random.randn(sz[1],sz[2])

The output of DINCAE is the full time series of 5266 images, from which I extracted the last 50 images for validation.
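For completeness, a hedged sketch of that extraction and validation step (the file names "data-avg.nc", "original.nc" and "withheld_mask.npy", and the variable names "mean_rec" and "SST", are all assumptions; the actual DINCAE output layout may differ):

import numpy as np
from netCDF4 import Dataset

with Dataset("data-avg.nc") as ds_rec, Dataset("original.nc") as ds_true:
    rec = ds_rec.variables["mean_rec"][-50:, :, :]   # last 50 reconstructed images
    ref = ds_true.variables["SST"][-50:, :, :]       # un-clouded reference values

withheld = np.load("withheld_mask.npy")              # bool, shape (50, lat, lon)
diff = (rec - ref)[withheld]                         # only the artificially clouded pixels
rmse = np.sqrt(np.ma.mean(diff ** 2))
print("cross-validation RMSE over the withheld pixels:", rmse)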

jsihunh commented 3 years ago

Thank you for your thorough explanation of my question. Your validation method is now really clear to me.

Alexander-Barth commented 3 years ago

Thanks for your confirmation!

Alexander-Barth commented 2 years ago

Congratulations on your paper!

Jung, S.; Yoo, C.; Im, J. High-Resolution Seamless Daily Sea Surface Temperature Based on Satellite Data Fusion and Machine Learning over Kuroshio Extension. Remote Sens. 2022, 14, 575. https://doi.org/10.3390/rs14030575

I added your publication to the list here: https://github.com/gher-ulg/DINCAE.jl/tree/071fdb5adea1a63d55472a7957a952044368fcfc#publications

All the best, Alex

jsihunh commented 2 years ago

Thank you so much. Your impressive paper gave me a great deal of research insight. Congratulations on your new publication as well. If there is an opportunity, I would like to ask for your feedback on the research I am conducting.

Sihun

Alexander-Barth commented 2 years ago

Thanks 😀! Yes, do not hesitate to reach out!