Berkeley-Data / hpt


pretraining with s1/s2 fusion data #68

Open taeil opened 3 years ago

taeil commented 3 years ago

Hey folks,

Let's take a look at a more straightforward contrastive learning baseline using the following set of pretraining techniques; each of the following steps is a separate pretraining technique, ordered by complexity:

  1. (pretraining) Have conv1 take 12 bands as input and use all 12 bands to do MoCo pretraining (this merges s1 and s2 data so they are treated as a single "image"). Use whatever subset of the MoCo-v2 augmentations we have available across the 12 bands.
  2. Do the same as (1.) except include cases where either s1 or s2 is removed from both the query and key views (e.g., a query might have s1+s2, s1, or s2 as input, and the key would have the same input with a different set of augmentations). We can 0-pad the removed s1 or s2.
  3. Do the same as (2.) except the sensor subset is sampled independently for the query and key views (e.g., the query might have s1+s2, s1, or s2 as input, and the key might independently have s1+s2, s1, or s2 as input).

The idea here is that in (1.) we're doing instance discrimination where an input image is really a composition of 2 images. In (2.) we're doing instance discrimination where an input image can also be from only s1 or s2. In (3.), an input image can be from either (1.) or (2.).
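
To make the three variants concrete, here is a minimal sketch of the view sampling (assuming s1 contributes 2 bands and s2 contributes 10 so the fused stack is 12 bands; the band split and function names are illustrative, not from the repo):

```python
import random
import torch

N_S1, N_S2 = 2, 10  # assumed band split (s1 + s2 = 12); adjust to the real band counts

def drop_sensor(x, keep):
    """Zero-pad the bands of the sensor that is not kept ('both' keeps all 12)."""
    x = x.clone()
    if keep == "s1":
        x[N_S1:] = 0.0   # zero out the s2 bands
    elif keep == "s2":
        x[:N_S1] = 0.0   # zero out the s1 bands
    return x

def sample_views(fused, augment, step):
    """Build a MoCo query/key pair from one fused (12, H, W) anchor.

    step 1: both views always use all 12 bands (full fusion).
    step 2: one sensor subset is sampled and applied to BOTH views.
    step 3: sensor subsets are sampled independently per view.
    """
    choices = ["both", "s1", "s2"]
    if step == 1:
        keep_q = keep_k = "both"
    elif step == 2:
        keep_q = keep_k = random.choice(choices)
    else:
        keep_q = random.choice(choices)
        keep_k = random.choice(choices)
    # `augment` is stochastic, so query and key differ even with the same bands
    return augment(drop_sensor(fused, keep_q)), augment(drop_sensor(fused, keep_k))

# usage: fused = torch.cat([s1, s2], dim=0)              # (12, H, W)
#        q, k = sample_views(fused, mocov2_augs_12band, step=3)
```

Note the design difference: step 2 keeps the query/key modalities matched, so the model only has to be invariant to augmentation; step 3 additionally asks for invariance across sensor subsets.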

taeil commented 3 years ago

@suryatechie can you handle the evaluation side? I can start working on pre-training.

suryagutta commented 3 years ago

Hi @taeil, I will work on the BigEarthNet dataset as they are two different datasets (S1 and S2) at present.

taeil commented 3 years ago

Question to Colorado: First, to make sure I understand correctly, for pretraining, are we going back to the original MoCo, where we use a single anchor image, rather than two anchors whose different augmentations result in the query and key images?

In the original MoCo, so to speak, we use augmentations of a single anchor image to create the query and key. Rather than have a query come from one satellite and a key come from another, I'm proposing that we treat them as one image, and then through steps 1-3, we do contrastive learning where we potentially remove certain bands.
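
In code terms, the contrast is roughly this (a sketch, with `augment` standing for any stochastic MoCo-v2 pipeline over the 12 bands):

```python
import torch

def make_moco_pair(s1, s2, augment):
    """Single-anchor MoCo: query and key are two augmentations of the SAME
    fused image, rather than q-from-one-satellite / k-from-the-other."""
    fused = torch.cat([s1, s2], dim=0)      # treat s1 + s2 as one 12-band image
    return augment(fused), augment(fused)   # stochastic augment -> q != k
```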

Secondly, is there anything else we can try for the existing evaluation in terms of conv1 mapping, so we can keep some of this work?

We're not going to throw this work away; it just needs some further debugging. The simpler method I'm proposing we explore is a way of debugging our results, in that it is more similar to existing work, and we can examine the evaluation results with this technique. Systematically, here's the thought process:

  1. I don't trust that the pretraining is working at all yet. The pretraining loss curves look okay, but none of the evaluations indicate that the pretraining is learning useful representations, including my own tests with linear evals.
  2. Without trusting (1), it's hard to spend time debugging the evaluation training.
  3. Let's use a more traditional version of (1) in order to develop more trust in the pretraining, and then work on the evaluation.
  4. Depending on the results of steps (1-3), i.e. after fixing certain bugs/issues, we can make adjustments to the input modules code.
  5. At the very least, the input modules (separate inputs) vs. 12-band inputs (combined inputs) provide an ablation study.
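
For context on point 1: a linear eval freezes the pretrained backbone and trains only a linear classifier on top. A minimal sketch follows; the 19-class multi-label head and loss are assumptions about the BigEarthNet setup, not the repo's exact eval code:

```python
import torch
import torch.nn as nn
import torchvision

# Build a 12-band ResNet-50 backbone and (in practice) load the MoCo weights.
backbone = torchvision.models.resnet50()
backbone.conv1 = nn.Conv2d(12, 64, kernel_size=7, stride=2, padding=3, bias=False)
# backbone.load_state_dict(torch.load("moco_checkpoint.pth"), strict=False)
backbone.fc = nn.Identity()               # expose the 2048-d pooled features
for p in backbone.parameters():
    p.requires_grad = False               # linear eval: backbone stays frozen
backbone.eval()

head = nn.Linear(2048, 19)                # e.g. BigEarthNet-19 labels (assumption)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)
criterion = nn.BCEWithLogitsLoss()        # multi-label targets in {0, 1}

def linear_eval_step(images, targets):
    with torch.no_grad():
        feats = backbone(images)          # (N, 2048) frozen features
    loss = criterion(head(feats), targets.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```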

suryagutta commented 3 years ago

That's my understanding as well. For pretraining, we are going back to the original MoCo. I agree, let us make sure that the original MoCo works, then add the input modules' complexity and make adjustments accordingly. That way, we know the performance using the original MoCo (single anchor image), gain confidence, and start adding complexity. My understanding: the original MoCo design will generate many positive pairs over training due to augmentation, whereas there is only one positive pair in the current natural-augmentation setup, and with the input module we could also be losing some information. These things can be debugged once we know the performance of the original MoCo (single anchor, multiple augmentations) and are confident that the pretraining is working well.
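
For reference, in MoCo each query has exactly one positive key (the other augmented view of the same anchor) and many negatives drawn from the queue. A minimal sketch of that loss, following the pseudocode in the MoCo paper (names here are illustrative):

```python
import torch
import torch.nn.functional as F

def moco_info_nce(q, k, queue, tau=0.07):
    """q, k:  (N, C) L2-normalized embeddings of the two views of each anchor.
    queue:   (C, K) L2-normalized keys from the momentum encoder (negatives)."""
    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)  # (N, 1) one positive
    l_neg = torch.einsum("nc,ck->nk", q, queue)           # (N, K) negatives
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive at index 0
    return F.cross_entropy(logits, labels)
```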

tchken commented 3 years ago

Added a diagram draft for the new pretraining model architecture here. Please see if it makes sense, and feel free to edit/change it.

taeil commented 3 years ago

Updated code for step 1; running 20 epochs on the mid dataset. Please help review the code changes. https://github.com/Berkeley-Data/OpenSelfSup/pull/15
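
For reviewers, the core of the step-1 change is replacing ResNet's 3-channel conv1 with a 12-channel one so the fused s1+s2 stack is consumed as a single image. An illustrative sketch, not the PR's exact diff:

```python
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50()
old = model.conv1
# Same kernel/stride/padding as the stock conv1, but 12 input channels.
model.conv1 = nn.Conv2d(12, old.out_channels,
                        kernel_size=old.kernel_size,
                        stride=old.stride,
                        padding=old.padding,
                        bias=False)
```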

taeil commented 3 years ago

Running pretraining for step 2: laced-water-61

https://github.com/Berkeley-Data/OpenSelfSup/pull/17

taeil commented 3 years ago

Our 3rd pretraining model (step 3) is running. This one is slower and bigger than the previous two: https://wandb.ai/cal-capstone/hpt4/runs/2iu8yfs6?workspace=user-taeil. It won't finish until tomorrow, but we should have at least a 100-epoch checkpoint we can evaluate against.

Please help review/merge the PR.

taeil commented 3 years ago

Naming for the variants: 1) full fusion; 2) and 3) partial fusion (sensor augmentation).