Tencent / tencent-ml-images

Largest multi-label image database; ResNet-101 model; 80.73% top-1 acc on ImageNet

Some questions about the code and paper #43

Open · Jason-xin opened this issue 5 years ago

Jason-xin commented 5 years ago

Sorry to bother you; I have two questions:

1. When calculating the loss, the first step is "a. get loss coefficient", and the corresponding code is shown here [screenshot of code]. Does it refer to r in the loss function [screenshot of formula]? But the explanation of r does not match this code [screenshot of the explanation of r]. So, can you tell me what this code computes? In particular, pos_loss_coef (0.01), neg_loss_coef (8), and loss_coef...

2. In train.py, `record_parser_fn` preprocesses the image as

   ```python
   image = image_preprocess.preprocess_image(
       image=image, output_height=FLAGS.image_size, output_width=FLAGS.image_size,
       object_cover=0.7, area_cover=0.7, is_training=is_training, bbox=bbox)
   ```

   but in finetune.py, `record_parser_fn` uses

   ```python
   image = image_preprocess.preprocess_image(
       image=image, output_height=FLAGS.image_size, output_width=FLAGS.image_size,
       object_cover=0.0, area_cover=0.05, is_training=is_training, bbox=bbox)
   ```

   Can you tell me why object_cover and area_cover differ?

Thanks!

Jason-xin commented 5 years ago

@wubaoyuan I need your help, thanks!

wubaoyuan commented 5 years ago

@Jason-xin Sorry for the late reply. The code and the arXiv paper are indeed inconsistent: the version in the code is the latest one, while the arXiv paper describes an old version. We will update r_t^j in the arXiv version as soon as possible.

In the code, (r_t^j)_pos = max(0.01, log10(10 / (0.01 + t))) < 1 and (r_t^j)_neg = max(0.01, log10(10 / (8 + t))) < 0.1, for t >= 1.

The design principle of r_t^j is that it decreases monotonically with t. You can try other decreasing functions in your training.
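In Python, these coefficients are roughly as follows (a minimal sketch of the formula above, not the exact repo code; `loss_coef` is a hypothetical helper, and `t` is assumed to be the epoch index starting from 1):

```python
import math

def loss_coef(t, base):
    # r_t^j = max(0.01, log10(10 / (base + t))), monotonically decreasing in t
    return max(0.01, math.log10(10.0 / (base + t)))

# positive labels use base = pos_loss_coef = 0.01
# negative labels use base = neg_loss_coef = 8
print(loss_coef(t=1, base=0.01))  # ~0.996 (< 1)
print(loss_coef(t=1, base=8))     # ~0.046 (< 0.1)
```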

wubaoyuan commented 5 years ago

@Jason-xin For the second question: since the tasks and datasets of pre-training and fine-tuning are significantly different, it is natural to use different pre-processing.

Jason-xin commented 5 years ago

@wubaoyuan OK, another question. In train.py, why is tf.nn.softmax used to calculate probabilities, and why is `classes` computed with tf.argmax (which means only one tag is predicted)? In fact, each sample in the ML-Images dataset has multiple tags...

```python
# build model
net = resnet.ResNet(features, is_training=(mode == tf.estimator.ModeKeys.TRAIN))
logits = net.build_model()
predictions = {
    'classes': tf.argmax(logits, axis=1),
    'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
}

if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
```

Jason-xin commented 5 years ago

@wubaoyuan And another thing: I changed tf.nn.softmax to tf.nn.sigmoid in image_classification.py and tested the ML-Images model, so the result should be multi-label classification? I don't know whether it works correctly or not.

chaoqing commented 5 years ago

@wubaoyuan With the formula you provided, when t changes from 1 to 2, the ratio of the weight r_t between a positive and a negative tag increases from about 20 to about 70. Is this really what you mean by monotonically decreasing? Why not fix r_t^0?

> The design principle of r_t^j is that it decreases monotonically with t. You can try other decreasing functions in your training.
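A quick numeric check of that claim, reusing the sketch from earlier in the thread (`loss_coef` is the hypothetical helper, not repo code):

```python
import math

def loss_coef(t, base):
    return max(0.01, math.log10(10.0 / (base + t)))

for t in (1, 2):
    pos = loss_coef(t, base=0.01)  # positive-label coefficient
    neg = loss_coef(t, base=8)     # negative-label coefficient
    print(t, round(pos, 3), round(neg, 3), round(pos / neg, 1))
# t=1: 0.996, 0.046, ratio ~21.8
# t=2: 0.697, 0.010 (clipped by the max), ratio ~69.7
```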

Tower0823 commented 4 years ago

> @wubaoyuan And another thing: I changed tf.nn.softmax to tf.nn.sigmoid in image_classification.py and tested the ML-Images model, so the result should be multi-label classification? I don't know whether it works correctly or not.

I have the same question as you! In train.py, I didn't change the softmax to sigmoid; only when testing with image_classification.py did I change the softmax to sigmoid, and it did work with decent results. But I am really confused: why not change the softmax to sigmoid in training, and why compute only the top-1 accuracy?
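For reference, a minimal sketch of that test-time change (TF1-style, to match the repo; the 0.5 threshold is an illustrative choice, not taken from image_classification.py):

```python
import tensorflow as tf

num_classes = 11166  # number of ML-Images categories
logits = tf.placeholder(tf.float32, [None, num_classes])

# independent per-class probabilities instead of a softmax over all classes
probabilities = tf.nn.sigmoid(logits, name='sigmoid_tensor')
# multi-hot prediction: every class whose score clears the threshold
predicted_tags = tf.cast(probabilities > 0.5, tf.int32)
```

If the training loss is already computed per class (as the positive/negative coefficients discussed above suggest), the softmax/argmax pair in `predictions` only affects what gets reported, not the learned weights.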