some questions - GitHub Issues

yeqingQian commented 6 years ago

Hello, I have some questions about this program. Where is the folder "somewhere"? There is no "data_name/features/model_name.bin" under the "data" folder, and the program cannot run.

XiaohangZhan commented 6 years ago

Since CDP operates only on features, you need to extract features from your unlabeled data yourself using pre-trained models, and then link them to the specified location under data.

yeqingQian commented 6 years ago

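Thank you. I have another question: in config.yaml, are base: "nas" and committee: ['resnet18', 'resnet34', 'resnet50', 'resnet101', 'densenet121', 'vgg16bn', 'inceptionv3', 'ir'] the relevant models? Are all of them required? I extracted features with my own method and saved them as a single bin file for this program to use, but when I run it, it reports errors about missing nas.bin, resnet34.bin, and so on.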

XiaohangZhan commented 6 years ago

No, they are just examples. In your case, you should create a new config file in a new experiment directory, e.g., experiments/mine/config.yaml, and edit base and committee to match your model names. For example, if you use resnet18 as the base model and alexnet and vgg16 as your committee models, then you have feature files named resnet18.bin, alexnet.bin, and vgg16.bin. Just edit the config to be:
base: "resnet18"
committee: ['alexnet', 'vgg16']
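
The missing-file errors reported above can be caught before a run with a small check. This is only a sketch: it assumes the data/data_name/features/model_name.bin layout mentioned at the top of this thread, and the helper name `missing_feature_files` and the dataset name `my_data` are hypothetical, not part of the repository.

```python
from pathlib import Path


def missing_feature_files(data_root, data_name, base, committee):
    """Return the expected feature files that do not exist on disk.

    Assumes the layout <data_root>/<data_name>/features/<model_name>.bin
    (taken from the path quoted in this thread, not verified against the code).
    """
    feature_dir = Path(data_root) / data_name / "features"
    expected = [feature_dir / f"{name}.bin" for name in [base, *committee]]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    # "my_data" is a hypothetical dataset name; the model names follow the example above.
    for path in missing_feature_files("data", "my_data", "resnet18", ["alexnet", "vgg16"]):
        print(f"missing feature file: {path}")
```

Every name listed under base and committee needs its own .bin file, so a single combined feature file would still be reported as missing under the other names.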

yeqingQian commented 6 years ago

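Are resnet18.bin and the like under base and committee the feature files of all the images, or are they the models used to extract image features? Roughly how long should clustering take? I ran an experiment on 1,700 images, and it has been running for nearly two days without a result. I would appreciate your advice, thanks.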

yeqingQian commented 6 years ago

I would also like to ask: how are the three kinds of models, the base model, the committee models, and the mediator model, trained? How do I obtain them?

XiaohangZhan commented 6 years ago

resnet18.bin holds the features of all the images, created with array.tofile("filename.bin"). Its shape is NxK, where N is the number of images and K is the feature dimension (e.g., 256).
It is very fast. On million-scale data, it takes several minutes.
The base model and committee models are trained with a standard classification framework with softmax loss using PyTorch. I will release the code for training face recognition later.
The mediator is an MLP binary classifier. The code for the mediator is not ready right now, so please be patient.
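
A minimal sketch of writing and reading such a feature file with NumPy follows. `ndarray.tofile` stores raw values with no header, so the shape and dtype must be known when reading the file back. The float32 dtype and the helper names `save_features` and `load_features` are assumptions for illustration; the thread does not state which dtype the program expects.

```python
import numpy as np


def save_features(features, path):
    """Write an (N, K) feature matrix as raw values, row by row."""
    # float32 is an assumption; the thread does not state the dtype.
    np.ascontiguousarray(features, dtype=np.float32).tofile(path)


def load_features(path, feature_dim):
    """Read a raw float32 file back into an (N, feature_dim) array."""
    flat = np.fromfile(path, dtype=np.float32)
    if flat.size % feature_dim != 0:
        raise ValueError(
            f"{path}: {flat.size} values is not a multiple of {feature_dim}"
        )
    return flat.reshape(-1, feature_dim)
```

Under these assumptions, a file that cannot be reshaped to (N, K) points to a mismatch in feature dimension or dtype between the writer and the reader.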

yeqingQian commented 6 years ago

Oh, OK. Thank you very much.

yeqingQian commented 6 years ago

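Hello, I have a few more questions:

Are there any requirements on the bin files for the base model and committee models? For example, does it matter if nas.bin and resnet18.bin contain exactly the same data?
Is the role of the mediator model to learn the threshold automatically through training?
Are there any requirements on the label file meta.txt? It seems that clustering still works even if I put arbitrary numbers in it.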

XiaohangZhan commented 6 years ago

If you do not need a committee, just set committee: []
I recommend reading our paper; all the details are there.
meta.txt contains the annotated labels. It is used for evaluation.

yeqingQian commented 6 years ago

OK, thank you very much.

wzc118 commented 6 years ago

I have some questions about the mediator. Why is the input vector 6N+5 dimensional? I think the mean vector and the variance vector are N+1 dimensional each. And are these vectors fed into the MLP together? The neighbor distribution vector doesn't have the same column size as the "relationship/affinity vector".

XiaohangZhan commented 6 years ago

Sorry, I'm afraid I didn't get your point. Anyway, 6N+5 comes from: 1) relationship (N): the base model is excluded because pairs come from the base model's graph, so its relationships are all 1; 2) affinity (N+1): base + committee; 3) mean (2N+2): the mean of the neighbours' similarities, for each node in a pair (two nodes per pair); 4) var (2N+2): same layout as mean.
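
The arithmetic in that breakdown can be checked with a short sketch. The function names and the concatenation order are assumptions for illustration; the released mediator code may arrange the pieces differently.

```python
def mediator_input_dim(num_committee):
    """Input length for N committee models, following the breakdown above."""
    n = num_committee
    relationship = n      # committee only; the base relationship is always 1
    affinity = n + 1      # base + committee
    mean = 2 * (n + 1)    # one value per model, for each of the two nodes in a pair
    var = 2 * (n + 1)     # same layout as mean
    return relationship + affinity + mean + var


def build_mediator_input(relationship, affinity, mean_a, mean_b, var_a, var_b):
    """Concatenate the per-pair pieces into one flat vector (ordering is an assumption)."""
    return [*relationship, *affinity, *mean_a, *mean_b, *var_a, *var_b]
```

With the eight committee models from the example config, N = 8 and the input length is 6 * 8 + 5 = 53.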

wzc118 commented 6 years ago

Thanks. I am still confused about how the pair selection recall and precision are calculated. I understand the pairwise recall and precision computed from the clustering result.

XiaohangZhan commented 6 years ago

Pair selection precision and recall are calculated in the standard way. That is, recall = TP / (TP + FN), prec = TP / (TP + FP)
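
A sketch of those two formulas applied to candidate pairs follows. It assumes that a pair counts as a true positive when it is selected and both samples share a ground-truth label, and that recall is measured over the candidate pairs only. The function name and data layout are hypothetical, not taken from the repository.

```python
def pair_selection_scores(pairs, selected, labels):
    """Precision and recall of selected pairs against ground-truth labels.

    pairs:    list of (i, j) candidate pairs
    selected: list of bools, True where the pair was accepted
    labels:   labels[i] is the ground-truth identity of sample i
    """
    tp = fp = fn = 0
    for (i, j), keep in zip(pairs, selected):
        same = labels[i] == labels[j]
        if keep and same:
            tp += 1          # accepted, and truly the same identity
        elif keep and not same:
            fp += 1          # accepted, but different identities
        elif same:
            fn += 1          # rejected, although the identities match
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```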

yeqingQian commented 6 years ago

Hello, I would like to ask: is the mediator.py you provided the corresponding mediator module? Are train_mediator and test_mediator wrapped up somewhere?

XiaohangZhan commented 6 years ago

train_mediator and test_mediator are not implemented in the current code. They are still being restructured and will be released after the CVPR deadline.

yeqingQian commented 6 years ago

Oh, OK, thank you very much! So that means the current code cannot yet provide the functionality of the mediator module?

XiaohangZhan commented 6 years ago

Yes. But you can try voting in experiments/example_vote. It also yields good results.

yeqingQian commented 6 years ago

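Yes, I have tried vote. The clustering results on my data are mediocre. My dataset consists of something like video snapshots and surveillance captures, and the image quality is low.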

XiaohangZhan commented 6 years ago

If you are interested in discussing this with me, please contact me via xiaohangzhan@outlook.com.

lqsunshine commented 5 years ago

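Hello, I would like to ask whether the test code can run end to end. Following the required settings, I randomly generated nine 200*256 feature bin files for testing (list and meta unchanged). The result shows that pairs in cdp is empty, and in the end it reports that it cannot reshape.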

XiaohangZhan commented 5 years ago

For randomly generated features, it is hard for the committee to reach a consensus. You can reduce accept_num and threshold under vote in your config file to obtain more accepted pairs. However, note that this will produce meaningless results, since the features are random.
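
To illustrate why lowering accept_num or threshold yields more accepted pairs, here is a simplified sketch of committee voting. It is not the repository's voting code: the function name, the data layout, and the exact acceptance rule (a pair is kept when at least accept_num committee models score it at or above threshold) are assumptions made for illustration.

```python
def vote_pairs(pairs, committee_sims, threshold, accept_num):
    """Keep a pair when at least accept_num committee models score it >= threshold.

    pairs:          list of (i, j) candidate pairs from the base model
    committee_sims: committee_sims[m][p] is model m's similarity for pair p
    """
    accepted = []
    for p, pair in enumerate(pairs):
        # Count the committee models that agree this pair is similar enough.
        votes = sum(1 for sims in committee_sims if sims[p] >= threshold)
        if votes >= accept_num:
            accepted.append(pair)
    return accepted
```

With random features the committee similarities rarely clear the threshold together, so under this rule the accepted list can end up empty, which matches the empty pairs reported above.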

lqsunshine commented 5 years ago

Thank you for your emails. I will continue to follow your paper after finishing my current task.

hujuan940506 commented 5 years ago

Thanks for your work. I ran the code on my dataset, but the performance is low.

XiaohangZhan commented 5 years ago

I recommend that you:

use kmeans, HAC or DBSCAN to get baseline performance and compare it with CDP.
adjust the parameters of CDP according to the README.

hujuan940506 commented 5 years ago

Thanks. I adjusted the parameters 'threshold' and 'max_sz', and the performance improved greatly. Do I need to adjust the parameters for every new dataset?

XiaohangZhan commented 5 years ago

The hyper-parameters depend on the scenario, specifically on the distribution of the samples' similarities. However, if different datasets come from the same source, the hyper-parameters generalize across them.

hujuan940506 commented 5 years ago

Thank you very much. In engineering, it is usually desirable to cluster automatically. Do you have any suggestions or ideas?

XiaohangZhan / cdp

some questions #1