Thanks for your submission, an editor will be assigned soon.
@benoit-girard Can you edit yourself or assign an editor?
Hi @rougier, thank you for the response. Do you have an idea how long the process of reviewing usually takes? Just wondering, since I am new to this :)
@JarlLemmens Sorry for the delay. The whole process can take up to six months, but it really depends on how fast we find an editor and reviewers and how fast you answer. To accelerate things a bit, you can post messages here such that I get a notification, and ask people to update their review.
@benoit-girard @gdetor Can one of you edit this submission?
@rougier Okay cool, thanks for the update!
@ReScience/editors Can any of you edit this submission?
Sorry for the lack of responsiveness. I will handle the editing of this submission.
@JasonGUTU Would you be interested in reviewing this ReScience submission?
@hkashyap or @birdortyedi : would you be interested in reviewing this submission?
Sure, this is interesting to me and I can review it. @benoit-girard
So @hkashyap will be the first reviewer, and @stepherbin will be the second one. Thanks to both of you for accepting this review.
Dear replication authors.
Here is a review of your work.
The objective of the authors is to verify the claim "that Knowledge Graph re-optimization can increase recall, while maintaining mean Average Precision (mAP)" on detection algorithms by replicating the learning process and evaluation on two standard datasets, Pascal VOC and MS COCO, as in the paper of [Fang et al., 2017]. Their conclusion is that the proposed approach only improves performance on the MS COCO dataset while slightly degrading performance on Pascal VOC, contrary to the original paper, and therefore depends on the model used. The authors provide a full implementation based on PyTorch.
The presentation of the method to be replicated is clear and well summarized. One piece of information missing from the original paper and needed for reproduction (the fact that initial probabilities for re-optimization have to be set to zero) is clearly documented.
I expected a deeper analysis of the algorithm to explain its limits. I see this task as a central part of the replication study. Although the authors provide a small "beyond the paper" experiment, I find it insufficient.
For instance, the following questions could have been addressed:
- Can there be a negative impact of the KG reweighting scheme?
- What are the decisions (categories and boxes) that have been modified by the re-optimization?
- Since the KG improvement depends on the quality of the original detections, is it possible to assess what is the semantic cost of modifying those detections to get the ground truth?
- What is the impact of the hyper-parameters?
The code is rather easy to understand and sufficiently commented, with a simple API.
I couldn't install the packages using the yaml file (inconsistencies). I manually installed pytorch and several other packages in a conda environment and it seemed to be working. The complementary installation of pyrwr using pip worked.
I only checked the VOC problem with default parameters. A generic script for regenerating all the results (for the various KG methods) could be provided for completeness. On the version I tested, test_function_kg (l. 154) raised errors, probably due to the fact that the output of the prediction model (l. 136) is a simple tensor and not a list when the input list contains a single element (at least in my PyTorch version). Fixing this error makes the code execute.
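For clarity, this is roughly the kind of guard I added to make it run (an illustrative sketch with a hypothetical helper name, not the repository's actual code):

```python
from typing import Any, List

def as_prediction_list(output: Any) -> List[Any]:
    # Some PyTorch/torchvision versions seem to return a bare tensor/dict for
    # a single-image input instead of a one-element list; wrapping it lets the
    # downstream evaluation code iterate uniformly in both cases.
    return output if isinstance(output, list) else [output]
```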
Outputs of 'python -m Results.results_voc' (on my computer, using PyTorch 1.10.1)
There are 4952 test images containing a total of 14976 objects. Files have been saved to rescience-ijcai2017-230/Datasets.
Currently testing: threshold = 1e-05 bk = 5 lk = 5 epsilon = 0.9 S = KG-CNet-55-VOC top k = 100
TP: tensor(11055.) FP: tensor(397270.) FN: tensor(977.) AP @ 100 per class: {'aeroplane': 0.710239589214325, 'bicycle': 0.7944151759147644, 'bird': 0.7083229422569275, 'boat': 0.49917998909950256, 'bottle': 0.5410887598991394, 'bus': 0.7477031350135803, 'car': 0.8304837346076965, 'cat': 0.8312810659408569, 'chair': 0.4660053849220276, 'cow': 0.8095996379852295, 'diningtable': 0.6266980171203613, 'dog': 0.8198830485343933, 'horse': 0.8325859308242798, 'motorbike': 0.763495922088623, 'person': 0.7816246747970581, 'pottedplant': 0.39205634593963623, 'sheep': 0.7277699112892151, 'sofa': 0.6615461707115173, 'train': 0.7348567247390747, 'tvmonitor': 0.6963554620742798} mAP @ 100 : 0.6987595558166504 Recall @ 100 per class: {'aeroplane': 0.8842105269432068, 'bicycle': 0.9673590660095215, 'bird': 0.9106753468513489, 'boat': 0.8669201731681824, 'bottle': 0.7846481800079346, 'bus': 0.9624413251876831, 'car': 0.9458784461021423, 'cat': 0.9664804339408875, 'chair': 0.8478835821151733, 'cow': 0.9836065173149109, 'diningtable': 0.9174757599830627, 'dog': 0.9795500636100769, 'horse': 0.954023003578186, 'motorbike': 0.944615364074707, 'person': 0.9339664578437805, 'pottedplant': 0.7625000476837158, 'sheep': 0.8966941833496094, 'sofa': 0.9748953580856323, 'train': 0.9503545761108398, 'tvmonitor': 0.8928571343421936} Recall @ 100 all classes (by average): 0.9163517951965332
The figures are consistent with the replication paper, with small variations. An uncertainty analysis with confidence intervals should be added to assess the replication paper's statements.
The paper describes a replication of the original work that seems sound to me. Several implementation details have been clarified after discussing with original authors. The analysis of the algorithm could have been deeper, though, in order to assess more precisely the benefit or limitations of the approach.
@stepherbin Thank you very much for your feedback! @JarlLemmens You can start addressing the points raised in @stepherbin 's review. @hkashyap A gentle reminder: provide your review when possible.
Hi @stepherbin, first of all, sorry for the late reply. Secondly, thank you very much for putting in the time and effort to review our work!
I have checked the installation process on two separate machines, and in both cases it installs without any problems. Could you please provide a more specific error regarding the installation inconsistencies? I think the error in test_function_kg is indeed a result of the different PyTorch installation, as I am not getting this error on my machines.
Regarding the bullet questions you provided, I have some comments/questions for each.
Can there be a negative impact of the KG reweighting scheme? With the KG reweighting scheme, you refer to the re-optimization of the detections using the knowledge graph information, right? Table 2 (results for VOC) and Table 3 (results for COCO) show the differences between FRCNN and KG-CNET57 in recall and mAP. In the case of VOC, we have a recall of 91.7 for the baseline (without re-optimization) vs 91.9 with re-optimization. For the mAP these are 70.4 vs 70.1. For VOC, albeit a small difference, there is indeed a negative impact on mAP when using the KG re-optimization. In the COCO case, there is a similar effect, where recall is positively impacted and mAP negatively. A decrease in mAP means that some detections that were correct without re-optimization have been re-optimized wrongly. So yes, there can be a negative impact. Does this answer the question, or should I be more explicit about this in the paper?
What are the decisions (categories and boxes) that have been modified by the re-optimization? The re-optimization process updates the scores of each category of each detection; only if the re-optimized score of a detection outscores the original detection will the label (category) of that box be updated. The (total) effect of re-optimization per class is also shown in Table 2 for the VOC case. I was wondering if perhaps a small qualitative/case study on one or a few image samples would clarify this matter? Just as in figure 4 of the original paper (https://www.ijcai.org/proceedings/2017/0230.pdf), but then also depicting the negative impact case. Would such an example be sufficient, or does that not answer your question?
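To make the rule concrete, here is a minimal sketch of the relabeling step as described above (simplified, not the exact code from the repository):

```python
import torch

def relabel_boxes(original_scores: torch.Tensor,
                  original_labels: torch.Tensor,
                  reoptimized_scores: torch.Tensor) -> torch.Tensor:
    # original_scores:    (N,)   score of the originally predicted class per box
    # original_labels:    (N,)   originally predicted class index per box
    # reoptimized_scores: (N, C) per-class scores after KG re-optimization
    best_scores, best_labels = reoptimized_scores.max(dim=1)
    # A box only changes label when some re-optimized class score outscores
    # the original detection score; otherwise the original label is kept.
    keep_original = best_scores <= original_scores
    return torch.where(keep_original, original_labels, best_labels)
```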
Since the KG improvement depends on the quality of the original detections, is it possible to assess what is the semantic cost of modifying those detections to get the ground truth? I am not sure what you mean by this question.
What is the impact of the hyper-parameters? For which hyper-parameters would you like to see the impact? For epsilon (in my opinion the one with the most direct impact, being the trade-off between the original detections and the amount of re-optimization), this can quite easily be shown with a separate table/graph by running the existing experiment for different epsilons, as sketched below. The other hyper-parameter with a lot of impact (the box score threshold) is already accounted for in the beyond-the-paper section.
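The sweep itself would be straightforward, something along these lines (run_voc_experiment is a hypothetical wrapper around the existing evaluation script):

```python
# Hypothetical sketch of the epsilon sweep; run_voc_experiment is assumed to
# wrap the existing VOC evaluation with a configurable epsilon trade-off.
epsilons = [0.5, 0.6, 0.7, 0.75, 0.8, 0.9, 1.0]
for eps in epsilons:
    m_ap, recall = run_voc_experiment(S="KG-CNet-55-VOC", epsilon=eps, top_k=100)
    print(f"epsilon={eps}: mAP={m_ap:.3f}, recall@100={recall:.3f}")
```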
I am looking forward to hearing your further input! Also, I would like to gently remind @hkashyap about his review, so that I can work on an updated version of the paper that covers both of your reviews. Thanks again!
@JarlLemmens sorry for the delay, I am currently working on the review and I will submit in a week.
@hkashyap thank you! looking forward to your review
The authors of this ReScience submission (the replication authors) reimplement the paper "Object Detection Meets Knowledge Graphs" by Fang et al., previously published at IJCAI-17. Since the original paper did not provide source code, the replication authors implemented it from scratch using newer PyTorch libraries, different from the original implementation. The most significant difference is that the replication uses a ResNet-50 backbone for object detection, whereas the original paper used a VGG backbone. Due to this change, multiple conclusions were derived, such as that the re-optimization process is dependent on the object detector's performance and that the knowledge graph approach can actually decrease performance for very good detectors. However, the replication study presented here does not include results using the same VGG backbone, which makes it difficult to compare to the results in the original study.
The provided yaml file for the required Conda environment was not helpful due to package incompatibility. I am not sure if all the packages are even required. I had to remove the exact version numbers of the packages to make it work; the authors should do that to the yaml in a minimalistic way.
The dataset and trained models are provided in an organized manner, which saves time, kudos for that.
I observed the same issue as the other reviewer about the output of the prediction model not being a list in my PyTorch installation. After fixing this, the code executed as expected.
I was able to replicate the replication results presented for both the VOC and COCO datasets, as well as the Section 4.3 new experiment results with a higher threshold. In some cases there were slight differences, but nothing significant, mostly rounding changes.
Currently testing: threshold = 1e-05 bk = 5 lk = 5 epsilon = 0.9 S = KG-CNet-55-VOC top k = 100 TP: tensor(11069.) FP: tensor(400168.) FN: tensor(963.) AP @ 100 per class: {'aeroplane': 0.7179303765296936, 'bicycle': 0.8043712973594666, 'bird': 0.696021318435669, 'boat': 0.5145907402038574, 'bottle': 0.5439999103546143, 'bus': 0.7510495185852051, 'car': 0.8366398215293884, 'cat': 0.8359617590904236, 'chair': 0.48131272196769714, 'cow': 0.8080626130104065, 'diningtable': 0.6125404238700867, 'dog': 0.8125175833702087, 'horse': 0.8397406935691833, 'motorbike': 0.7771231532096863, 'person': 0.7843232154846191, 'pottedplant': 0.3963417112827301, 'sheep': 0.737895667552948, 'sofa': 0.6353936195373535, 'train': 0.7309917211532593, 'tvmonitor': 0.6964332461357117} mAP @ 100 : 0.7006620764732361 Recall @ 100 per class: {'aeroplane': 0.8947368860244751, 'bicycle': 0.9643917083740234, 'bird': 0.915032684803009, 'boat': 0.8631178736686707, 'bottle': 0.7889125943183899, 'bus': 0.9577465057373047, 'car': 0.9483763575553894, 'cat': 0.9608938097953796, 'chair': 0.8584656119346619, 'cow': 0.987704873085022, 'diningtable': 0.9271844625473022, 'dog': 0.9795500636100769, 'horse': 0.9597700834274292, 'motorbike': 0.9538461565971375, 'person': 0.9330830574035645, 'pottedplant': 0.7541667222976685, 'sheep': 0.9090908765792847, 'sofa': 0.9707112908363342, 'train': 0.9468084573745728, 'tvmonitor': 0.8928571343421936} Recall @ 100 all classes (by average): 0.9183223843574524
Currently testing: threshold = 1e-05 bk = 5 lk = 5 epsilon = 0.75 S = KF-500-COCO top k = 100 AP @ 100 per class: {'person': 0.42004427313804626, 'bicycle': 0.23318567872047424, 'car': 0.2947446405887604, 'motorcycle': 0.29722216725349426, 'airplane': 0.43403273820877075, 'bus': 0.4945223927497864, 'train': 0.4786454141139984, 'truck': 0.2374054491519928, 'boat': 0.1704973727464676, 'traffic light': 0.19896620512008667, 'fire hydrant': 0.5331732630729675, 'street sign': 0.0, 'stop sign': 0.564350426197052, 'parking meter': 0.267412930727005, 'bench': 0.16225935518741608, 'bird': 0.25010353326797485, 'cat': 0.4558740556240082, 'dog': 0.4444156587123871, 'horse': 0.3762286603450775, 'sheep': 0.4005529284477234, 'cow': 0.3616058826446533, 'elephant': 0.5400511622428894, 'bear': 0.5956308245658875, 'zebra': 0.5560437440872192, 'giraffe': 0.5915590524673462, 'hat': 0.0, 'backpack': 0.09385406225919724, 'umbrella': 0.23406867682933807, 'shoe': 0.0, 'eye glasses': 0.0, 'handbag': 0.08345004916191101, 'tie': 0.22355425357818604, 'suitcase': 0.18850819766521454, 'frisbee': 0.4648464620113373, 'skis': 0.0853266566991806, 'snowboard': 0.17299018800258636, 'sports ball': 0.3863285481929779, 'kite': 0.39229217171669006, 'baseball bat': 0.23580093681812286, 'baseball glove': 0.3419892489910126, 'skateboard': 0.3180837035179138, 'surfboard': 0.2289624959230423, 'tennis racket': 0.4226250648498535, 'bottle': 0.24639634788036346, 'plate': 0.0, 'wine glass': 0.2846655547618866, 'cup': 0.3280509114265442, 'fork': 0.151094451546669, 'knife': 0.09519923478364944, 'spoon': 0.09142474830150604, 'bowl': 0.28933367133140564, 'banana': 0.15350903570652008, 'apple': 0.16788233816623688, 'sandwich': 0.2860325872898102, 'orange': 0.3574628233909607, 'broccoli': 0.20719854533672333, 'carrot': 0.1425948292016983, 'hot dog': 0.26880401372909546, 'pizza': 0.4864168167114258, 'donut': 0.36339670419692993, 'cake': 0.25741809606552124, 'chair': 0.18030241131782532, 'couch': 0.284697026014328, 'potted plant': 0.21564118564128876, 'bed': 0.3336133360862732, 'mirror': 0.0, 'dining table': 0.21570488810539246, 'window': 0.0, 'desk': 0.0, 'toilet': 0.41972097754478455, 'door': 0.0, 'tv': 0.4319237768650055, 'laptop': 0.43690046668052673, 'mouse': 0.4938998222351074, 'remote': 0.21422357857227325, 'keyboard': 0.3535045087337494, 'cell phone': 0.22163479030132294, 'microwave': 0.39696362614631653, 'oven': 0.197708398103714, 'toaster': 0.0, 'sink': 0.25485721230506897, 'refrigerator': 0.306932270526886, 'blender': 0.0, 'book': 0.08003350347280502, 'clock': 0.44794130325317383, 'vase': 0.3089244067668915, 'scissors': 0.15788370370864868, 'teddy bear': 0.2590126097202301, 'hair drier': 0.01155115570873022, 'toothbrush': 0.08186015486717224, 'hair brush': 0.0} mAP @ 100 : 0.29641902446746826 Recall @ 100 per class: {'person': 0.5312979817390442, 'bicycle': 0.39868998527526855, 'car': 0.4529411196708679, 'motorcycle': 0.4635983109474182, 'airplane': 0.5826446413993835, 'bus': 0.6214953660964966, 'train': 0.6061798334121704, 'truck': 0.5083085298538208, 'boat': 0.36059853434562683, 'traffic light': 0.342801570892334, 'fire hydrant': 0.6092308163642883, 'street sign': 0.0, 'stop sign': 0.7229999303817749, 'parking meter': 0.4679245352745056, 'bench': 0.3229965567588806, 'bird': 0.4244897961616516, 'cat': 0.6189743280410767, 'dog': 0.6221053004264832, 'horse': 0.5472119450569153, 'sheep': 0.5715469121932983, 'cow': 0.590721607208252, 'elephant': 0.697282612323761, 'bear': 0.7599999904632568, 'zebra': 0.6704141497612, 'giraffe': 
0.7371428608894348, 'hat': 0.0, 'backpack': 0.3209790289402008, 'umbrella': 0.3830246925354004, 'shoe': 0.0, 'eye glasses': 0.0, 'handbag': 0.28855830430984497, 'tie': 0.3999999761581421, 'suitcase': 0.4179190695285797, 'frisbee': 0.6188235878944397, 'skis': 0.2819494605064392, 'snowboard': 0.3847058415412903, 'sports ball': 0.48247867822647095, 'kite': 0.5243542790412903, 'baseball bat': 0.40263158082962036, 'baseball glove': 0.46637168526649475, 'skateboard': 0.4699029326438904, 'surfboard': 0.4413612484931946, 'tennis racket': 0.5636986494064331, 'bottle': 0.4189937114715576, 'plate': 0.0, 'wine glass': 0.3996710777282715, 'cup': 0.5147222280502319, 'fork': 0.28594595193862915, 'knife': 0.29999998211860657, 'spoon': 0.26174864172935486, 'bowl': 0.5486692190170288, 'banana': 0.34923550486564636, 'apple': 0.4076233506202698, 'sandwich': 0.5113333463668823, 'orange': 0.5780612230300903, 'broccoli': 0.43418803811073303, 'carrot': 0.3876325488090515, 'hot dog': 0.48452386260032654, 'pizza': 0.6437869668006897, 'donut': 0.5281632542610168, 'cake': 0.4871920943260193, 'chair': 0.36778274178504944, 'couch': 0.547340452671051, 'potted plant': 0.4119718074798584, 'bed': 0.6014184355735779, 'mirror': 0.0, 'dining table': 0.43503791093826294, 'window': 0.0, 'desk': 0.0, 'toilet': 0.6198529005050659, 'door': 0.0, 'tv': 0.6224390268325806, 'laptop': 0.5922222137451172, 'mouse': 0.6698412895202637, 'remote': 0.4273223876953125, 'keyboard': 0.5109589099884033, 'cell phone': 0.37449997663497925, 'microwave': 0.5149999856948853, 'oven': 0.4214285910129547, 'toaster': 0.0, 'sink': 0.4689474105834961, 'refrigerator': 0.4726027548313141, 'blender': 0.0, 'book': 0.3266506493091583, 'clock': 0.59375, 'vase': 0.5, 'scissors': 0.3012820780277252, 'teddy bear': 0.4660605788230896, 'hair drier': 0.04444444552063942, 'toothbrush': 0.31707316637039185, 'hair brush': 0.0} Recall @ 100 all classes (averaged): 0.4728471636772156 Recall @ 100 small: 0.28937751054763794 Recall @ 100 medium: 0.5052471160888672 Recall @ 100 large: 0.6245576739311218
Currently testing: threshold = 0.05 bk = 5 lk = 5 epsilon = 0.9 S = KF-500-VOC top k = 100 TP: tensor(10048.) FP: tensor(20716.) FN: tensor(1984.) AP @ 100 per class: {'aeroplane': 0.6866281032562256, 'bicycle': 0.7792302370071411, 'bird': 0.6703633069992065, 'boat': 0.48980414867401123, 'bottle': 0.5355979204177856, 'bus': 0.735668420791626, 'car': 0.819940984249115, 'cat': 0.8110095262527466, 'chair': 0.4661140739917755, 'cow': 0.8022223114967346, 'diningtable': 0.6221809387207031, 'dog': 0.8019189834594727, 'horse': 0.8133650422096252, 'motorbike': 0.7648594975471497, 'person': 0.775041401386261, 'pottedplant': 0.38187214732170105, 'sheep': 0.7274808883666992, 'sofa': 0.6267770528793335, 'train': 0.714433491230011, 'tvmonitor': 0.6733880043029785} mAP @ 100 : 0.6848948001861572 Recall @ 100 per class: {'aeroplane': 0.7859649062156677, 'bicycle': 0.8961424231529236, 'bird': 0.7930282950401306, 'boat': 0.6844106316566467, 'bottle': 0.6695095896720886, 'bus': 0.8967136144638062, 'car': 0.8917568325996399, 'cat': 0.916201114654541, 'chair': 0.6904761791229248, 'cow': 0.9262294769287109, 'diningtable': 0.8203883767127991, 'dog': 0.920245349407196, 'horse': 0.9109195470809937, 'motorbike': 0.8830769062042236, 'person': 0.8659452199935913, 'pottedplant': 0.5916666984558105, 'sheep': 0.8181818127632141, 'sofa': 0.9121338725090027, 'train': 0.8581560254096985, 'tvmonitor': 0.7792207598686218} Recall @ 100 all classes (by average): 0.8255184292793274
The code is highly readable and I did not have any problem navigating it.
First of all, this is the only open source implementation of an interesting paper, and the replication authors implemented it from scratch. As the replication authors pointed out, many minute yet crucial details, such as the epsilon values and tricks such as setting non-highest-scoring class probabilities to 0, would have been lost without this replication effort.
By using a different backbone network for object detection, the replication authors have broadened the scope of the discussion. Is the re-optimization method dependent on the object detector's performance? Is the re-optimization useful at all for top-performing object detectors? These are interesting questions not discussed in the original paper.
The replication objectively shows with evidence that some of the claims in the original paper are not generalizable. The arguments are straightforward and clear.
The extra experiment in Section 4.3 shows that for strong detections (when false positives are to be avoided, as in most practical cases), the re-optimization actually hurts performance. This is an interesting observation missed in the published paper.
As this is a replication, the first thing to do is to implement the original method exactly; it could use different libraries like PyTorch, but the method should be the same. VGG and ResNet are two different networks, and as backbones to Faster R-CNN they are difficult to compare. Therefore, the replication should first implement the method using a VGG backbone and compare against the original paper, before exploring and comparing against other backbones like ResNet.
In Section 5, an argument was made that ResNet-50 already accounts for co-occurrences and hence cannot be improved with re-optimization, while on the other hand VGG can still be improved. While it may be possible, it cannot be argued conclusively based on the final mAP and recall numbers alone. The argument needs to be verified with data; one way to measure this could be to look at how much the P matrix changes during re-optimization for ResNet and VGG. If the P change is always significantly greater for VGG than for ResNet, then it could back the argument.
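As a rough illustration of what I have in mind (illustrative only, not existing code in the repository):

```python
import torch

def p_matrix_shift(p_before: torch.Tensor, p_after: torch.Tensor) -> float:
    # P is the (num_boxes x num_classes) detection probability matrix.
    # The mean absolute change after re-optimization gives a single number
    # that can be compared between VGG- and ResNet-based detectors.
    return (p_after - p_before).abs().mean().item()
```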
While the original paper is vague about mAP changes due to re-optimization, calling it "do not compromise mAP" and "without affecting mAP", it does not claim an mAP improvement. In fact, the published results also show a decrease in mAP after re-optimization. Therefore, it should not be described in Section 4.2 as not being in line with the original paper; rather, the mAP decrease is stronger for ResNet. Again, a VGG backbone implementation will make this clearer.
To test the argument that better object detectors learn the co-occurrences better, the replication authors can compare results using progressively deeper ResNet architectures, to see whether there is a gradual improvement with deeper architectures.
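A possible sketch of that comparison, using torchvision's FPN backbone helper (training, evaluation, and the re-optimization comparison are assumed to reuse the existing pipeline):

```python
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

for depth in ("resnet18", "resnet34", "resnet50", "resnet101"):
    # Build a Faster R-CNN with a progressively deeper backbone; then train,
    # evaluate, and compare the mAP/recall changes from re-optimization.
    backbone = resnet_fpn_backbone(depth, pretrained=True)
    model = FasterRCNN(backbone, num_classes=21)  # 20 VOC classes + background
```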
Instead of comparing just two threshold values, the replication authors should show the trend of re-optimization improvement vs. threshold value.
The submission provides a reimplementation of an interesting idea combining deep learning with knowledge-based approaches, so it has value for the AI community. However, the comments of both reviewers need to be addressed in an updated version before it can be accepted.
@hkashyap Thank you very much for your feedback!
@JarlLemmens a gentle reminder that @hkashyap 's review awaits your answers.
@stepherbin do you have comments or requests after @JarlLemmens 's answers to your review?
Hi @benoit-girard, thank you for the check-up. I was looking into @hkashyap 's feedback and trying to implement the VGG-16 backbone as suggested. I managed to implement the VGG-16 backbone in PyTorch, following the explanation at https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html to modify the model to use another backbone. I trained the model on the VOC dataset with parameters similar to those described in the original paper, and ran the experiments again, also with the described parameters.
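Concretely, the backbone swap follows the tutorial's pattern, roughly like this (a sketch; the exact anchor sizes and training hyper-parameters I used may differ):

```python
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

# VGG-16 convolutional features as the Faster R-CNN backbone.
backbone = torchvision.models.vgg16(pretrained=True).features
backbone.out_channels = 512  # VGG-16 feature maps have 512 channels

anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = MultiScaleRoIAlign(featmap_names=['0'], output_size=7, sampling_ratio=2)

model = FasterRCNN(backbone,
                   num_classes=21,  # 20 VOC classes + background
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)
```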
I found the following results:
| model | ResNet50 (as used before): mAP | ResNet50 (as used before): recall | VGG-16 (trained by me): mAP | VGG-16 (trained by me): recall | VGG-16 (published by them): mAP | VGG-16 (published by them): recall |
|---|---|---|---|---|---|---|
| FRCNN | 70.4 | 91.7 | 59.0 | 88.9 | 66.5 | 81.9 |
| KF-500 | 70.4 | 91.8 | 58.9 | 88.9 | 66.5 | 83.8 |
| KF-All | 70.4 | 92.1 | 58.9 | 89.1 | 66.5 | 84.6 |
| KG-CNet | 70.1 | 91.9 | 58.6 | 88.9 | 66.6 | 85.0 |
As you can see, there is quite a difference between the results I found and the published results regarding the baselines. However, the knowledge re-optimization is done after the model is completely trained, so the trends in the results can still be compared. It is interesting to see that these trends follow the trends found earlier with the ResNet50 model, where the mAP remains more or less equal (slightly lower for the KG-CNet case) and the increase in recall does not match the findings of the original paper.
So in my opinion this experiment shows that the (results of the) original paper are not reproducible. The methodology, however, is reproducible, and could be valuable for further research, perhaps in fields other than object detection. Since the original code is not available, I think it could be a great contribution to publish this reproducibility paper/code.
I wonder if the findings of this extra experiment (along with the other comments) will be sufficient for this work to be considered for publication, after including them in the paper of course. I hope to hear from you regarding these new insights.
Hi, I was wondering if you by any chance had the opportunity to read my question already. @hkashyap @stepherbin @benoit-girard
Thank you for the additional experiment using the VGG backbone. It is interesting to see that you achieved a similar trend in mAP and recall as with the ResNet results, yet different from the claim about increasing recall in the original paper. This, to me, confirms the non-reproducibility of the published results. However, can you explain the difference in mAP between the VGG results achieved in your experiment and the original paper? They are in different ranges altogether. As I mentioned in my previous comments, I think it would be useful to compare P matrix changes for VGG and ResNet (at multiple levels of depth) during re-optimization to confirm that the presented scheme actually is detrimental for strong detectors. With these added to the manuscript and the code, I think this reproduction study is suitable for publication in ReScience.
Hi @hkashyap, thank you for the comments. I just wanted to mention here that I am working on a revised version of the paper/code that includes the feedback, and that it will follow in the coming weeks.
Any progress?
Hi @rougier, @benoit-girard, @hkashyap, @stepherbin,
I have updated the paper and code to address all discussed feedback.
The updated final paper, and relevant information can be found here:
Original article: https://www.ijcai.org/proceedings/2017/0230.pdf
PDF URL: [Re] Object Detection Meets Knowledge Graphs.pdf
Metadata URL: metadata.yaml
Code URL: https://github.com/tue-mps/rescience-ijcai2017-230
Scientific domain: Computer Vision
Programming language: Python
To clarify the updates in this version:
I have resolved the environment issues by including a requirement.txt file that contains only the essential packages, instead of the complete environment. I have also updated the installation process in the readme accordingly.
The issue with the model output being a list/tensor has been resolved in the code.
I have included a script that will run multiple experiments such that the output of the experiments can be easily reproduced.
I have adapted the model to include a VGG-16 backbone (similar to the original paper) to verify the results of the original paper. As there was no readily available model for this, I had to recreate their original Caffe model in PyTorch, which I did to the best of my knowledge. There are still small differences between both baselines, which are likely due to (small) differences in how the models are implemented in the two environments. By adding this experiment, we can conclude that the original implementation is not reproducible at all.
I also included an additional experiment on the VOC dataset with a ResNet-18 backbone, which shows that our earlier claim that the presented scheme is detrimental for stronger detectors has to be dropped, as this is not the case.
The results in the ResNet-50 tables have been updated as well (in case someone notices different values there); this is because of a small change in the parameters of our original implementation.
I expanded the beyond-the-paper section with two sets of experiments. The first looks at a set of different box score threshold values to see the effect of re-optimization on bounding boxes with different scores. The second extra experiment shows the effect of the hyper-parameter epsilon, which is the factor that decides the trade-off between knowledge awareness and the original scores.
In summary, we have included a comparison with the original implementation, along with additional models and experiments that go beyond the original paper, and conclude that the claims in the original paper are not reproducible.
I think your article is in good shape and @benoit-girard can probably make a decision soon. Don't hesitate to remind us here if you don't see any progress soon (or email me directly). We're in the middle of a transition to a new (more responsive) system and this might delay the processing of this submission a bit.
Sorry for the delayed answer. I do validate the paper as it is, thank you for all the provided improvements. We will start the publishing process.
Thank you @rougier and @benoit-girard, that's great news! Is there anything that I can/have to do in the coming period?
Sorry for the delay. Can you point me to the sources of your article? I'll need them for publication (as well as the metadata.yaml file).
Hi @rougier
The updated final paper, and relevant information can be found here:
Original article: https://www.ijcai.org/proceedings/2017/0230.pdf
PDF URL: [Re] Object Detection Meets Knowledge Graphs.pdf
Metadata URL: metadata.yaml
Code URL: https://github.com/tue-mps/rescience-ijcai2017-230
Sorry, I meant the sources for your article (I'll need to recompile it several times)
Do you mean the LaTeX source code?
Yes. Or else you can fill in the metadata.yaml (asking me for any missing information), re-compile the PDF and send me the links. But we'll need to do that 2 or 3 times.
I have added the sources to the link. Please let me know if this is indeed what you need.
Yes, but I cannot compile it. Can you?
Oh.. I must admit that I worked with the Overleaf template, so I filled in the metadata.tex file manually as suggested (but still filled in the yaml file later, because I thought you needed it). So I downloaded the article sources from there. I tried using the 'make' command, but it indeed gives me an error that it cannot find a specified file. I don't know if I made a mistake somewhere, but if it's easiest, I can just manually edit the metadata.tex file again and recompile in Overleaf? Or do you strictly require the compilation with 'make' to work?
Ok. Then you can fill in the editor + reviewers (look at the ReScience board page for the ORCIDs), fill in the dates, and use volume 9, issue 1, number 1. Generate a new PDF and post both the PDF and the metadata here. I'll use them to get a DOI that you'll need to integrate in a second step.
Hi,
Sorry if it takes a couple days for my responses, I am currently finishing up on my Master's thesis, so it is a bit hectic.
I couldn't find @stepherbin 's info on the ReScience board page, but I managed to find it online.
The metadata is updated and still here
The new article: Re.Object.Detection.Meets.Knowledge.Graphs.pdf
Hi @rougier, just noticed that I didn't tag you in my last message, so here's a quick notification :)
Sorry for the delay. We'll need to synchronize at some point. In your PDF, check the "A replication of Fang2017ObjectGraphs.", which should be replaced by the actual citation. In the metadata, add quotes around the replication / cite entries and try to fill in the abstract.
Hi @rougier, I tried adding the quotes in the metadata, but that did not change anything. I guess it has something to do with the (standard) header.tex file. Here the abstract is also commented out, so I don't know what's up with that. I added the abstract to the metadata, but the PDF is unchanged from before.
Ok. Then you can directly modify the latex file (changes will be lost next time but we have only 2 passes before publication)
Hey @rougier, sorry for the late response again, but I'm now fully done with my Master's, so I should be able to finish this soon. I'm still a bit confused about what it should look like. I tried looking at earlier published works, but they don't seem to have a standard layout in this regard. If you could link me to one of the earlier published works that has the same citation format etc., that would be very helpful!
Everything is almost good. The last point is "A replication of Fang2017ObjectGraphs", which should be replaced with "A replication of Y. Fang, K. Kuan, J. Lin, C. Tan, and V. Chandrasekhar. “Object detection meets knowledge graphs.” In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, {IJCAI-17}. International Joint Conferences on Artificial Intelligence, 2017, pp. 1661–1667."
At this point, the easiest is to directly modify the latex.
Ah okay, I see what you mean now. Here is the adapted version again: Re.Object.Detection.Meets.Knowledge.Graphs.pdf
Perfect! Let's try to publish it now!
Given that I cannot compile your sources, the easiest way would be to have a quick video meeting (10 min) to speed things up. Can you contact me by mail so that we can find the best time to meet?
It's online! Thanks again and very sorry for all the delays. https://rescience.github.io/read/
Yess! Very cool to see it online :D Thank you!