Participant %7: Team CSV, Universidad Central "Marta Abreu" de Las Villas

ASSERT-KTH / CodRep

58069 Java source code diffs. http://arxiv.org/pdf/1807.03200

Participant %7: Team CSV, Universidad Central "Marta Abreu" de Las Villas #20

Open chenzimin opened 6 years ago

chenzimin commented 6 years ago

Created for Team CSV (@cesarsotovalero) from the Universidad Central "Marta Abreu" de Las Villas for discussion. Welcome!

monperrus commented 6 years ago

Excellent, welcome! What's your score on Dataset1?

cesarsotovalero commented 6 years ago

My current scores, using just a very naive approach based on string comparison:

Score on dataset1: 0.1236735
Score on dataset2: 0.1096176

No machine learning yet.

monperrus commented 6 years ago

Yes. The first 0.8 are easy to get (purely due to the data).

The remaining points are super hard.

Best score seen so far:

Dataset1: 0.114
Dataset2: 0.085

cesarsotovalero commented 6 years ago

My last scores:

Dataset	Perfect Match	Score
Dataset 1	3867	0.11842962430821
Dataset 2	9833	0.108660931336428
Dataset 3	17197	0.0753167732657934

My current approach: string matching + parse checking
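
For readers unfamiliar with the idea, here is a minimal sketch of what a string-matching baseline for this task might look like. It is not Team CSV's actual code: the function name `predict_line` is hypothetical, and `difflib.SequenceMatcher` is just one possible similarity measure, chosen here as an assumption. The parse-checking step is not covered.

```python
from difflib import SequenceMatcher


def predict_line(new_line: str, source_lines: list[str]) -> int:
    """Return the 1-based number of the source line most similar to new_line.

    Hypothetical helper: it assumes the line to be replaced is the one
    that looks most like the replacement line.
    """
    target = new_line.strip()
    best_line, best_ratio = 1, -1.0
    for number, line in enumerate(source_lines, start=1):
        # ratio() is a similarity in [0, 1]; higher means more alike.
        ratio = SequenceMatcher(None, target, line.strip()).ratio()
        if ratio > best_ratio:
            best_line, best_ratio = number, ratio
    return best_line


source = [
    "int a = 0;",
    "int b = a + 1;",
    "return b;",
]
print(predict_line("int b = a + 2;", source))  # prints 2
```

Ties go to the earliest line, which is an arbitrary choice; a real system would presumably need extra rules to break them.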

chenzimin commented 6 years ago

Thanks, I have updated the rankings

monperrus commented 6 years ago

good scores, getting quite close to @tdurieux :-)

cesarsotovalero commented 6 years ago

Hi everyone, I want to give an update on my scores for the preliminary ranking:

Dataset	Perfect Match	Score
Dataset1	3900	0.1111243868013270
Dataset2	9948	0.0995737723246198
Dataset3	17438	0.0631975953292782
Dataset4	15773	0.0769219481612277

My current approach is: string matching + parse checking + decision rules + heuristics

monperrus commented 6 years ago

It seems that you beat @tdurieux!! Congrats.

It's too late to be considered in the intermediate ranking, but it's really remarkable.

cesarsotovalero commented 6 years ago

Thanks @monperrus!! However, my approach has some performance issues. For instance, it takes almost 2h for Dataset1, which is far from the performance results of @tdurieux. Also, I think the accuracy (in terms of the loss function) should be improved much more to really win the competition. I'll continue working on that.

tdurieux commented 6 years ago

Strangely, my technique is still better on Dataset 2 but worse on the others.

I still have some room for improvement, but I am very happy with the performance of my technique. It takes less than 10 min to get the results on all datasets. That helps a lot when trying new improvements.