Closed arnaudstiegler closed 4 years ago
Looks very interesting!
Also, getting a dataset of GitHub repos for free is quite nice. I will try to get the dataset tonight (I first have to fight my way through a Docker installation for the first time :)). I will keep you posted!
Well, I didn't succeed in installing Docker because of CUDA 10.1. Something is strange about my config files (I use Ubuntu 18.04). The Docker part itself is easy, you guys could get through it pretty easily. It was pretty scary: I thought I might get a black screen after rebooting, since you are supposed to manage everything yourself in Ubuntu... I do not feel very good about trying it again today... If we do not get it before the end of the week, I will try again.
@arthurherbout
The Docker part is really just for running the existing pre-trained models on their own task (which is not ours) and submitting to the competition. If the goal is just to get the dataset, I think you only have to run the download_dataset.py file (because that's ultimately what the Docker setup does, and it's basic Python).
And even for re-using their model, I think it might not be worth spending time on Docker (unless we have a Docker expert in the group!!!). They use Docker only because they host a competition.
GitHub recently launched (i.e. yesterday!) a Code Search Challenge. It tackles semantic code search, which is slightly different from what we do: the aim is to retrieve functions that correspond to a natural language query given by the user. Here is a good explanation of semantic code search.
Of course, this is off-topic for us, but there are some interesting features that we could potentially leverage for our own project:
I think the last point is the most interesting and could be very big for us: thanks to the provided baseline models, we could do some transfer learning, which would allow us to use deep learning even though we only have a small amount of data. On top of that, we could use the latest architecture for text representation (BERT with self-attention), which is exciting as well!
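To make the transfer learning idea a bit more concrete, here is a minimal sketch of the general pattern: keep a pre-trained encoder frozen and train only a small head on our few labeled examples. Everything in it is an assumption for illustration: `PRETRAINED_EMBEDDINGS` is a hand-written stand-in for a real baseline model (it is not the challenge's BERT model), and the toy texts and labels are made up.

```python
import math

# Hypothetical stand-in for a pre-trained encoder: a tiny hand-written
# embedding table. In a real setup these vectors would come from a baseline model.
PRETRAINED_EMBEDDINGS = {
    "sort": [1.0, 0.0],
    "order": [0.9, 0.1],
    "list": [0.8, 0.2],
    "open": [0.0, 1.0],
    "read": [0.2, 0.8],
    "file": [0.1, 0.9],
}
DIM = 2


def encode(text):
    """Frozen encoder: mean of the pre-trained token vectors (never updated)."""
    vectors = [PRETRAINED_EMBEDDINGS[t] for t in text.lower().split()
               if t in PRETRAINED_EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(examples, epochs=500, lr=1.0):
    """Train only a logistic-regression head on top of the frozen features."""
    features = [(encode(text), label) for text, label in examples]
    weights, bias = [0.0] * DIM, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for x, y in features:
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        # Full-batch gradient step; the encoder table is left untouched.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(text, weights, bias):
    x = encode(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Made-up toy data: only four labeled examples.
train = [("sort list", 1), ("order list", 1), ("open file", 0), ("read file", 0)]
weights, bias = train_head(train)
for text, label in train:
    print(text, label, round(predict(text, weights, bias), 3))
```

The point is only that the head has very few parameters to learn, which is why a small dataset can be enough; with a real baseline model, `encode` would call that model instead of the lookup table.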
Drop a comment to tell me what you think about this!