Closed JoaoLages closed 3 years ago
The best way to process files in one specific language is to filter the dataset by file extension. Every datapoint has a "filename" key, so you can keep only the entries whose extension matches the language you're looking for.
Example (for Haskell): data["file_name"].split(".")[-1] == "hs"
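To make that concrete, here is a minimal sketch of the extension filter in plain Python. It assumes each datapoint is a dict and that the key is called "file_name" as in the example above (the key may be named differently in the actual dataset, so check one record first). The helper names `has_extension` and `filter_by_extension` and the sample records are made up for illustration.

```python
def has_extension(datapoint, extension, key="file_name"):
    """Return True if the datapoint's file name ends with the given extension.

    `key` is an assumption: adjust it to whatever the dataset actually uses.
    """
    name = datapoint.get(key, "")
    # Split once from the right; names without a dot (e.g. "Makefile") never match.
    parts = name.rsplit(".", 1)
    return len(parts) == 2 and parts[1].lower() == extension.lower()


def filter_by_extension(datapoints, extension, key="file_name"):
    """Lazily yield only the datapoints whose file name has the given extension."""
    return (d for d in datapoints if has_extension(d, extension, key))


# Hypothetical sample records standing in for real datapoints.
sample = [
    {"file_name": "src/Main.hs", "content": 'main = putStrLn "hi"'},
    {"file_name": "query.sql", "content": "SELECT 1;"},
    {"file_name": "Makefile", "content": "all:"},
]

haskell_files = list(filter_by_extension(sample, "hs"))
```

Because the filter is a generator, it should also work on a streamed dataset without loading everything into memory first, though you still have to iterate over every record once.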
I hope this helps.
Let me know if you have any more questions. Good day.
Do we have any trained model for SQL?
The model's training data might include some SQL, but it will not be very representative. If you'd like a model trained specifically on SQL, I recommend checking out this project: https://github.com/ElementAI/picard#overview
Closing this for now; feel free to reopen if you want to discuss more. However, a better place for an in-depth discussion might be our Discord channel!
I have only found Java. I wonder if someone can share the details so I don't have to process the whole dataset :) Thank you for open sourcing it! Awesome stuff!