ArtificialSoftwareEngineering / InterpretingCodeGeneration

Interpretability Techniques for Conditioned Code Generation
Apache License 2.0

Train BPEs according to Antonis's recommendations #1

Closed: danaderp closed this issue 3 years ago

danaderp commented 3 years ago

The CodeSearchNet dataset contains roughly 1,763,464 unique tokens. Antonis states that taking 20% of the unique tokens (about 352,693) is enough to train a simple BPE.
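For illustration, here is a minimal pure-Python sketch of how the 20% figure could be turned into a BPE training budget. It is not the training code used in this repo. It assumes whitespace tokenization and reads "20% of unique tokens" as the number of merges to learn, which is only one possible interpretation of the recommendation. The helper names `count_unique_tokens`, `bpe_vocab_budget`, and `train_bpe` are hypothetical.

```python
from collections import Counter


def count_unique_tokens(lines):
    """Count distinct whitespace-separated tokens (assumed tokenization)."""
    return len({tok for line in lines for tok in line.split()})


def bpe_vocab_budget(n_unique, fraction=0.20):
    """Budget derived from the unique-token count (assumed reading of the 20% rule)."""
    return round(n_unique * fraction)


def train_bpe(lines, num_merges):
    """Learn up to num_merges merges with a simple character-level BPE."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    words = Counter(tuple(tok) for line in lines for tok in line.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across all words.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        merged = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] += freq
        words = merged
    return merges


if __name__ == "__main__":
    # Toy corpus standing in for the real dataset.
    corpus = ["def foo ( ) :", "def bar ( ) :"]
    budget = bpe_vocab_budget(count_unique_tokens(corpus))
    print(train_bpe(corpus, budget))
    print(bpe_vocab_budget(1763464))  # 352693
```

On the full dataset a dedicated tokenizer library would be far faster; the loop above is only meant to make the budget calculation and the merge step concrete.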

ncoop57 commented 3 years ago

See this notebook for training steps and usage.