danielzuegner / code-transformer

Implementation of the paper "Language-agnostic representation learning of source code from structure and context".
https://www.in.tum.de/daml/code-transformer/
MIT License
166 stars, 31 forks

PreprocessingException: Error processing batch 0: No snippets left after filtering step. Skipping batch #24

Closed TejaswiniiB closed 1 year ago

TejaswiniiB commented 2 years ago

Hi, preprocessing runs without errors for some code snippets, but for others it throws the exception "PreprocessingException: Error processing batch 0: No snippets left after filtering step. Skipping batch". Can anyone tell me how to overcome this error?

The preprocessing code is taken from the provided interactive_prediction.ipynb notebook:

preprocessor = CTStage1Preprocessor(code_snippet_language, allow_empty_methods=True)
stage1_sample = preprocessor.process([("", "", code_snippet)], 0)

tobias-kirschstein commented 2 years ago

Hi,

thanks for your interest in the Code Transformer. This PreprocessingException just means that every code snippet in a preprocessing batch (usually 10 snippets) was filtered out. This can happen for various reasons.

You can adapt the filtering behaviour via the preprocessing config, for example the one we used for preprocessing the CSN code snippets. It is normal for some snippets to be filtered out. However, if you suspect that snippets are being dropped that should pass the filter, the cause is probably a formatting issue. Do you have example snippets that cannot be preprocessed?
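
To find out which snippets are affected, one option is to run each snippet as its own batch and record the ones that raise. Note that the call in the question already passes a single snippet per batch, so there the exception means that this one snippet was filtered out. Below is a minimal sketch of that idea. The helper find_rejected_snippets and its process_one argument are hypothetical names for illustration and are not part of the code_transformer package; the sketch only assumes that processing a rejected snippet raises an exception.

```python
from typing import Callable, Iterable, List, Tuple


def find_rejected_snippets(
    process_one: Callable[[str], object],
    snippets: Iterable[str],
) -> Tuple[List[object], List[Tuple[int, str, str]]]:
    """Run process_one on every snippet separately and record the failures.

    Hypothetical helper, not part of code_transformer. process_one is any
    callable that preprocesses a single snippet and raises if it is rejected.
    """
    accepted: List[object] = []
    rejected: List[Tuple[int, str, str]] = []
    for idx, snippet in enumerate(snippets):
        try:
            accepted.append(process_one(snippet))
        except Exception as exc:
            # Caught broadly on purpose: this is a diagnostic loop, and the
            # goal is to see every failure instead of stopping at the first.
            rejected.append((idx, snippet, f"{type(exc).__name__}: {exc}"))
    return accepted, rejected


# Possible wiring with the preprocessor from the question (assumes that
# `preprocessor` and a list `snippets` of source strings already exist):
#
# accepted, rejected = find_rejected_snippets(
#     lambda s: preprocessor.process([("", "", s)], 0),
#     snippets,
# )
# for idx, snippet, reason in rejected:
#     print(idx, reason)
```

Looking at the rejected snippets side by side often makes it easier to spot a shared formatting problem than reading the exception message alone.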