Isabel-Gan / quantifying-notebook-features

Python scripts to detect and quantify features in Jupyter notebooks

detecting commented code #17

Open Isabel-Gan opened 4 years ago

Isabel-Gan commented 4 years ago

attempt 1: take a list of the 100 most common English words; if some percentage of the words in a comment are on that list, treat the comment as descriptive rather than code. this failed because most code comments were highly technical and specific, so they contained few or no common words
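
a minimal sketch of the attempt 1 heuristic, to show where it breaks down. the word list here is a short hypothetical stand-in for the real 100-word list, and the names `COMMON_WORDS`, `common_word_ratio`, `looks_descriptive` and the 0.3 threshold are assumptions for illustration, not the script actually used:

```python
import re

# Hypothetical, abbreviated stand-in for a list of the 100 most common English words.
COMMON_WORDS = {
    "the", "be", "to", "of", "and", "a", "in", "that", "have", "it",
    "for", "not", "on", "with", "as", "this", "is", "we", "are",
}


def common_word_ratio(comment: str) -> float:
    """Return the fraction of words in the comment that are common English words."""
    text = comment.lstrip("#").lower()
    words = re.findall(r"[a-z']+", text)
    if not words:
        return 0.0
    return sum(word in COMMON_WORDS for word in words) / len(words)


def looks_descriptive(comment: str, threshold: float = 0.3) -> bool:
    """Guess that a comment is descriptive if enough of its words are common."""
    return common_word_ratio(comment) >= threshold


# A plain-English comment passes, and commented-out code is rejected ...
print(looks_descriptive("# this is the mean of the data"))
print(looks_descriptive("# df = pd.read_csv('x.csv')"))
# ... but a terse technical comment has no common words and is rejected too.
print(looks_descriptive("# normalize tf-idf vectors"))
```

the last call is the failure mode described above: a descriptive but technical comment scores the same as commented-out code.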

attempt 2: run langdetect on the comment; if it detects the language as Python, treat the comment as code, otherwise treat it as descriptive. this failed because language detection was not accurate on single lines: it labeled most comments as Batchfile, whether they were descriptive or code

Isabel-Gan commented 4 years ago

idea from Christian: use a regex to detect = signs or periods that are not followed by a space? should go back through the samples and look for common formats in the commented code
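
a rough sketch of one reading of this idea, assuming "=" means an assignment (not `==`, `!=`, `<=`, `>=`) and "period without a space afterwards" means a period directly followed by a letter or underscore, as in attribute access. the patterns and the name `looks_like_code` are hypothetical and would need tuning against the samples:

```python
import re

# A single "=" that is not part of ==, !=, <= or >=.
ASSIGNMENT = re.compile(r"(?<![=!<>])=(?!=)")
# A period directly between a word character and a letter/underscore, e.g. "pd.read_csv".
DOTTED_NAME = re.compile(r"\w\.[A-Za-z_]")


def looks_like_code(comment: str) -> bool:
    """Guess that a comment is commented-out code based on '=' and '.' usage."""
    text = comment.lstrip("#").strip()
    return bool(ASSIGNMENT.search(text) or DOTTED_NAME.search(text))


print(looks_like_code("# df = pd.read_csv('x.csv')"))
print(looks_like_code("# plt.show()"))
print(looks_like_code("# Load the data. Then plot it."))
print(looks_like_code("# check that x == 1"))
```

this would still misfire on prose that contains abbreviations such as "e.g." or URLs, since those also have a period followed directly by a letter, which is one reason to check the patterns against the samples.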

Isabel-Gan commented 4 years ago

working on this with a script in the detecting-commented-code branch: https://github.com/Isabel-Gan/quantifying-notebook-features/tree/detecting-commented-code