Open Isabel-Gan opened 4 years ago
idea from Christian: use regex to detect the use of `=` or periods without spaces afterwards? should go back through the samples and look for common formats in the commented code
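a rough sketch of what that regex check could look like. the patterns and the `looks_like_code` name are hypothetical, not taken from the linked script, and they would need tuning against the real samples:

```python
import re

# Hypothetical patterns for illustration only.
# A single '=' that is not part of '==', '!=', '<=' or '>='.
ASSIGNMENT = re.compile(r"(?<![=!<>])=(?!=)")
# A period followed directly by a letter or underscore, as in df.head
DOT_NO_SPACE = re.compile(r"\.(?=[A-Za-z_])")


def looks_like_code(comment: str) -> bool:
    """Return True if the comment text matches either code-like pattern."""
    text = comment.lstrip("#").strip()
    return bool(ASSIGNMENT.search(text) or DOT_NO_SPACE.search(text))


print(looks_like_code("# x = df.head()"))                  # assignment and attribute access
print(looks_like_code("# load the data. Then clean it."))  # periods followed by a space or nothing
```

this will likely give false positives on abbreviations like "e.g." and on file names or URLs in descriptive comments, which is one reason to check it against the samples first.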
working with this script in https://github.com/Isabel-Gan/quantifying-notebook-features/tree/detecting-commented-code
attempt 1: get a list of the 100 most common English words; if some percentage of the words in a comment are on that list, then it must be descriptive and cannot be code. this failed because most descriptive comments were highly technical and specific (and thus contained few common words)
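a minimal sketch of the attempt 1 heuristic, for reference. `COMMON_WORDS` is a small stand-in for the 100-word list, and the 0.3 threshold is an assumed value, not the one used in the script:

```python
import re

# Stand-in for the list of 100 most common English words (assumption).
COMMON_WORDS = {
    "the", "be", "to", "of", "and", "a", "in", "that", "have", "it",
    "for", "not", "on", "with", "as", "this", "is", "we", "by", "from",
}


def common_word_ratio(comment: str) -> float:
    """Fraction of alphabetic tokens in the comment that are common words."""
    words = re.findall(r"[a-z]+", comment.lower())
    if not words:
        return 0.0
    return sum(word in COMMON_WORDS for word in words) / len(words)


def is_descriptive(comment: str, threshold: float = 0.3) -> bool:
    """Classify a comment as descriptive if enough of its words are common."""
    return common_word_ratio(comment) >= threshold


print(is_descriptive("# this is the start of the analysis"))
print(is_descriptive("# df = pd.read_csv('data.csv')"))
print(is_descriptive("# normalize tf-idf feature vectors"))
```

the last call shows the failure mode: a terse technical comment has no common words, so it gets the same classification as commented-out code.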
attempt 2: use `langdetect` on the comment; if it detects the language as Python, then it must be code, and otherwise it must be a descriptive comment. this failed because the language detection was not accurate on single lines, and it detected most of the comments as `Batchfile`, regardless of whether the comment was descriptive or code