Isabel-Gan / quantifying-notebook-features

Python scripts to detect and quantify features in Jupyter notebooks

longer markdown cells in the beginning/end #1

Closed Isabel-Gan closed 4 years ago

Isabel-Gan commented 4 years ago

the current check will not catch the case where there are only a few markdown cells, all placed at the beginning or the end

this case fits the definition of the feature, but the code misses it because it calculates an average length across all markdown cells

Isabel-Gan commented 4 years ago

also mishandles the case where all markdown cells have the same length (see nb id 165313, where the script incorrectly says "true")

Isabel-Gan commented 4 years ago

define a "longer" markdown cell to have a length of at least 5 lines
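A minimal sketch of what a fixed-threshold check could look like, for illustration only and not the code in this repository. It assumes the notebook is an nbformat-4 style dict; the names `LONG_MD_MIN_LINES`, `markdown_line_counts`, `has_longer_md_at_ends`, and the `edge_cells` parameter (how many cells count as "beginning" or "end") are hypothetical, since the thread does not define them.

```python
LONG_MD_MIN_LINES = 5  # "longer" threshold taken from the comment above


def markdown_line_counts(notebook):
    """Return (cell_index, line_count) for each markdown cell in a notebook dict."""
    counts = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of strings
        text = "".join(source) if isinstance(source, list) else source
        counts.append((index, len(text.splitlines())))
    return counts


def has_longer_md_at_ends(notebook, edge_cells=2):
    """True if a markdown cell among the first/last `edge_cells` cells is "longer".

    `edge_cells` is an assumed definition of beginning/end, chosen for illustration.
    """
    total = len(notebook.get("cells", []))
    for index, lines in markdown_line_counts(notebook):
        at_edge = index < edge_cells or index >= total - edge_cells
        if at_edge and lines >= LONG_MD_MIN_LINES:
            return True
    return False
```

Because the threshold is absolute rather than relative to an average, a notebook with only one long markdown cell at the top is flagged, while a notebook whose markdown cells are all the same short length is not.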

Isabel-Gan commented 4 years ago

issues:

Isabel-Gan commented 4 years ago

possible solution: count md cells by character instead???

Isabel-Gan commented 4 years ago

the above issues are resolved, but new issues (false positives):

the issue is that the character count includes things like HTML tags, which don't actually show up in the rendered markdown cell

Isabel-Gan commented 4 years ago

maybe going by line was the better way to do it? or filter the actual text out from those lines somehow?

experimenting in https://github.com/Isabel-Gan/quantifying-notebook-features/tree/quantifiying-markdown-length

Isabel-Gan commented 4 years ago

possible fix: https://stackoverflow.com/questions/328356/extracting-text-from-html-file-using-python
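One way this could be done with only the standard library is sketched below: feed the cell source through `html.parser.HTMLParser` and keep only the text nodes, so tags and attributes no longer count toward the length. This is an illustrative assumption, not necessarily the approach taken in the linked answer or in this repository; `_TextOnly`, `visible_text`, and `visible_char_count` are hypothetical names.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores tags, attributes, and script/style bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(markdown_source):
    """Return the cell source with HTML markup removed."""
    parser = _TextOnly()
    parser.feed(markdown_source)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


def visible_char_count(markdown_source):
    """Count characters of the source after dropping HTML markup."""
    return len(visible_text(markdown_source).strip())
```

Note that this only drops HTML; markdown syntax such as `#` or `**` is still counted, so the result is an approximation of what is displayed.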

Isabel-Gan commented 4 years ago

"fixed" by https://github.com/Isabel-Gan/quantifying-notebook-features/pull/11, but still results in some of the false positives above. upon re-inspection of the notebook, the first markdown cell was actually one line longer than the rest, and the script is correct

Isabel-Gan commented 4 years ago

found a lot of false positives in the actual run, need to change how this is measured

Isabel-Gan commented 4 years ago

https://docs.google.com/document/d/1lucE05_8DUCuKVK4yTYqhlVnMDt4r91bokttCQM01mk/edit?usp=sharing

Isabel-Gan commented 4 years ago

results from discussion with Shurui:

Isabel-Gan commented 4 years ago

fixed by https://github.com/Isabel-Gan/quantifying-notebook-features/commit/c43c7c937cac26ca01277f49edd72a48953c9b05 and https://github.com/Isabel-Gan/quantifying-notebook-features/commit/44433bc276111588874cfe4e8a0f1b153c3678c1

Isabel-Gan commented 4 years ago

last round of fixes:

Isabel-Gan commented 4 years ago

fixed by https://github.com/Isabel-Gan/quantifying-notebook-features/commit/9baab0d5c327b01eb9d888f833e732b9e74a60a2 and https://github.com/Isabel-Gan/quantifying-notebook-features/commit/c150528d0b78ffed6ca49ecabcca4685b10c69cf