StackOverflowMATLABchat / MATLABfcnscrape

Scrape MATLAB's documentation for all function names and output to JSON files for external use

Build More Robust Function Filtering #1

Closed sco1 closed 6 years ago

sco1 commented 6 years ago

Currently the only exclusions for function scraping are object methods (e.g. foo.bar) and those with leading comments (e.g. %#function).

There are multiple scenarios where this filtering is either too restrictive or not restrictive enough, including, but not limited to:

sco1 commented 6 years ago

To expand on the above checklist, the majority of the lines to be removed fall under these types of patterns:

browseNameSpace\n(opchda)
browseNamespace\n(opcua)
browsenamespace (opcda)
boxplot(LeastSquaresResults,OptimResults,NLINResults)
boxplot(NLMEResults)
ColorSpec (Color Specification)
LineSpec (Line Specification)
Logical\nOperators: Short-circuit
bar, barh
knt2brk, knt2mlt
if, elseif, else

Most of these can be adequately handled with a simple '^\w+' regex pattern. Before this regex is applied, lines containing a comma need to be split into separate entries, and a generic blacklist needs to be applied to remove entries such as ColorSpec and the Logical\nOperators: Short-circuit line.
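As a rough illustration of the order of operations described here (blacklist, then comma split, then '^\w+'), a minimal Python sketch might look like the following. The names `BLACKLIST`, `NAME_PATTERN`, and `filter_names` are hypothetical and are not taken from the repository. The sketch also makes some assumptions: embedded newlines are collapsed to a single space before the blacklist check, the split is on a comma followed by a space so that argument lists such as boxplot(LeastSquaresResults,OptimResults,NLINResults) are not broken apart, and duplicate names are dropped. The existing object-method and leading-comment exclusions are not reproduced.

```python
import re

# Hypothetical blacklist of raw entries that are not function names.
# Entries are compared after whitespace normalization (assumption).
BLACKLIST = {
    "ColorSpec (Color Specification)",
    "LineSpec (Line Specification)",
    "Logical Operators: Short-circuit",
}

# Leading run of word characters, as suggested in the comment above.
NAME_PATTERN = re.compile(r"^\w+")


def filter_names(raw_entries):
    """Return cleaned function names from raw scraped entries, in order."""
    names = []
    for entry in raw_entries:
        # Collapse embedded newlines and repeated whitespace (assumption).
        entry = " ".join(entry.split())
        if entry in BLACKLIST:
            continue
        # Split on comma + space only, so "bar, barh" is split but
        # "boxplot(A,B,C)" is left intact (assumption).
        for part in entry.split(", "):
            match = NAME_PATTERN.match(part)
            # Skip entries with no leading word characters and duplicates.
            if match and match.group() not in names:
                names.append(match.group())
    return names
```

Calling `filter_names(["bar, barh", "boxplot(NLMEResults)", "ColorSpec (Color Specification)"])` would keep bar, barh, and boxplot and drop the ColorSpec entry. Splitting on a bare comma instead would turn the arguments inside boxplot(...) into spurious names, which is why the sketch requires the trailing space.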

sco1 commented 6 years ago

As of 8877476, filters should be more or less complete for the current set of scraped toolboxes.

Some minor tweaks may still be made to catch stragglers, at least until I notice that I've done something horrible and filtered out a giant chunk of things I shouldn't have.