cmusv / pysheetgrader-core

PySheetGrader -- grading spreadsheet assignments made easy!
Other
2 stars 2 forks source link

Option to evaluate Excel formulas in python to handle built-in functions #48

Open erdogmush opened 1 year ago

erdogmush commented 1 year ago

Certain rubric types evaluate the a key cell's formula with submission file references. This is normally done by python, therefor built-in Excel functions cannot be handled unless the built-in functions are explicitly implemented. Look for libraries that can parse an excel formula and evaluate it without requiring Excel COM API.

This applies only to relative, relative-f, check, test rubric types.

A candidate library that implements most common Excel functions are:

https://github.com/vinci1it2000/formulas

This library must be tested throughly first with some unit tests.

colelloa commented 1 year ago

List of all functions is unfortunately not in one file, but is in one place across the various files in this directory: https://github.com/vinci1it2000/formulas/tree/2e65820371060ddc5e2026ce7d1e73322b4076cf/formulas/functions

Might be time consuming to assess how much is covered by cross-referencing https://support.microsoft.com/en-us/office/excel-functions-alphabetical-b3944572-255d-4efb-bb96-c6d90033e188, but I'd assume the vast majority of widely used ones are supported.

Current coverage (with the way to print it out from a Python terminal) has 207 functions, against around 450-500 total functions according to the link above :

>>> from formulas.functions import get_functions
>>> print(sorted(list(get_functions().keys())))
['ABS', 'ACOS', 'ACOSH', 'ACOT', 'ACOTH', 'ADDRESS', 'AND', 'ARABIC', 'ARRAY', 'ARRAYROW', 'ASIN', 'ASINH', 
'ATAN', 'ATAN2', 'ATANH', 'AVERAGE', 'AVERAGEA', 'AVERAGEIF', 'BIN2DEC', 'BIN2HEX', 'BIN2OCT', 'CEILING', 
'CEILING.MATH', 'CEILING.PRECISE', 'COLUMN', 'CONCAT', 'CONCATENATE', 'CORREL', 'COS', 'COSH', 'COT', 
'COTH', 'COUNT', 'COUNTA', 'COUNTBLANK', 'COUNTIF', 'CSC', 'CSCH', 'CUMIPMT', 'DATE', 'DATEDIF', 
'DATEVALUE', 'DAY', 'DEC2BIN', 'DEC2HEX', 'DEC2OCT', 'DECIMAL', 'DEGREES', 'DUMMYFUNCTION', 'EDATE', 
'EVEN', 'EXP', 'FACT', 'FACTDOUBLE', 'FALSE', 'FIND', 'FLOOR', 'FLOOR.MATH', 'FLOOR.PRECISE', 'FORECAST', 
'FORECAST.LINEAR', 'FV', 'GCD', 'HEX2BIN', 'HEX2DEC', 'HEX2OCT', 'HLOOKUP', 'HOUR', 'IF', 'IFERROR', 'IFNA', 
'IFS', 'INDEX', 'INT', 'IPMT', 'IRR', 'ISBLANK', 'ISERR', 'ISERROR', 'ISEVEN', 'ISLOGICAL', 'ISNA', 'ISNONTEXT', 
'ISNUMBER', 'ISO.CEILING', 'ISODD', 'ISOWEEKNUM', 'ISTEXT', 'LARGE', 'LCM', 'LEFT', 'LEN', 'LN', 'LOG', 'LOG10', 
'LOOKUP', 'LOWER', 'MATCH', 'MAX', 'MAXA', 'MEDIAN', 'MID', 'MIN', 'MINA', 'MINUTE', 'MOD', 'MONTH', 'MROUND', 
'NA', 'NOT', 'NOW', 'NPER', 'NPV', 'OCT2BIN', 'OCT2DEC', 'OCT2HEX', 'ODD', 'OR', 'PI', 'PMT', 'POWER', 'PPMT', 
'PRODUCT', 'PV', 'RADIANS', 'RAND', 'RANDBETWEEN', 'RATE', 'REPLACE', 'RIGHT', 'ROMAN', 'ROUND', 
'ROUNDDOWN', 'ROUNDUP', 'ROW', 'SEARCH', 'SEC', 'SECH', 'SECOND', 'SIGN', 'SIN', 'SINGLE', 'SINH', 'SLOPE', 
'SMALL', 'SQRT', 'SQRTPI', 'STDEV', 'STDEV.P', 'STDEV.S', 'STDEVA', 'STDEVP', 'STDEVPA', 'SUM', 'SUMIF', 
'SUMPRODUCT', 'SWITCH', 'TAN', 'TANH', 'TIME', 'TIMEVALUE', 'TODAY', 'TRIM', 'TRUE', 'TRUNC', 'UPPER', 'VAR', 
'VAR.P', 'VAR.S', 'VARA', 'VARP', 'VARPA', 'VLOOKUP', 'WEEKDAY', 'WEEKNUM', 'XIRR', 'XNPV', 'XOR', 'YEAR', 
'YEARFRAC', '_XLFN.ACOT', '_XLFN.ACOTH', '_XLFN.ARABIC', '_XLFN.CEILING.MATH', '_XLFN.CEILING.PRECISE', 
'_XLFN.CONCAT', '_XLFN.COT', '_XLFN.COTH', '_XLFN.CSC', '_XLFN.CSCH', '_XLFN.DECIMAL', 
'_XLFN.FLOOR.MATH', '_XLFN.FLOOR.PRECISE', '_XLFN.FORECAST.LINEAR', '_XLFN.IFNA', '_XLFN.IFS', 
'_XLFN.ISOWEEKNUM', '_XLFN.SEC', '_XLFN.SECH', '_XLFN.SINGLE', '_XLFN.STDEV.P', '_XLFN.STDEV.S', 
'_XLFN.SWITCH', '_XLFN.VAR.P', '_XLFN.VAR.S', '_XLFN.XOR', '__XLUDF.DUMMYFUNCTION']
colelloa commented 1 year ago

@erdogmush it might be helpful to know which of the above functions are most likely to be used, to guide some of the TDD

erdogmush commented 1 year ago

Yes, I see that they are in different files, according to category, defined in the FUNCTIONS array. Here are the ones I'd use in 18657:

MIN, MAX, IF, IFNA, IFERROR, NOT, AND, OR, TRUE, FALSE, SIGN, SQRT, STDEV, STDEV.P, STDEV.S, AVERAGE, MEDIAN, CORREL, SLOPE, FORECAST, FORECAST.LINEAR, ROUND, ROUNDUP, ROUNDDOWN, COUNT, COUNTIF, COUNTA, COUNTBLANK, ISERROR, ISERR, ISBLANK, ISNA, ISTEXT, ISNUMBER, ISLOGICAL, LN, EXP, IRR, SUM, SUMIF, INDEX, ADDRESS, HLOOKUP, VLOOKUP, ROW, COLUMN, CONCATENATE, CONCAT.

If I had to sample the above for testing, I'd pick these as representatives:

MAX, IF (with NOT, AND, OR), SQRT, LN, STDEV.S, AVERAGE, SUM, ROUND, COUNTIF, CONCATENATE, VLOOKUP

The only one that is not on the list is: INTERCEPT, but super easy to add as a contributor if we want to (already there, but no API, it's just the first result (a) of the xslope function.