Open daviddabingu opened 3 months ago
If you provide some example data and expected output it makes it easier to help out.
From the AlaSQL Bot
Your implementation of the CORR() function logic looks well-structured, but there seems to be an issue with the Pearson correlation coefficient calculation, possibly due to either the data or the implementation of the accumulation logic.
Here are a few potential issues to check:
Data Points: Make sure that the data passed to CORR() contains valid numeric values and there are no undefined or non-numeric values. If valueX or valueY is non-numeric or undefined, it should not contribute to the calculation.
Numerator and Denominator Calculation:
Check that the numerator (count * sumXY) - (sumX * sumY) is correct and makes sense given the data.
Check that the denominator (denominatorX * denominatorY) does not lead to a division by zero. If denominatorX or denominatorY is zero, the correlation should be undefined, hence returning null (which you already handle).
Edge Cases: Ensure your function handles edge cases, such as when there is only one data point or when all valueX or valueY are the same, which would result in a zero denominator.
Here's an improved implementation suggestion:
It makes sure that stage 1/2 accumulates data correctly.
It returns null when invalid data (e.g., single data point or zero variance) is encountered.
alasql.aggr.CORR = function(valueX, valueY, accumulator, stage) {
  if (stage === 1) {
    // Initialize the accumulator object
    accumulator = {
      sumX: 0,
      sumY: 0,
      sumXY: 0,
      sumX2: 0,
      sumY2: 0,
      count: 0
    };
  }
  if (stage === 1 || stage === 2) {
    if (typeof valueX === 'number' && typeof valueY === 'number') {
      accumulator.sumX += valueX;
      accumulator.sumY += valueY;
      accumulator.sumXY += valueX * valueY;
      accumulator.sumX2 += valueX * valueX;
      accumulator.sumY2 += valueY * valueY;
      accumulator.count++;
    }
  }
  if (stage === 3) {
    const { count, sumX, sumY, sumXY, sumX2, sumY2 } = accumulator;
    const numerator = (count * sumXY) - (sumX * sumY);
    const denominatorX = Math.sqrt((count * sumX2) - (sumX * sumX));
    const denominatorY = Math.sqrt((count * sumY2) - (sumY * sumY));
    const denominator = denominatorX * denominatorY;
    if (denominator === 0 || count === 0) {
      return null;
    }
    const result = numerator / denominator;
    return result;
  }
  return accumulator;
};
Log numerator, denominatorX, denominatorY, and denominator to check values during the computation.
Ensure that valueX and valueY are not mistakenly stringified or null during the process.
If the issue persists, you might want to inspect the input data to confirm it's as expected.
I confirmed that there is no null or undefined in the dataset. When I try the same calculation in R, it gives me corr: 0.48.
This is the test dataset that we are using. Global Superstore_latest.csv
Thank you.
I've checked the source file src/55functions.js, where I was able to have a look at how aggr functions are defined.
However, as far as I can tell, all the aggr functions take only a single variable as input, as in the example below.
alasql.aggr.median = alasql.aggr.MEDIAN = function (v, s, stage) {
  if (stage === 2) {
    if (v !== null) {
      s.push(v);
    }
    return s;
  }
  if (stage === 1) {
    if (v === null) {
      return [];
    }
    return [v];
  }
  if (!s.length) {
    return null;
  }
  let r = s.sort((a, b) => {
    if (a > b) return 1;
    if (a < b) return -1;
    return 0;
  });
  let middle = (r.length + 1) / 2;
  let middleFloor = middle | 0;
  let el = r[middleFloor - 1];
  if (middle === middleFloor || (typeof el !== 'number' && !(el instanceof Number))) {
    return el;
  } else {
    return (el + r[middleFloor]) / 2;
  }
};
So, it seems it only takes a single column(v), whereas calculating correlation is always a matter of two columns... just like this mathematical equation.
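For reference, the sample Pearson correlation coefficient can be written in the sum form that matches the accumulator fields used in the code in this thread (count as n, plus sumX, sumY, sumXY, sumX2, sumY2). Every term needs both x and y from the same row, which is why a single-column aggregator is awkward here:

$$
r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}
$$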
Any clever way to take a detour on this?
Yep. A .aggr function only takes one col as input. I would try to do the calculation outside of alasql if you need two.
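A minimal sketch of what "outside of alasql" could look like: let AlaSQL select the two columns, then compute the coefficient in plain JavaScript. The helper name pearson and the column names x and y are hypothetical and are not part of AlaSQL. The sample rows are made up for illustration.

```javascript
// Hypothetical helper (not part of AlaSQL): Pearson correlation over an
// array of row objects, using the same running sums as the CORR() attempt.
function pearson(rows, xKey, yKey) {
  let n = 0, sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
  for (const row of rows) {
    const x = row[xKey];
    const y = row[yKey];
    // Skip pairs where either value is not a finite number.
    if (!Number.isFinite(x) || !Number.isFinite(y)) continue;
    n++;
    sumX += x;
    sumY += y;
    sumXY += x * y;
    sumX2 += x * x;
    sumY2 += y * y;
  }
  const varX = n * sumX2 - sumX * sumX;
  const varY = n * sumY2 - sumY * sumY;
  // Fewer than two points or zero variance: the correlation is undefined.
  if (n < 2 || varX <= 0 || varY <= 0) return null;
  return (n * sumXY - sumX * sumY) / Math.sqrt(varX * varY);
}

// In practice the rows could come from a query such as
//   const rows = alasql('SELECT x, y FROM ?', [data]);
// Made-up sample rows with a perfectly linear relationship:
const rows = [
  { x: 1, y: 2 },
  { x: 2, y: 4 },
  { x: 3, y: 6 },
  { x: 4, y: 8 }
];
const r = pearson(rows, 'x', 'y');
console.log(r); // 1
```

Filtering and grouping stay in SQL this way, and only the two-column arithmetic moves into JavaScript.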
Hello. I am trying to implement CORR() using custom function logic provided here: https://github.com/alasql/alasql/wiki/User-Defined-Functions
This is my code.
But somehow this returned {0 : {corr: 3}}, which means the calculation is not being done properly, since the Pearson correlation always lies between -1 and 1. Any guess or help figuring out what to do? Thank you in advance.
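One possible explanation for the value 3, sketched under an assumption: that AlaSQL calls a custom aggregator as fn(value, accumulator, stage), which is the three-parameter shape of the MEDIAN source quoted in this thread. If the function is instead declared with four parameters (valueX, valueY, accumulator, stage), the stage number lands in the accumulator parameter and stage stays undefined, so no branch runs and the final call simply returns 3. The driver runAggregate below is a hypothetical stand-in for that calling convention, not AlaSQL code, and the aggregator body is shortened.

```javascript
// Four-parameter aggregator shaped like the CORR() attempt (body shortened).
function corrFourParams(valueX, valueY, accumulator, stage) {
  if (stage === 1) {
    accumulator = { count: 0 };
  }
  if (stage === 1 || stage === 2) {
    accumulator.count++;
  }
  if (stage === 3) {
    return null;
  }
  return accumulator;
}

// Hypothetical driver imitating the assumed three-argument convention:
// stage 1 for the first row, stage 2 for later rows, stage 3 to finalize.
function runAggregate(fn, values) {
  let acc;
  values.forEach((v, i) => {
    acc = i === 0 ? fn(v, undefined, 1) : fn(v, acc, 2);
  });
  return fn(undefined, acc, 3);
}

// The third argument (the stage number) is bound to `accumulator`, and
// `stage` is undefined, so the function just echoes the stage number back.
const result = runAggregate(corrFourParams, [10, 20, 30]);
console.log(result); // 3
```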