amphi-ai / amphi-etl

Visual Data Transformation with Python Code Generation. Low-Code Python-based ETL.
https://amphi.ai
Other
891 stars, 43 forks

An "Amphi" language for formulas, filters, etc. #146

Open simonaubertbd opened 1 month ago

simonaubertbd commented 1 month ago

Hello,

Let's say that we're not all Python specialists. Python syntax is not always intuitive. A very good example is how to take the 2 leftmost characters of a string: string[:2]

In most data analysis tools, there is a built-in language that provides a waaaaaaay simpler syntax, like this one: left(string, 2)

Therefore, I think it would be pretty great to have such a language as an alternative in formulas and Python code, which would also help the transition from any other data analysis tool (Qlik, Alteryx, Dataiku, Tableau, etc.).

Best regards,

Simon

tgourdel commented 1 month ago

Thanks @simonaubertbd, I understand that Python could be less intuitive than purpose-built scripting languages. I hadn't thought about it. One thing, however: don't AI/LLMs make it way easier to develop this kind of formula/script without really needing to know the syntax :) ?

simonaubertbd commented 1 month ago

Hello @tgourdel, LLMs make it easier than before, for sure. But it still means going to your ChatGPT/Gemini/etc. to get an answer, losing some time and getting distracted. And not everyone uses them ;)

On the other hand, it's possible to use genAI to create this language. Here is my first test with some Alteryx functions (a lot is missing, but that's the first try):

Here’s a list of some common Alteryx functions and their Python equivalents:

1. String Manipulation Functions:

| Alteryx Function | Python Equivalent (with Pandas/Native) |
|---|---|
| LEFT(string, length) | string[:length] |
| RIGHT(string, length) | string[-length:] |
| UPPER(string) | string.upper() |
| LOWER(string) | string.lower() |
| TRIM(string) | string.strip() |
| REPLACE(string, old, new) | string.replace(old, new) |
| FINDSTRING(string, sub) | string.find(sub) |
| SUBSTRING(string, start, length) | string[start:start + length] |
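
As a rough sketch of how the string mappings above could be wrapped as plain functions: the names `left`, `right`, `substring` and `find_string` are hypothetical and not part of Amphi, they only mirror the table. One detail worth noting is that `string[-length:]` returns the whole string when `length` is 0, so the sketch guards against that case.

```python
def left(string: str, length: int) -> str:
    """Return the first `length` characters (empty string if length <= 0)."""
    return string[:max(length, 0)]


def right(string: str, length: int) -> str:
    """Return the last `length` characters (empty string if length <= 0)."""
    if length <= 0:
        # string[-0:] is the same as string[0:], i.e. the whole string.
        return ""
    return string[-length:]


def substring(string: str, start: int, length: int) -> str:
    """Return `length` characters starting at 0-based position `start`."""
    return string[start:start + length]


def find_string(string: str, sub: str) -> int:
    """Return the 0-based position of `sub`, or -1 if it is absent."""
    return string.find(sub)


print(left("Amphi", 2))           # Am
print(right("Amphi", 2))          # hi
print(substring("Amphi", 1, 3))   # mph
print(find_string("Amphi", "ph")) # 2
```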

2. Date/Time Functions:

| Alteryx Function | Python Equivalent (with datetime library) |
|---|---|
| DateTimeNow() | datetime.datetime.now() |
| DateTimeAdd(date, value, unit) | date + timedelta(days=value) (or seconds=value, etc.) |
| DateTimeDiff(date1, date2, unit) | (date1 - date2).days (or .total_seconds(), etc.) |
| DateTimeFormat(date, format) | date.strftime(format) |
| DateTimeParse(string, format) | datetime.datetime.strptime(string, format) |
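
A minimal sketch of the date mappings above, using only the standard library. The helper names, the unit strings, and the choice to truncate differences toward zero are assumptions made for illustration. Note that `timedelta.seconds` holds only the leftover seconds after whole days are removed, so the sketch uses `total_seconds()` for differences.

```python
from datetime import datetime, timedelta

# Number of seconds in each supported unit. Months and years are left out
# because timedelta has no calendar-aware units.
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def date_time_add(date: datetime, value: int, unit: str) -> datetime:
    """Shift `date` by `value` units (negative values go backwards)."""
    return date + timedelta(seconds=value * SECONDS_PER_UNIT[unit])


def date_time_diff(date1: datetime, date2: datetime, unit: str) -> int:
    """Whole units elapsed from date2 to date1, truncated toward zero."""
    return int((date1 - date2).total_seconds() / SECONDS_PER_UNIT[unit])


start = datetime.strptime("2024-01-01 12:00:00", "%Y-%m-%d %H:%M:%S")
end = date_time_add(start, 36, "hours")
print(end.strftime("%Y-%m-%d %H:%M:%S"))    # 2024-01-03 00:00:00
print(date_time_diff(end, start, "days"))   # 1
print(date_time_diff(end, start, "hours"))  # 36
print((end - start).seconds)                # 43200, only the leftover seconds
```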

3. Math Functions:

| Alteryx Function | Python Equivalent (with math, random/Native) |
|---|---|
| ABS(number) | abs(number) |
| CEIL(number) | math.ceil(number) |
| FLOOR(number) | math.floor(number) |
| ROUND(number, decimals) | round(number, decimals) |
| SQRT(number) | math.sqrt(number) |
| EXP(number) | math.exp(number) |
| LOG(number) | math.log(number) |
| RANDOM() | random.random() |

4. Conditional Functions:

| Alteryx Function | Python Equivalent |
|---|---|
| IF(condition, true_val, false_val) | true_val if condition else false_val |
| IIF(condition, true_val, false_val) | Same as above |
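
The conditional expression above works on a single value. Applied directly to a pandas column it raises a ValueError, because the truth value of a Series is ambiguous. The sketch below shows two ways around that, assuming pandas and NumPy are installed; the `iif` helper and the `amount` column are made up for the example.

```python
import numpy as np
import pandas as pd


def iif(condition, true_val, false_val):
    """Scalar version; both value arguments are evaluated before the call."""
    return true_val if condition else false_val


df = pd.DataFrame({"amount": [50, 150, 300]})

# Row by row, reusing the scalar helper.
df["size_scalar"] = df["amount"].apply(lambda a: iif(a > 100, "large", "small"))

# Column-wise with numpy.where: the condition is a boolean Series.
df["size_vector"] = np.where(df["amount"] > 100, "large", "small")

print(df)
```

Both columns hold the same labels; the column-wise form avoids a Python-level call per row.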

5. Aggregation Functions:

| Alteryx Function | Python Equivalent (with Pandas) |
|---|---|
| SUM(values) | df['column'].sum() |
| COUNT(values) | df['column'].count() |
| AVERAGE(values) | df['column'].mean() |
| MAX(values) | df['column'].max() |
| MIN(values) | df['column'].min() |

6. Null Handling Functions:

| Alteryx Function | Python Equivalent (with Pandas) |
|---|---|
| ISNULL(value) | pd.isnull(value) |
| NULL() | None |
| IFNULL(value, default) | value if pd.notnull(value) else default |
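
A short sketch of the null-handling mappings above, assuming pandas is installed. The `ifnull` helper name is hypothetical and only handles single values; for a whole column, `Series.isna()` and `Series.fillna()` do the same job.

```python
import pandas as pd


def ifnull(value, default):
    """Scalar version: pd.isnull treats both None and NaN as null."""
    return default if pd.isnull(value) else value


print(ifnull(None, "n/a"))        # n/a
print(ifnull(float("nan"), 0))    # 0
print(ifnull(5, 0))               # 5

# Column-wise equivalents on a Series with one missing value.
s = pd.Series([1.0, None, 3.0])
is_null = s.isna()       # boolean mask of missing values
filled = s.fillna(0.0)   # missing values replaced by the default
print(is_null.tolist())  # [False, True, False]
print(filled.tolist())   # [1.0, 0.0, 3.0]
```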

These are some common functions and their Python equivalents. The Python examples use libraries like Pandas for data frames and handling, as well as native Python functions. Let me know if you'd like more detailed examples or functions!

What if I provide you with a mapping table in order to totally automate the mapping?
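
One possible shape for such a mapping table, as a minimal sketch and not a proposal for how Amphi should implement it: a dictionary from function name to Python callable, plus a tiny evaluator built on the standard `ast` module. `FUNCTIONS`, `evaluate` and the `row` dictionary are hypothetical names. The sketch assumes formulas are written in a call syntax that Python can parse, so tool-specific notations such as bracketed field names or operators are not covered.

```python
import ast

# Hypothetical mapping table: formula function name -> Python callable.
FUNCTIONS = {
    "LEFT": lambda s, n: s[:n],
    "RIGHT": lambda s, n: s[-n:] if n > 0 else "",
    "UPPER": lambda s: s.upper(),
    "LOWER": lambda s: s.lower(),
    "TRIM": lambda s: s.strip(),
    "ABS": abs,
}


def evaluate(formula: str, row: dict):
    """Evaluate a formula made of literals, field names and mapped calls."""

    def visit(node):
        if isinstance(node, ast.Constant):
            return node.value              # literal such as 2 or "abc"
        if isinstance(node, ast.Name):
            return row[node.id]            # field looked up in the row
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and not node.keywords
        ):
            func = FUNCTIONS[node.func.id.upper()]  # case-insensitive lookup
            return func(*[visit(arg) for arg in node.args])
        raise ValueError(f"Unsupported syntax: {ast.dump(node)}")

    return visit(ast.parse(formula, mode="eval").body)


row = {"name": "  Simon ", "delta": -3}
print(evaluate("left(trim(name), 2)", row))  # Si
print(evaluate("UPPER(TRIM(name))", row))    # SIMON
print(evaluate("ABS(delta)", row))           # 3
```

Because only names listed in the table can be called, a formula cannot reach arbitrary Python functions, and adding a new function is a one-line change to the dictionary.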

Best regards,

Simon