Closed andreas-wilm closed 5 months ago
Hey! Thanks for the kind words and the issue!
That is indeed just an oversight. I guess I never encountered the problem or forgot to investigate if I ever did. Previous to #66 we were just not adjusting the identifier that is associated to the Tensor
as part of the formula loop body in the case where the user used accented quotes. If you wrote:
echo df.filter(f{ `x` < 3 or idx("y") == y })
instead for example, it would produce a yT
variable for you instead. Now, we make sure to produce unique identifiers in all the cases.
And yes, sorry that the docs are still a bit short. You can find a bit more here:
https://scinim.github.io/getting-started/basics/data_wrangling.html
but I should probably write even more about formulas in particular.
Just while you're here, I would mention a couple of things:
"x"
column from the 3
and the "y"
column from the y
variable. There can be cases where the code doesn't manage to deduce the type correctly in more complex expressions (more on that below)col("foo")
or idx("foo")
. This should be more robust in case you ever stumble over other bugs like this onecol
and idx
syntax can also be used to overwrite the type, if the type deduction fails. I.e. you could do:echo df.filter(f{ idx("x", int) < 3 or idx("y", string) == y })
to be explicit about the type information (this is in addition to the type hints you can put at the beginning of the formula, i.e.
f{string -> int: <formula>}
for example. The latter is convenient, but of course not sufficient if you have multiple different types in an expression.
Hope those points help a bit. And feel free to reach out in case something else is unclear (or broken!)! :)
What an awesome answer. Thanks so much!
Hi there,
Many thanks for this super useful library!
I found a potential bug or at least unexpected behaviour in formulas. The problem happens when using a variable that has the same name as the key/column it's applied on.
Here's some example code:
This can be easily avoided by renaming the variable to something else, but it might be worthwhile pointing this out in the docs (which I realize are thin in formulas right now).
Thanks again, Andreas