Open moodymudskipper opened 6 months ago
Also waldo doesn't see those. Ultimately we really need our own waldo, with `.subset()` and `.subset2()`.
I suppose the output of `construct_issues()` is not used in snapshots, so this should not be a breaking change in practice.
This was closed by mistake.
Maybe we test whether the serialisation is correct and, if it's not, rerun more carefully?
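A minimal sketch of that two-step idea, assuming the cheap check is `identical()` and the careful one compares serialised bytes. `check_recreated()` is a hypothetical helper for illustration, not a function of the package:

```r
# Hypothetical helper: compare an object with its recreated version.
# Step 1 is the cheap check; step 2 only runs when step 1 passes.
check_recreated <- function(original, recreated) {
  if (!identical(original, recreated)) {
    return("different")
  }
  # Objects can be identical() and still differ at the byte level
  same_bytes <- identical(
    serialize(original, connection = NULL),
    serialize(recreated, connection = NULL)
  )
  if (same_bytes) "same" else "identical but serialised differently"
}

check_recreated(c(0, 0), c(0, 0))   # "same"
check_recreated(c(0, -0), c(0, 0))  # "identical but serialised differently"
check_recreated(c(0, 1), c(0, 0))   # "different"
```

Only the last outcome of step 2 would need the slower, more careful rerun.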
R can create negative zeros, NAs and NaNs, which most R functions do not distinguish from their positive counterparts.
https://twitter.com/antoine_fabri/status/1778467270819213778
Should we take care of those? This part is not too hard; the only downsides are that it might confuse the user, and that we won't compress `c(0, -0, 0, 0)` into `rep(0, 4)`, for instance.
However the following shows that this sign does matter:
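As a minimal illustration in base R (a sketch, not the original reprex), dividing by a signed zero exposes the sign, and the serialised bytes differ even though the default `identical()` returns `TRUE`:

```r
x <- c(0, -0, 0, 0)
y <- rep(0, 4)

identical(x, y)                  # TRUE: the default treats 0 and -0 as equal
identical(x, y, num.eq = FALSE)  # FALSE: bitwise comparison sees the sign
1 / x[2]                         # -Inf
1 / y[2]                         # Inf

# The byte-level representations are not the same
identical(serialize(x, NULL), serialize(y, NULL))  # FALSE
```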
This byte issue also comes up with `bit64` integers: 0 and NA are considered identical, and negative values are all considered identical, because the package does some bit hacking.
Defining row.names as `c(NA, -n)` rather than `1:n` also creates "identical" objects with a different serialisation.
We could also have other types of corruption, like below:
Created on 2024-04-12 with reprex v2.0.2
In that case it's interesting that `identical()` actually sees the difference, so we have 2 different `TRUE` values.
Encoding hell is another issue.
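To make the encoding point concrete, here is a small base R sketch (an illustration added here, not taken from the original reprex). The same character stored under two marked encodings is `identical()`, yet the underlying bytes differ:

```r
# The same character, "é", stored as latin1 and as UTF-8
x_latin1 <- rawToChar(as.raw(0xe9))
Encoding(x_latin1) <- "latin1"
x_utf8 <- rawToChar(as.raw(c(0xc3, 0xa9)))
Encoding(x_utf8) <- "UTF-8"

c(Encoding(x_latin1), Encoding(x_utf8))            # "latin1" "UTF-8"
identical(x_latin1, x_utf8)                        # TRUE
identical(charToRaw(x_latin1), charToRaw(x_utf8))  # FALSE
```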
I'm afraid that if we're too aggressive about serialising everything it will slow down the package, but I also really want this package to be helpful in these difficult corner cases. Maybe we can have an argument for deep checks, and solve some specific cases with special casing.