Closed bkamins closed 9 months ago
Thank you for pointing out these issues. If you prefer to submit a PR with suggestions, I can review them there as well.
Tables.istable
can returnfalse
and still the passed object can be a valid table.
That is a bit surprising. So this function is not mandatory for table types? I will remove the sentence from the text as you suggested.
In https://github.com/JuliaEarth/geospatial-data-science-with-julia/blob/main/01-geodata.qmd#L168C2-L168C2 I would explain why you use
Symdol
(and above you used strings).
I will try to add a sentence explaining the difference.
Also, often "larger than RAM" data will return
Tables.istable
asfalse
, and only at run-time it is checked that data is indeed tabular.
That is surprising too. I thought that this trait function should always return true for a table type regardless of the implementation details.
@bkamins do we have concrete examples of infinite tables in Julia currently? Maybe we can add these examples as references in the sentence, and replace "this" by "the" as suggested.
Updated the examples and removed Tables.jl from the dependencies.
So this function is not mandatory for table types?
No. See its docstring:
Check if an object has specifically defined that it is a table.
Note that not all valid tables will return true, since it's possible to satisfy the Tables.jl interface at "run-time"
do we have concrete examples of infinite tables in Julia currently?
You can have a table that is an iterator of rows that reads from a stream and returns e.g. a NamedTuple
as each row. Generators could often work like this. And generators typically do not define their return type (it is typically Any
).
No. See its docstring:
I think this is another example of a problem in the documentation that we have a little time ago (I forgot where). The documentation in https://tables.juliadata.org/stable/#Implementing-the-Interface-(i.e.-becoming-a-Tables.jl-source) states that to become a Tables.jl source, you must have Tables.istable
returning true
.
IMHO, it will be somewhat problematic to treat those cases in which the object can implement the interface but does not provide information that it indeed implements the interface. Probably we can define something that will return wrong information instead of an error by assuming that an object implements Tables.jl API whereas it does not.
My opinion is that we always treat objects that returns Tables.istable == false
as something that does not comply with Tables.jl (it can be a table, but must not be accessed using Tables.jl API).
I agree with @ronisbr, and I feel that something could be done differently.
IMHO, it all boils down to this attempt to represent "infinite" and "finite" tables with the same traits.
I found the discussion here:
it all boils down to this attempt to represent "infinite" and "finite" tables with the same traits.
Not only. This is a general issue that a vector of structs is a table and since this could be any struct it is hard to cover this. See:
julia> DataFrame([Dict(1=>2), Dict(3=>4)])
2×8 DataFrame
Row │ slots keys vals ndel count age idxfloor maxprobe
│ Array… Array… Array… Int64 Int64 UInt64 Int64 Int64
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ UInt8[0x00, 0x00, 0x00, 0x00, 0x… [2, 5, 11, 12, 14, 18, 19, 24, 2… [2, 5, 11, 12, 14, 18, 19, 24, 2… 0 1 1 15 0
2 │ UInt8[0x00, 0x00, 0x00, 0x00, 0x… [2, 5, 9, 10, 12, 16, 17, 21, 24… [2, 5, 9, 10, 12, 16, 17, 21, 24… 0 1 1 14 0
(i.e. vector of dicts is a table)
I opened https://github.com/JuliaData/Tables.jl/issues/348 to discuss the issue.
Hum, isn't this example a good case for my point? For example, since this is a vector of dictionaries, which reports Tables.istable
as false, I think the way PrettyTables prints it is not wrong:
julia> pretty_table([Dict(1=>2), Dict(3=>4)])
┌────────────┐
│ Col. 1 │
├────────────┤
│ Dict(1=>2) │
│ Dict(3=>4) │
└────────────┘
Oops, sorry, did not see your comment before posting :) Let's discuss there.
In https://github.com/JuliaEarth/geospatial-data-science-with-julia/blob/main/01-geodata.qmd#L175:
is not 100% accurate.
Tables.istable
can returnfalse
and still the passed object can be a valid table. I would just skip it.Tables.istable
is meant mostly for package developers that know the internals of Tables.jl well.This is a soft comment
In https://github.com/JuliaEarth/geospatial-data-science-with-julia/blob/main/01-geodata.qmd#L168C2-L168C2 I would explain why you use
Symdol
(and above you used strings). Also maybe consider using CategoricalArrays.jl if you feel thatGENDER
is categorical?This is a soft comment
https://github.com/JuliaEarth/geospatial-data-science-with-julia/blob/main/01-geodata.qmd#L199
Is not 100% accurate. You use
"This"
word. And the one you use does not have this property. The point is that in general row-wise representation is for larger than RAM data, but not this specific one you present. Also, often "larger than RAM" data will returnTables.istable
asfalse
, and only at run-time it is checked that data is indeed tabular.