JuliaEarth / geospatial-data-science-with-julia

Geospatial Data Science with Julia
https://juliaearth.github.io/geospatial-data-science-with-julia
86 stars 15 forks source link

Various minor comments related to the text I noticed #3

Closed bkamins closed 9 months ago

bkamins commented 9 months ago

In https://github.com/JuliaEarth/geospatial-data-science-with-julia/blob/main/01-geodata.qmd#L175:

We can check that the representation is a valid representation of a table using the `Tables.istable` function:

is not 100% accurate. Tables.istable can return false and still the passed object can be a valid table. I would just skip it. Tables.istable is meant mostly for package developers that know the internals of Tables.jl well.

This is a soft comment


In https://github.com/JuliaEarth/geospatial-data-science-with-julia/blob/main/01-geodata.qmd#L168C2-L168C2 I would explain why you use Symdol (and above you used strings). Also maybe consider using CategoricalArrays.jl if you feel that GENDER is categorical?

This is a soft comment


https://github.com/JuliaEarth/geospatial-data-science-with-julia/blob/main/01-geodata.qmd#L199

This row-major representation can be useful to process data that

Is not 100% accurate. You use "This" word. And the one you use does not have this property. The point is that in general row-wise representation is for larger than RAM data, but not this specific one you present. Also, often "larger than RAM" data will return Tables.istable as false, and only at run-time it is checked that data is indeed tabular.

juliohm commented 9 months ago

Thank you for pointing out these issues. If you prefer to submit a PR with suggestions, I can review them there as well.

juliohm commented 9 months ago

Tables.istable can return false and still the passed object can be a valid table.

That is a bit surprising. So this function is not mandatory for table types? I will remove the sentence from the text as you suggested.

In https://github.com/JuliaEarth/geospatial-data-science-with-julia/blob/main/01-geodata.qmd#L168C2-L168C2 I would explain why you use Symdol (and above you used strings).

I will try to add a sentence explaining the difference.

Also, often "larger than RAM" data will return Tables.istable as false, and only at run-time it is checked that data is indeed tabular.

That is surprising too. I thought that this trait function should always return true for a table type regardless of the implementation details.

juliohm commented 9 months ago

@bkamins do we have concrete examples of infinite tables in Julia currently? Maybe we can add these examples as references in the sentence, and replace "this" by "the" as suggested.

juliohm commented 9 months ago

Updated the examples and removed Tables.jl from the dependencies.

bkamins commented 9 months ago

So this function is not mandatory for table types?

No. See its docstring:

Check if an object has specifically defined that it is a table.
Note that not all valid tables will return true, since it's possible to satisfy the Tables.jl interface at "run-time"

do we have concrete examples of infinite tables in Julia currently?

You can have a table that is an iterator of rows that reads from a stream and returns e.g. a NamedTuple as each row. Generators could often work like this. And generators typically do not define their return type (it is typically Any).

ronisbr commented 9 months ago

No. See its docstring:

I think this is another example of a problem in the documentation that we have a little time ago (I forgot where). The documentation in https://tables.juliadata.org/stable/#Implementing-the-Interface-(i.e.-becoming-a-Tables.jl-source) states that to become a Tables.jl source, you must have Tables.istable returning true.

IMHO, it will be somewhat problematic to treat those cases in which the object can implement the interface but does not provide information that it indeed implements the interface. Probably we can define something that will return wrong information instead of an error by assuming that an object implements Tables.jl API whereas it does not.

My opinion is that we always treat objects that returns Tables.istable == false as something that does not comply with Tables.jl (it can be a table, but must not be accessed using Tables.jl API).

juliohm commented 9 months ago

I agree with @ronisbr, and I feel that something could be done differently.

IMHO, it all boils down to this attempt to represent "infinite" and "finite" tables with the same traits.

ronisbr commented 9 months ago

I found the discussion here:

https://github.com/ronisbr/PrettyTables.jl/issues/220

bkamins commented 9 months ago

it all boils down to this attempt to represent "infinite" and "finite" tables with the same traits.

Not only. This is a general issue that a vector of structs is a table and since this could be any struct it is hard to cover this. See:

julia> DataFrame([Dict(1=>2), Dict(3=>4)])
2×8 DataFrame
 Row │ slots                              keys                               vals                               ndel   count  age     idxfloor  maxprobe
     │ Array…                             Array…                             Array…                             Int64  Int64  UInt64  Int64     Int64
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ UInt8[0x00, 0x00, 0x00, 0x00, 0x…  [2, 5, 11, 12, 14, 18, 19, 24, 2…  [2, 5, 11, 12, 14, 18, 19, 24, 2…      0      1       1        15         0
   2 │ UInt8[0x00, 0x00, 0x00, 0x00, 0x…  [2, 5, 9, 10, 12, 16, 17, 21, 24…  [2, 5, 9, 10, 12, 16, 17, 21, 24…      0      1       1        14         0

(i.e. vector of dicts is a table)

bkamins commented 9 months ago

I opened https://github.com/JuliaData/Tables.jl/issues/348 to discuss the issue.

ronisbr commented 9 months ago

Hum, isn't this example a good case for my point? For example, since this is a vector of dictionaries, which reports Tables.istable as false, I think the way PrettyTables prints it is not wrong:

julia> pretty_table([Dict(1=>2), Dict(3=>4)])
┌────────────┐
│     Col. 1 │
├────────────┤
│ Dict(1=>2) │
│ Dict(3=>4) │
└────────────┘
ronisbr commented 9 months ago

Oops, sorry, did not see your comment before posting :) Let's discuss there.

bkamins commented 9 months ago

That behavior is what I called "stricter rule" followed by a package in my post.