arrowrbook / book

https://arrowrbook.com

A few cleanups #1

Closed jonkeane closed 2 months ago

jonkeane commented 2 months ago

The number at https://github.com/arrowrbook/book/blob/b854f06e518ef694552d025f9578e11a376fe86e/files_and_formats.qmd#L426 needs to be pretty-printed rather than shown in exponential (scientific) notation
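
To make the difference concrete, here is a minimal Python sketch. The value `n_bytes` is made up for illustration and is not the number from the linked chunk; if that chunk is R, something like `format(x, big.mark = ",", scientific = FALSE)` would presumably be the equivalent.

```python
# Hypothetical value, chosen only to show the two renderings.
n_bytes = 1.5e9

# General format drops into exponent notation for large magnitudes.
as_exponent = f"{n_bytes:g}"

# Fixed-point with thousands separators and no decimals.
as_pretty = f"{n_bytes:,.0f}"

print(as_exponent)  # 1.5e+09
print(as_pretty)    # 1,500,000,000
```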

In https://github.com/arrowrbook/book/blob/b854f06e518ef694552d025f9578e11a376fe86e/files_and_formats.qmd#L500 we should add a note that these work in R and in Python separately, each within its own language. You can still get at the data from the other language, but it's not automatic (because how would that even work???)

In https://github.com/arrowrbook/book/blob/b854f06e518ef694552d025f9578e11a376fe86e/files_and_formats.qmd#L627 something is off with that <number> bit. If I'm reading https://parquet.apache.org/docs/file-format/data-pages/encodings/#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8 correctly, I don't think there's one specific number. It's a combination of the number of distinct values and the string lengths: if the dictionary becomes too large, the encoding falls back to PLAIN
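
A toy model of what I mean, in plain Python. This is not how any Parquet writer is actually implemented: `choose_encoding` and `max_dict_bytes` are hypothetical names, the 1024-byte budget is arbitrary, and real writers make this decision while writing pages instead of once per column. It only shows why a single count of distinct values can't be the threshold when the limit is on dictionary size.

```python
def choose_encoding(values, max_dict_bytes=1024):
    """Toy model: keep dictionary encoding until the dictionary
    outgrows a byte budget, then report a fallback to PLAIN.

    max_dict_bytes is an arbitrary, hypothetical budget.
    """
    seen = set()
    dict_bytes = 0
    for v in values:
        if v not in seen:
            seen.add(v)
            # Assumed entry cost: 4-byte length prefix plus UTF-8 payload.
            dict_bytes += 4 + len(v.encode("utf-8"))
            if dict_bytes > max_dict_bytes:
                return "PLAIN"
    return "RLE_DICTIONARY"


# Three distinct short strings: the dictionary stays tiny.
low_cardinality = ["red", "green", "blue"] * 1000

# 500 distinct short strings: too many entries for the budget.
many_short = [f"id{i}" for i in range(500)]

# Only three distinct values, but each one is long.
few_long = ["a" * 400, "b" * 400, "c" * 400] * 100

print(choose_encoding(low_cardinality))  # RLE_DICTIONARY
print(choose_encoding(many_short))       # PLAIN
print(choose_encoding(few_long))         # PLAIN
```

`many_short` and `few_long` both fall back, at 500 and 3 distinct values respectively, so the text probably shouldn't quote one number there.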