RandomFractals / vscode-data-preview

Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
https://marketplace.visualstudio.com/items?itemName=RandomFractalsInc.vscode-data-preview
Apache License 2.0
550 stars, 59 forks

Support `.feather` file extension for arrow format #279

Closed bloodearnest closed 2 years ago

bloodearnest commented 3 years ago

Some tools, notably R, use .feather as the file extension for arrow data, as that's the actual name of the serialization format of arrow data.

E.g. https://blog.rstudio.com/2016/03/29/feather/

Data Preview works fine if you rename the file extension from .feather to .arrow, but it would be good to support it OOTB.

RandomFractals commented 3 years ago

There is more you'd have to change for that. See some of my closed enhancement tickets and the commit history from when I was adding support for other data file types.

Thanks for the suggestion and first attempt tho 🙂

bloodearnest commented 3 years ago

I had a look at some of those issues, particularly #2 and #12, and noticed there are a few additional regexes that needed updating too, so I have done that, as well as updating the README.md docs.

My intent is not to create a new file type (i.e. arrow === feather); I just want the menu options/keyboard shortcuts to work for a .feather file exactly like they do for a .arrow file.

I couldn't figure out how to run the tests locally. Let me know if there's anything more that's needed.

RandomFractals commented 3 years ago

You also need to update the data file extension switches in data.view.ts, data.view.html and data.view.js. See the other commits in #12 for an example of what's involved. The other package.json and docs changes look good so far. Thanks!

Also, I've seen other dev groups use the .arrow file name extension. We can add feather. I believe that was the old name for that framework and prototype.

I would rather avoid removing .arrow data files support at this stage, considering there are tens of thousands of devs using this data preview extension now, and removing that would break their test arrow data files, etc.

bloodearnest commented 3 years ago

Thanks for the pointers, will take a look.

Had no intention to remove .arrow file extension support at all, just to add .feather - I didn't think I'd done that?

I've seen people use .arrow and .feather interchangeably, agree that .arrow should still be supported.

Arrow is the in-memory layout - feather is the default serialisation of that to disk. The author of feather talks a bit about it here: https://wesmckinney.com/blog/feather-and-apache-arrow/. Feather V2 came out in 2020: https://ursalabs.org/blog/2020-feather-v2/

AIUI, .arrow files are technically feather files, just some groups name them .arrow and some .feather. My goal is to support both in Data Preview as the same thing.
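To make the "same thing" idea concrete, here is a minimal TypeScript sketch of treating both extensions as one data format. It is only an illustration under stated assumptions: `ARROW_EXTENSIONS`, `ARROW_FILE_PATTERN` and `dataFormatFor` are hypothetical names, not identifiers from the Data Preview code base, and the real extension switches in data.view.ts may look quite different.

```typescript
// Hypothetical sketch: none of these names come from the Data Preview source.
type DataFormat = "arrow" | "unknown";

// File extensions that should all be handled as Arrow data.
// Assumption: .feather is an alias for .arrow, not a separate file type.
const ARROW_EXTENSIONS: readonly string[] = [".arrow", ".feather"];

// Equivalent regex form, for places that match file names with a pattern.
const ARROW_FILE_PATTERN = /\.(arrow|feather)$/i;

// Maps a file name to a data format, ignoring letter case in the extension.
function dataFormatFor(fileName: string): DataFormat {
  const lower = fileName.toLowerCase();
  return ARROW_EXTENSIONS.some((ext) => lower.endsWith(ext)) ? "arrow" : "unknown";
}
```

With a single list like this, supporting another alias would mean adding one entry rather than introducing a new file type.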

RandomFractals commented 3 years ago

yeah, I stopped reading his posts at some point. I still like that format a lot.

Appreciate you scrubbing in for this. If you don't mind I'll let you poke around more and see how far you can get to add .feather files support.

To build and test it, you just need to install code, run npm install build, and press F5 in vscode to debug the updated extension. See the last section in my readme.md.

We should probably update arrow js after you are done with those changes. I know the version I last integrated is about a year old.

Ping me here or DM me on twitter if you have questions about how to wire up the rest: https://twitter.com/TarasNovak

And many thanks for checking it out and going for a file type update.

abekfenn commented 2 years ago

Would love this feature, what's left to complete this @RandomFractals @bloodearnest? Would also be great to support '.ftr'

RandomFractals commented 2 years ago

@abekfenn I'll check it out next weekend. So far those changes looked close enough.

RandomFractals commented 2 years ago

btw, this is on hold as I started working on a new tabular data viewer extension that will provide better support for large arrow data files soon: https://github.com/RandomFractals/tabular-data-viewer

abekfenn commented 2 years ago

Sounds great, would love to know when that's ready. I'll keep an eye out

RandomFractals commented 2 years ago

@abekfenn I am hoping to wrap up that extension MVP with latest arrow data bindings support this month.

abekfenn commented 2 years ago

Hi @RandomFractals how's implementation of arrow data bindings coming along? Would love to check it out if it's ready.

RandomFractals commented 2 years ago

@abekfenn arrow data has been supported by this extension before anyone even knew what that was.

There are no new updates on that front here. Thanks for your interest!

RandomFractals commented 2 years ago

@bloodearnest All the online data examples I've seen recently, including arrow data support in the new duckdb, use the .arrow file name extension.

I'd rather not complicate things and stick to one file extension name per data format.

So, I am going to close this request, as I believe .feather is a dated naming convention for arrow data files.