DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

https://hamilton.dagworks.io/en/latest/

BSD 3-Clause Clear License


[good first issue - beginner] Pandas Readers & Writers #284

Open skrawcz opened 1 year ago

skrawcz commented 1 year ago

Is your feature request related to a problem? Please describe.

We need to add more Pandas Readers & Writers (Savers & Loaders in our internal parlance).

Describe the solution you'd like

We need to have readers and, where appropriate, writers covering:

[x] #409
[x] #292 (assigned to @benhhack)
[x] #407 (assigned to @bengineerdavis)
[ ] pandas fwf
[x] #342
[x] #341 (assigned to @bryangalindo)
[x] #369 (assigned to @JoJo10Smith)
[x] #352 (assigned to @JoJo10Smith)
[ ] pandas hdf
[x] #384 (assigned to @JoJo10Smith)
[x] #406
[x] pandas orc (assigned to @JoJo10Smith)
[ ] pandas sas
[x] pandas spss
[x] #355 (assigned to @bryangalindo)
[ ] #375
[x] #377 (assigned to @JoJo10Smith)
[ ] Latex writer

We should cover I/O as listed here.

Additional context

We need to start building wrappers around the common ways people will want to save/load data. That way they'll have off-the-shelf ways to get onto Hamilton easily.

If you're interested in contributing

If you are interested in contributing, picking up one of the above should be straightforward.

Ask for one, and we'll assign it.
We'll create an issue for you.
We'll then work with you on that issue.

In terms of effort, for an example of a desired class, see this code. It basically involves:

Reading the subsequent documentation.
Creating the right class.
Creating some tests for it.
Creating an example to put into our examples repository.
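To give a feel for the size of the task, here is a minimal sketch of what a reader class for the still-open "pandas fwf" item might look like. It deliberately does not subclass Hamilton's actual loader/saver base classes, so treat the class name `FixedWidthReaderSketch`, the method names, and the returned metadata keys as hypothetical; a real contribution should follow the example class linked above. The only library call assumed is `pandas.read_fwf` with its `widths` and `names` arguments.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class FixedWidthReaderSketch:
    """Hypothetical reader wrapping pandas.read_fwf; not Hamilton's real base class."""

    path: str
    # Optional arguments default to None so pandas' own defaults apply unless overridden.
    widths: Optional[List[int]] = None
    names: Optional[List[str]] = None

    def _get_loading_kwargs(self) -> Dict[str, Any]:
        """Collect only the options the caller actually set."""
        kwargs: Dict[str, Any] = {}
        if self.widths is not None:
            kwargs["widths"] = self.widths
        if self.names is not None:
            kwargs["names"] = self.names
        return kwargs

    def load_data(self) -> Tuple[Any, Dict[str, Any]]:
        """Return the DataFrame plus a small (hypothetical) metadata dict."""
        # Imported inside the method so defining the class does not require pandas.
        import pandas as pd

        df = pd.read_fwf(self.path, **self._get_loading_kwargs())
        return df, {"path": self.path, "rows": len(df)}


# Building the keyword arguments does not touch the file system.
reader = FixedWidthReaderSketch(path="data.txt", widths=[5, 3])
print(reader._get_loading_kwargs())
```

A test for a class like this would typically check the keyword-argument mapping on its own, then write a small fixed-width file to a temporary directory and check that `load_data` returns the expected frame.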

benhhack commented 1 year ago

Hey there, I would love to give this a go with one of the pandas i/o methods. I'm new to contributing on GitHub so I appreciate the cooperative work.

elijahbenizzy commented 1 year ago

> Hey there, I would love to give this a go with one of the pandas i/o methods. I'm new to contributing on GitHub so I appreciate the cooperative work.

@benhhack

That would be great! Hopefully there are enough examples to get you started -- let us know what you need beyond that. No judgement if you use gpt-* to help you out as well -- I've found it's helpful for translation/repetitive tasks like this.

skrawcz commented 1 year ago

> Hey there, I would love to give this a go with one of the pandas i/o methods. I'm new to contributing on GitHub so I appreciate the cooperative work.

@benhhack yeah thanks for offering to help! Just to make sure this indeed is the right issue to get started with, what's your comfort level with python & pandas?

benhhack commented 1 year ago

Quite familiar with using both, you can check out my repos to see my level. Never really done anything like this before though, so I'm quite interested to see how it goes.

skrawcz commented 1 year ago

> Quite familiar with using both, you can check out my repos to see my level. Never really done anything like this before though, so I'm quite interested to see how it goes.

Cool. I would take a look at https://pandas.pydata.org/docs/reference/io.html, pick one, and then we can claim that here and create an issue to move discussion to. Which one would you like?

benhhack commented 1 year ago

> > Quite familiar with using both, you can check out my repos to see my level. Never really done anything like this before though, so I'm quite interested to see how it goes.
>
> Cool. I would take a look at https://pandas.pydata.org/docs/reference/io.html, pick one, and then we can claim that here and create an issue to move discussion to. Which one would you like?

Looking at that they all seem equally vague, haha. You can assign me to whichever you'd feel is most appropriate/best to start on.

skrawcz commented 1 year ago

> Looking at that they all seem equally vague, haha. You can assign me to whichever you'd feel is most appropriate/best to start on.

Sure. @benhhack mind commenting on #292 so I can assign it to you?

benhhack commented 1 year ago

> > Looking at that they all seem equally vague, haha. You can assign me to whichever you'd feel is most appropriate/best to start on.
>
> Sure. @benhhack mind commenting on #292 so I can assign it to you?

Have commented :))

skrawcz commented 1 year ago

@benhhack https://github.com/DAGWorks-Inc/hamilton/issues/342 is open for you, if you wanted to comment on it.

JoJo10Smith commented 1 year ago

@skrawcz I've taken a look at the other tickets and I could try the XML read and write class. The classes shouldn't be too difficult but I will reach out if I need help with the testing.

Thanks Jordan

skrawcz commented 1 year ago

> @skrawcz I've taken a look at the other tickets and I could try the XML read and write class. The classes shouldn't be too difficult but I will reach out if I need help with the testing.
>
> Thanks Jordan

@JoJo10Smith if you wanted to comment on https://github.com/DAGWorks-Inc/hamilton/issues/352 I can assign it to you. Thanks!

skrawcz commented 1 year ago

@bryangalindo mind commenting on https://github.com/DAGWorks-Inc/hamilton/issues/355 so I can assign it to you? I missed doing that earlier, sorry about that.

JoJo10Smith commented 1 year ago

@skrawcz I could take the HTML read and write class next.

Thanks Jordan

skrawcz commented 1 year ago

> @skrawcz I could take the HTML read and write class next.
>
> Thanks Jordan

@JoJo10Smith please comment on #369 :)

bryangalindo commented 1 year ago

@skrawcz I can start working on pandas gbq!

skrawcz commented 1 year ago

> @skrawcz I can start working on pandas gbq!

Please comment on https://github.com/DAGWorks-Inc/hamilton/issues/375 -- note this one will require a GCP account I think.

JoJo10Smith commented 1 year ago

@skrawcz I can take on pandas Stata next.

skrawcz commented 1 year ago

> @skrawcz I can take on pandas Stata next.

@JoJo10Smith please comment on #377

JoJo10Smith commented 1 year ago

@skrawcz I'll take Feather next.

Thanks Jordan

skrawcz commented 1 year ago

> @skrawcz I'll take Feather next.
>
> Thanks Jordan

https://github.com/DAGWorks-Inc/hamilton/issues/384 🙇 .

149189 commented 1 year ago

Hey skrawcz! I would like to work on the pandas table. I have already made many pandas DataFrames and would like to work on this project.

flaviassantos commented 1 year ago

@skrawcz can I take 'pandas parquet'?

skrawcz commented 1 year ago

Thanks @flaviassantos, you should be all set. If you have questions, put them in issue https://github.com/DAGWorks-Inc/hamilton/issues/406.

@149189 please comment on https://github.com/DAGWorks-Inc/hamilton/issues/407 to have me assign that to you. If you have questions, we can have the conversation there.

JoJo10Smith commented 1 year ago

@skrawcz I'll take csv next

Thanks Jordan

skrawcz commented 1 year ago

@JoJo10Smith https://github.com/DAGWorks-Inc/hamilton/issues/409 is up. thanks!

swapdewalkar commented 7 months ago

Created SPSS for https://github.com/DAGWorks-Inc/hamilton/issues/813