bkamins / Julia-DataFrames-Tutorial

A tutorial on Julia DataFrames package
MIT License
531 stars 119 forks source link

add JDF.jl serialization #18

Closed xiaodaigh closed 5 years ago

bkamins commented 5 years ago

Thank you!

Can you please also review whole README.md to reflect your changes (added packages, changed package versions, entry to change list, list of explained functions)? Then I could review it as a whole.

bkamins commented 5 years ago

As a second thought - maybe better do not update the versions of other packages. Only add a new package you need.

Updating version of packages requires review of all notebooks to make sure that nothing is broken or that if some explanation should not have changed. E.g. significant changes have been introduced to CategoricalArrays.jl between the version I currently had and the version you have updated to (I have not checked yet how this should be reflected in the tutorial though).

xiaodaigh commented 5 years ago

I have tested JDF.jl on CategoricalArray 5.5 and it passed, so I've tagged a new version of JDF.jl

xiaodaigh commented 5 years ago

The other package that should be upgraded is CSV.jl to 0.5.14 which is vastly improved from 0.5.11

bkamins commented 5 years ago

Thank you for the update. My point was that we have two options:

In general I tend to update the versions of dependencies only as new releases of DataFrames.jl are out as it is a very work consuming operation to perform.

xiaodaigh commented 5 years ago
  • you should not change Manifest.toml at all except for what needs to be added to make JDF.jl run (this is I think your preferred option, as it does not require a lot of work). This means that no existing package specifications should be changed, only new entries for the missing packages (and recursively their dependencies) should be added

I think I have done this, and kept the CSV.jl at 0.5.11 on my test machine. Below is the snippet of the packages in the repo. I have made sure not to add or update any packages except by adding JDF.jl so the changes in the Manifest are due to adding of JDF.

image

bkamins commented 5 years ago

What you should look at is diff in Manifest.toml, which you can find here: https://github.com/bkamins/Julia-DataFrames-Tutorial/pull/18/files#diff-ac01a28ab92812250ed58ef8514c91ca.

You change versions of many packages that are not dependencies. Eg. you change versions of the following packages that JDF.jl does not depend on: BSON.jl, Distributions.jl, DoubleFloats.jl, FFTW.jl, FlatBuffers.jl, ForwardDiff.jl, GR.jl, GenericSVD.jl, GenericShur.jl, GeometryTypes.jl (I stopped at "G", as there is too much to list).

Anyway - maybe let us do it differently to save your time. Can you free CSV.jl to the latest version (I understand this is the only package you have pinned) and just run all the notebooks to make sure that nothing errors? If you do not have time for this - then please let me know and I will run these tests after your update.

In general - we are hitting a "dependency hell" issue here. This repo is dependency heavy (which is intentional as it should give a review of different available options), which unfortunately means that it is much less work to add something than to make sure all else still works as expected 😢.

xiaodaigh commented 5 years ago

Let me try again. I have time. I COPIED your manifest from before my updates. And then I added JDF.jl

Let me try that again and I will try to observe what actually happens. I didn't check carefully, I thought any changes would be from dependencies of JDF.jl.

I gotta go sleep first, and then continue. I try to limit my OSS contribution so I don't burn out.

bkamins commented 5 years ago

The problem is that if you add JDF.jl over my Manifest.toml all packages seem to get updated (not only the ones that are added). That is why it is a pain 😢.

And exactly in respect to your time (as I know that contributing to OSS can burn a lot of time - I have witnessed this personally 😄) I have proposed a simplified approach above.

xiaodaigh commented 5 years ago

Ok I have run all 13 sheets and checked they only produce error when it is meant to.

I have pushed all the results to a separate branch, so I don't update all the ipynb files in this, see https://github.com/xiaodaigh/Julia-DataFrames-Tutorial/tree/re-run-2019-1104

I think it's safe to merge

bkamins commented 5 years ago

Thank you for working on this!

xiaodaigh commented 5 years ago

Thank you for this wonderful tutorial. I leant a lot. Not sure if I can go to 🇵🇹 for Julia con 2020. But hope to meet up again sometime. I need t o work on group by at some point.