TidierOrg / Tidier.jl

Meta-package for data analysis in Julia, modeled after the R tidyverse.
MIT License

Too much recompilation when `using Tidier` #135

Closed. camilogarciabotero closed this issue 7 months ago

camilogarciabotero commented 7 months ago

Since this is a meta-package, I know it reexports all the other packages using Reexport.jl. This is cool, but more than 50% of the compilation time during loading can be recompilation!

@time using Tidier

  9.802069 seconds (4.13 M allocations: 270.918 MiB, 9.41% gc time, 14.49% compilation time: 54% of which was recompilation)

This affects all the reexported packages that themselves export DataFrames and similar common packages. I don't know of a solution to that, but maybe Reexport.jl could reexport packages more intelligently...

camilogarciabotero commented 7 months ago

There might be a tedious solution: reexport general tools like DataFrames, CSV, and others that are commonly used in the Tidier ecosystem, and then reexport only the specific public exports from each Tidier library?
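The export structure proposed above can be sketched with plain Julia modules. This is only an illustration under stated assumptions: `ToyFrames`, `ToyVerbs`, and `ToyMeta` are hypothetical stand-ins (for a shared dependency, a Tidier sub-package, and the meta-package respectively), and explicit `using X: name` plus `export` is used in place of Reexport.jl. It shows the naming layout only and makes no claim about how much recompilation it would save.

```julia
# Hypothetical stand-in for a widely shared dependency such as DataFrames.
module ToyFrames
export frame, nrows
frame(xs...) = collect(xs)   # build a toy "frame" (a plain vector)
nrows(f) = length(f)
end

# Hypothetical stand-in for a sub-package that reexports its dependency.
module ToyVerbs
using ..ToyFrames
export frame, nrows, keep_first   # dependency names plus its own verb
keep_first(f, n) = f[1:n]
end

# Hypothetical meta-package: take the shared tools once from their source,
# then only the sub-package's own public name.
module ToyMeta
using ..ToyFrames: frame, nrows
using ..ToyVerbs: keep_first
export frame, nrows, keep_first
end

using .ToyMeta

f = frame(1, 2, 3)
println(nrows(keep_first(f, 2)))   # prints 2
```

The cost of this layout is the one raised later in the thread: the meta-package's export list has to be kept in sync by hand whenever a sub-package changes its public names.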

kdpsingh commented 7 months ago

I'm open to ideas. In general, Tidier is intended to come batteries included and meant only for interactive use.

Folks with time-sensitive workflows should probably import the specific sub-packages they need individually. Importantly, the base packages all need to also work out of the box, which is why there's lots of re-exporting of portions of DataFrames, etc.
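As a rough sketch of the suggestion above, a time-sensitive script could load one sub-package instead of the meta-package and time it the same way as in the original report. The package name `TidierData` is an assumption here (it is not named in this thread), the package must already be installed, and the measurement is only meaningful in a fresh Julia session.

```julia
# Assumes TidierData is installed; run in a fresh session so nothing is preloaded.
@time using TidierData
```

Comparing this figure with `@time using Tidier` on the same machine gives a sense of how much of the load time comes from the rest of the ecosystem.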

But if there's a way to make the situation better, happy to look.

kdpsingh commented 7 months ago

> There might be a tedious solution: reexport general tools like DataFrames, CSV and others that are commonly used in the Tidier ecosystem, and then reexport the specific public exports from each Tidier library?

Hmmm, let's put this in our back pocket and reconsider it once the ecosystem is more mature and we have a larger user base. Tidier itself is quite low-maintenance right now, and I think there's a huge advantage to that because it lets us innovate faster on the base packages.

rdboyes commented 7 months ago

10 seconds is acceptable for the "batteries included" version for now, I think. At least for TidierPlots, I'm still changing the public exports quite a lot as I figure out better ways to organize things and add ggplot features. I'd rather keep that up to date inside the package than have to do it twice.

kdpsingh commented 7 months ago

Let's close this issue for now. Will revisit once the base packages mature more.

camilogarciabotero commented 7 months ago

> 10 seconds is acceptable for the "batteries included" version

I agree. Maybe once the ecosystem is more stable and mature, this could become a concern.