RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

pip install: Investigate wheel size #15628

Closed: BetsyMcPhail closed this issue 3 years ago

BetsyMcPhail commented 3 years ago

The current Drake wheel is about 164M. Based on an initial reading of the docs, pypi requires wheels to be < 100M (compressed size).

There are three avenues we could take to solve this:

  1. Prune some examples/models from the wheel. We will need to ask the appropriate developers what is required and what could be removed.
  2. Ask pypi for a quota increase
  3. Create a separate wheel for data

Working towards #1183

Link to Slack conversation

BetsyMcPhail commented 3 years ago

Request from @jwnimmer-tri via Slack:

> @mwoehlke-kitware could put together a summary of what each portion of the whl approximately costs us as measured by the compressed size, e.g., libdrake.so, any other shared libraries, examples models, manipulation models, license texts, etc.

BetsyMcPhail commented 3 years ago

From the meeting on 8/19:

The first task is to audit the size of individual components, creating an inventory of the compressed size of each subdirectory within examples/.

We should also consider that some models are only used at runtime. It may be possible to include only those for the first release.
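
For reference, a minimal sketch of such an audit using Python's zipfile module; the wheel filename is a placeholder, and grouping by two path components is an arbitrary illustrative choice:

```python
# Minimal audit sketch: sum compressed bytes per subdirectory of a wheel.
import zipfile
from collections import defaultdict

WHEEL = "drake.whl"  # placeholder path to the built wheel

totals = defaultdict(int)
with zipfile.ZipFile(WHEEL) as whl:
    for info in whl.infolist():
        # Group by the first two path components, e.g. "pydrake/examples".
        key = "/".join(info.filename.split("/")[:2])
        totals[key] += info.compress_size  # compressed (stored) bytes

for key, size in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{size:>12,}  {key}")
```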

jwnimmer-tri commented 3 years ago

Per slack, we found that dropping the *.png texture files from the YCB models and the Atlas model was sufficient to bring us under the 100 MiB limit for now.

I think it's fine to ship our initial release without those. Not all of our users' code will work, but the files are so astoundingly big that there is no reasonable way we can include them in pip, even as a separate whl. If we need them in pip land, we'll need to fetch them on demand from the interwebs.


All Drake issues must have an assignee, so assigning this one to +@mwoehlke-kitware.

jwnimmer-tri commented 3 years ago

Hmm, can't assign to @mwoehlke-kitware yet. For now I'll assign @BetsyMcPhail to everything. It's fine to re-assign later once we have @mwoehlke-kitware permissions set up properly.

mwoehlke-kitware commented 3 years ago

Once we sort out how to better ship models/textures, we should consider applying that approach to all such content. There's at least one .stl that's another ~8M compressed, and I think we can gain another ~55M (down to ~45M) if we can trim "everything".

The best way to see what's "most offensive" is to use `unzip -lv the.whl`. To see individual files sorted by compressed size, pipe through `sort -n -k3`. To see how many compressed bytes a particular directory uses, use `unzip -lv the.whl 'some/path/*' | tail -n1` (quote the glob so the shell doesn't expand it).
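
A rough Python equivalent of those unzip invocations, for anyone scripting the same inventory (the wheel path and directory prefix are placeholders):

```python
# Script equivalent of the unzip one-liners above.
import zipfile

WHEEL = "the.whl"  # placeholder, as in the commands above

with zipfile.ZipFile(WHEEL) as whl:
    infos = whl.infolist()

# Like `unzip -lv the.whl | sort -n -k3`: files ordered by compressed size.
for info in sorted(infos, key=lambda i: i.compress_size):
    print(f"{info.compress_size:>12,}  {info.filename}")

# Like `unzip -lv the.whl some/path/* | tail -n1`: one directory's total.
prefix = "pydrake/manipulation/models/"  # illustrative directory
total = sum(i.compress_size for i in infos if i.filename.startswith(prefix))
print(f"{total:,} compressed bytes under {prefix}")
```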

mwoehlke-kitware commented 3 years ago

Here are some insights (all sizes in compressed bytes):

```
44,903,050 examples/
80,780,930 manipulation/models/

 7,802,513 manipulation/models/ycb/meshes/010_potted_meat_can_textured.png
 7,874,735 manipulation/models/realsense2_description/meshes/d415.stl
 8,002,103 manipulation/models/ycb/meshes/009_gelatin_box_textured.png
 8,781,061 examples/atlas/urdf/materials/textures/extremities_diffuse.png
 8,804,154 manipulation/models/ycb/meshes/005_tomato_soup_can_textured.png
 8,884,405 manipulation/models/ycb/meshes/006_mustard_bottle_textured.png
 9,640,102 manipulation/models/ycb/meshes/003_cracker_box_textured.png
10,916,269 manipulation/models/ycb/meshes/004_sugar_box_textured.png
14,085,111 lib/libdrake.so
```

`manipulation/models/` is literally almost half the wheel; `examples/` is another quarter.

jwnimmer-tri commented 3 years ago

From the meeting notes last week:

> Haven't handled size problem… been nuking big textures to get the size under control - this will probably cause problems with Russ’s class
>
> Betsy to ask for more space so we can include textures for now until we come up with a better solution for those non-code things.

Please don't do this yet (i.e., don't ping the pypi maintainers). Russ is not using pip for his class, so there is no urgency.

The most likely outcome in my mind is that we are not going to ship the texture files in the whl; rather, we would fetch them on demand from the internet if required, either as part of setup while installing the whl, or even at runtime when the model is first used. The file sizes are so unreasonable (and will grow larger over time as we add more models) that I don't think shipping them in the whl will be sustainable.
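
To make that concrete, here is a hypothetical sketch of fetch-on-first-use; the base URL, cache location, and helper name are invented for illustration and are not an actual Drake API:

```python
# Hypothetical sketch of fetch-on-first-use for large model assets.
# BASE_URL and the cache layout are invented for illustration; Drake has
# no such helper at the time of this thread.
import pathlib
import urllib.request

BASE_URL = "https://example.com/drake-assets"  # hypothetical host
CACHE = pathlib.Path.home() / ".cache" / "drake-assets"

def fetch_asset(relative_path: str) -> pathlib.Path:
    """Return a local path for the asset, downloading it if not cached."""
    local = CACHE / relative_path
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(f"{BASE_URL}/{relative_path}", local)
    return local

# e.g. the first time a YCB model is loaded:
# fetch_asset("manipulation/models/ycb/meshes/004_sugar_box_textured.png")
```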

BetsyMcPhail commented 3 years ago

If/when we are ready to request more storage space, the link to create an issue is here: https://github.com/pypa/pypi-support/issues/new/choose

jwnimmer-tri commented 3 years ago

I looked at drake-0.32.0a4-cp37-cp37m-manylinux_2_27_x86_64.whl from pypi.

Please remove these files from the wheel; they are not supposed to be in there in the first place:

```
 9365432  Defl:N  3099304  67% 2021-09-07 14:32 32f2472c  pydrake/examples/kuka_iiwa_arm/kuka_plan_runner
10535368  Defl:N  3469987  67% 2021-09-07 14:32 130ca690  pydrake/examples/kuka_iiwa_arm/kuka_simulation
```

Please also remove all of `pydrake/examples/atlas/**`.

I believe that will bring us back from the brink. Once we have all of the recent PRs merged, we can spin a new whl and see what the size looks like.
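
Purely for illustration, a rough sketch of stripping such entries from an already-built wheel; in practice the exclusions belong in the wheel build itself, and a repacked wheel's RECORD metadata would also need to be regenerated (not shown):

```python
# Illustrative only: copy a wheel, skipping entries that match exclusion
# patterns. A real repack must also regenerate the RECORD metadata file,
# which lists a hash and size for every entry (omitted here for brevity).
import fnmatch
import zipfile

EXCLUDE = [
    "pydrake/examples/kuka_iiwa_arm/kuka_plan_runner",
    "pydrake/examples/kuka_iiwa_arm/kuka_simulation",
    "pydrake/examples/atlas/*",
]

def prune(src: str, dst: str) -> None:
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            if any(fnmatch.fnmatch(info.filename, pat) for pat in EXCLUDE):
                continue  # drop this entry
            zout.writestr(info, zin.read(info.filename))

prune("drake.whl", "drake-pruned.whl")  # placeholder filenames
```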

mwoehlke-kitware commented 3 years ago

> they are not supposed to be in there in the first place

...because they are compiled things that don't make sense in the wheel, I'm guessing? I take it these should never be in the initial wheel, then?

> I believe that will bring us back from the brink.

Did you mean to do only this, or this in addition to the other things we've already been nuking? Only those bits get us to ~130M. Those plus what we've been nuking get us to a much "safer" ~80M.

jwnimmer-tri commented 3 years ago

> ...because they are compiled things that don't make sense in the wheel, I'm guessing? I take it these should never be in the initial wheel, then?

Yes, exactly.

> Did you mean to do only this, or this in addition to the other things we've already been nuking? Only those bits get us to ~130M. Those plus what we've been nuking get us to a much "safer" ~80M.

These in addition to.

BetsyMcPhail commented 3 years ago

At ~80M we're safely under the 100M limit. That should be good enough for now. We can open a new issue with a long-term plan to remove common platform-independent data, if needed.