AVSLab / basilisk

Astrodynamics simulation framework
https://hanspeterschaub.info/basilisk
ISC License

remove need for `supportData` folder in BSK wheels #814

Open schaubh opened 2 months ago

schaubh commented 2 months ago

Describe your use case: Right now, building a wheel includes the large Spice files in the supportData folder. These are really only needed when running some scenario scripts. Instead, I propose that this folder be moved inside the examples folder. This should greatly reduce the BSK wheel size, making it easier to host on PyPI, among other things.

If users want to run the example scripts and download this folder, they would get the required files except for the large Spice files. These would have to be downloaded manually if the user is not building BSK with cmake, which would download them. I think this is manageable with good documentation and warning messages if a file is not found and can't be loaded.

Describe alternative solutions you've considered: I can't think of any alternatives right now, but I'm open to ideas.

Additional context: The support data is not explicitly required by any module to compile. What data to load can be set by the user, and we are providing handy default options. I'll have to play with a test branch to see what unexpected challenges arise from this approach.

schaubh commented 2 months ago

Howdy @dpad, I have been able to expand the CI test actions to now cover a range of Python and Linux versions, as well as Windows with opNav, macOS with opNav, and one action on macOS without vizInterface. The macOS with opNav script also test-builds the documentation. My next step is to try moving the supportData folder to the examples folder so that it is not included in the BSK build per se. I'll see if I have time to play with this idea on a branch this weekend. Do you see any issues with your build method with this approach? Or is there an impact on upgrading our use of conan from 1.0 to 2.0? In your other branch you already had 2.0 functionality working.

dpad commented 2 months ago

@schaubh Sorry about the delayed response, I am currently traveling.

I think there's no issue in upgrading conan to 2.0; as you say, I had it working on the other branch. I think there were only some minor issues with versions and options for the dependencies that we specify in the conanfile, but I just had to pin those to an appropriate working version.

Regarding supportData, yes, if we only need the data for examples, then they don't need to be included in the built wheel. I don't think there would be any issues with the build system, we would just need to change which files get included in the appropriate pyproject.toml settings. The issues would only be during usage I think (e.g. what happens if a file that Basilisk expects is missing, should it give an error at run-time or initially during configuration of the simulation, should we provide a method to automatically download the data, and if so from where, and what happens if there are networking issues, etc.)
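One way to picture the "error during configuration" option is a small check that runs before the simulation starts. This is only a sketch: `require_data_file` and `MissingSupportDataError` are hypothetical names for illustration, not part of Basilisk.

```python
from pathlib import Path


class MissingSupportDataError(FileNotFoundError):
    """Raised when a required support data file is not on disk."""


def require_data_file(data_dir, file_name):
    """Return the path to file_name inside data_dir, or raise a descriptive error.

    Hypothetical helper: calling it while configuring a simulation surfaces
    a missing file early instead of partway through a run.
    """
    path = Path(data_dir) / file_name
    if not path.is_file():
        raise MissingSupportDataError(
            f"Required data file '{file_name}' was not found in '{data_dir}'. "
            "Download it manually or point the module at another data folder."
        )
    return path
```

Because the error subclasses `FileNotFoundError`, existing code that already catches that exception would keep working.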

Regarding wheel size, one thing I noticed is that there's a lot of duplication in the compiled module files (I think because of the way we essentially copy-paste the messaging library code instead of linking to a shared library, for example see the auto-generated .cxx files for messages). I mentioned before that we should use cibuildwheel to create wheels compatible across lots of different systems at once. One of the things this does is to run auditwheel repair on the wheel file to check and fix up the compatibility of the wheel. I realised that you can run auditwheel repair --strip on the wheels to remove a bunch of unused symbols from the compiled module files (the .so files) -- when I was testing this it reduced the total wheel size to less than half.

schaubh commented 2 months ago

Thanks for the info. I'll look into the supportData ideas over the next few weeks, and it's good to know about the auditwheel repair --strip suggestion. I'll try that. That might already get us closer to our target of having a BSK wheel that is less than 100 MB, if possible. I'm traveling a lot over the next 3 weeks, so my productivity will be a little slower ;-)

schaubh commented 2 weeks ago

@dpad, my test branch feature/move_support_data has moved the supportData folder to examples/supportData. The test and scenario files are updated to load data files from this folder. When I build a wheel for macOS, the size has shrunk from 219 MB to 62.7 MB. As this is less than the 100 MB PyPI limit, this now enables us to start looking at having builds uploaded.

I tested this by:

  1. copying the wheel file to a new folder called basilisk
  2. creating a new virtual environment with venv
  3. doing a pip install of this wheel
  4. copying over the examples folder; the scenarios still ran fine.

Note that for the scenario scripts to find this examples/supportData folder, they assume the parent folder is called basilisk. If users want to load data from another folder, they would need to set the module data path to their own data folder.

@dpad and @sassy-asjp, thanks for your thoughts on this solution. I know moving this data folder will break scripts, but I plan to write up this issue in the "known issues" document with clear guidance on how to correct it. Having wheels now at 69 MB is a huge benefit, I think, for the distribution of Basilisk.

schaubh commented 2 weeks ago

Mm, looking at this now, I wonder if I made this harder than it should be. I could leave supportData where it is in the root basilisk folder, remove it from being included in wheels, and all current BSK scripts would still run. If someone installs a BSK wheel (without supportData) in a new installation, they would have to pull the supportData folder from the repo anyway, including manually downloading the de430.bsp Spice file that we have cmake download when building BSK.

sassy-asjp commented 2 weeks ago

Probably the solution is what @GorgiAstro suggested on https://github.com/AVSLab/basilisk/issues/728#issuecomment-2260617092 to have a new optional dependency repo and package just for the data.

The user could then do something like pip install git+https://github.com/AVSLab/basilisk-data.git for the data. The functions that load the data would have to be modified to have the default path point to the data package (maybe using importlib) instead of Basilisk proper.
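A rough sketch of how a default path could be resolved with importlib follows. The package name `basilisk_data` and the function `default_data_dir` are hypothetical; the thread does not specify what such a data package would be called or how the loaders would use it.

```python
import importlib.util
from pathlib import Path


def default_data_dir(package_name="basilisk_data", fallback="supportData"):
    """Return the data package's directory if installed, else a fallback path.

    'basilisk_data' is a hypothetical package name used for illustration.
    """
    # find_spec returns None when a top-level package is not installed.
    spec = importlib.util.find_spec(package_name)
    if spec is not None and spec.submodule_search_locations:
        # A package exposes the directory it lives in here.
        return Path(next(iter(spec.submodule_search_locations)))
    return Path(fallback)
```

With this shape the data package stays optional: nothing is imported, and callers fall back to a local folder when the package is absent.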

schaubh commented 2 weeks ago

Howdy @sassy-asjp, I'll look into the functionality of importlib. Regarding having another wheel, my first concern is that this will again be a large wheel (i.e. larger than 100 MB), which I was trying to avoid. As this is just a folder with data files, I could instruct users to download this folder from GitHub directly and explain how to access it. Or I could write a simple Python support script that pulls this folder for the user, installs it in a local directory, and pulls the large JPL Spice file as cmake does. Then the user just has to run this script to generate a local copy of the data folder if needed.
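The core of such a support script might look like the sketch below. It uses only the standard library; `fetch_if_missing` is a hypothetical name, and the source URL is left as a parameter because the thread does not give the exact download location.

```python
import urllib.request
from pathlib import Path


def fetch_if_missing(url, dest_dir, file_name=None):
    """Download url into dest_dir unless the file already exists; return its path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Default to the last component of the URL as the local file name.
    target = dest_dir / (file_name or url.rsplit("/", 1)[-1])
    if target.exists():
        return target  # keep an existing copy, e.g. one placed by cmake
    tmp = target.with_name(target.name + ".part")
    urllib.request.urlretrieve(url, tmp)  # download to a temporary name first
    tmp.replace(target)  # rename only after the download has completed
    return target
```

Downloading to a `.part` name and renaming afterwards means an interrupted transfer does not leave a truncated file that later looks like a valid one.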

If someone pulls the full repo they naturally get code and data folder as before.

I need to find time to learn more about importlib.

schaubh commented 2 weeks ago

Either way, I'm glad to see a BSK wheel at "only" 69 MB. Hopefully we can continue to thin this over time to make it leaner ;-)

sassy-asjp commented 2 weeks ago

So what Orekit did was to have people who wanted the data manually install it from a link to the repo instead of a wheel. Since it's just data, there is no building and installing from the link to the repo is fast.

I think the script could be nice as well though.

schaubh commented 1 week ago

Yeah, I'm leaning toward a direct link as well. The only wrinkle is the large Spice file that is downloaded with cmake. With a python script I can download the supportData folder and download the spice file from JPL. Might be a clean solution ;-)

I'm also working on ideas to provide backwards compatibility for users for a year by creating a symbolic link from examples/supportData to the old folder. This would have to be done in the cmake file and be platform specific. This way users have one year to upgrade their code to point to the right data folder.

schaubh commented 3 days ago

I came up with a good solution, I think, that doesn't create any issues with current scripts. The wheel is created without any of the large *.bsp files. This reduces the wheel size to around 60 MB on macOS, which is much more reasonable. If the user needs to run scripts that use the Spice .bsp files, and the scripts still want to access them from within the supportData folder as before, the wheel now creates a command-line tool, bskLargeData, that runs a Python script which pulls the large Spice .bsp files from the JPL source and puts them into the local BSK package installation. I linked my branch that does this. I have tested on some systems and configurations but need to do more testing.
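For readers unfamiliar with how a wheel can provide a command, a console command is typically a thin wrapper around a Python function. The sketch below is not the code from the linked branch: the file list, `missing_files`, and `main` are assumptions for illustration, and it only reports what is missing rather than downloading anything.

```python
import argparse
from pathlib import Path

# Hypothetical file list; the thread only names de430.bsp explicitly.
LARGE_FILES = ("de430.bsp",)


def missing_files(data_dir, names=LARGE_FILES):
    """Return the names from 'names' that are not yet present in data_dir."""
    data_dir = Path(data_dir)
    return [n for n in names if not (data_dir / n).is_file()]


def main(argv=None):
    """Entry function that a console command could call; returns an exit code."""
    parser = argparse.ArgumentParser(description="Report missing large data files.")
    parser.add_argument("data_dir", help="folder that should hold the .bsp files")
    args = parser.parse_args(argv)
    todo = missing_files(args.data_dir)
    for name in todo:
        print(f"missing: {name}")  # a real command would download the file here
    return 1 if todo else 0
```

Checking first and acting only on missing files keeps the command safe to rerun after a partial or earlier install.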

This makes the BSK pip install much leaner. If the wheel has to be installed without internet access, the instructions also explain how to install these large *.bsp files in the local Python environment's package installation. This solution seems to give us more reasonable wheel sizes and an easy way to package up and download the larger BSK data files.