Open schaubh opened 2 months ago
Howdy @dpad , I have been able to expand the CI test actions to now cover a range of python and Linux versions, as well as Window with opNav, and macOS with opNav and one action on macOS without vizInterface. The macOS with opNav script also test builds the documentation. My next step is to try to remove the need to have the dataSupport
move to the examples
folder and not be included in the BSK build per say. I'll see if I have time to play with this idea on a branch this weekend. Do you see any issues with your build method with this approach. Or, is there an impact on upgrading our use of conan
from 1.0 to 2.0? In your other branch you already had 2.0 functionality working.
@schaubh Sorry about the delayed response, I am currently traveling.
I think there's no issue in upgrading conan to 2.0, as you say I had it working on the other branch. I think there was only some minor issues in versions and options for the dependencies that we specify in the conanfile, but I just had to pin those to an appropriate working version.
Regarding supportData
, yes, if we only need the data for examples, then they don't need to be included in the built wheel. I don't think there would be any issues with the build system, we would just need to change which files get included in the appropriate pyproject.toml
settings. The issues would only be during usage I think (e.g. what happens if a file that Basilisk expects is missing, should it give an error at run-time or initially during configuration of the simulation, should we provide a method to automatically download the data, and if so from where, and what happens if there are networking issues, etc.)
Regarding wheel size, one thing I noticed is that there's a lot of duplication in the compiled module files (I think because of the way we essentially copy-paste the messaging library code instead of linking to a shared library, for example see the auto-generated .cxx
files for messages). I mentioned before that we should use cibuildwheel
to create wheels compatible across lots of different systems at once. One of the things this does is to run auditwheel repair
on the wheel file to check and fix up the compatibility of the wheel. I realised that you can run auditwheel repair --strip
on the wheels to remove a bunch of unused symbols from the compiled module files (the .so
files) -- when I was testing this it reduced the total wheel size to less than half.
Thanks for the info. I'll looking into the SupportData
ideas over the next weeks, and good to know about the audit wheel repair --strip
suggestion. I'll try that. That might get us closer already to our target of having a BSK wheel that is less than 100Mb if possible. I'm on travel a lot over the next 3 weeks, so my productivity will be a little slower ;-)
@dpad , my test branch feature/move_support_data
has moved the supportData
folder to examples/supportData
. The test and scenario files are updated to load data files from this folder. When I build a wheel for macOS, the size has shrunk from 219Mb to 62.7Mb. As this is less than 100Mb PyPi limit, this now enables us to start looking at having builds uploaded.
I test this by:
basilisk
examples
folder and the scenarios still ran fine.Note, for the scenario scripts to find this examples/supportData
folder it does assume the parent folder is called basilisk
. If the user wants to load data from another folder they would need to create to set module data path to their own data folder.
@dpad and @sassy-asjp , thanks for your thoughts on this solution. I know moving this data folder will break scripts, but I plan to write up this issue in "known issues" document with clear guidance on how to correct this. Having wheels now by 69Mb is a huge benefit I think for the distribution of Basilisk?
Mm, looking at this now, I wonder if I made this harder than it should be. I could leave supportData
where it is in the root basilisk
folder, remove it from being included in wheels, and all current BSK scripts would still run. If someone installs BSK wheel (without support Data) in a new installation, they would have to pull the supportData
folder from the repo anyway, including custom downloading the de430.bsp
spice file that we have cmake download when building BSK.
Probably the solution is what @GorgiAstro suggested on https://github.com/AVSLab/basilisk/issues/728#issuecomment-2260617092 to have a new optional dependency repo and package just for the data.
The user could then do something like pip install git+https://github.com/AVSLab/basilisk-data.git
for the data. The functions that load the data would have to be modified to have the default path point to the data package (maybe using importlib
) instead of Basilisk proper.
Howdy @sassy-asjp , I'll look into the functionality of importlib
. Regarding having another wheel, my first concern is that this will be large (i.e. larger than 100Mb) wheel again which I was trying to avoid. As this is just a folder with data files, I could instructor users to download this folder from GitHub directly and how to access it. Or, I could write a simple Python support script that will pull this folder for the user and install in local directory, and pull the JPL large Spice file as cmake
does. Then the user just has to run this script to general a local copy of the data folder if needed.
If someone pulls the full repo they naturally get code and data folder as before.
I need to find time to learn more about importlib
.
Either way, I'm glad to see a BSK week at "only" 69Mb. Hopefully we can continue to thin this over time to make it leaner ;-)
So what Orekit did was to have people who wanted the data manually install it from a link to the repo instead of a wheel. Since it's just data, there is no building and installing from the link to the repo is fast.
I think the script could be nice as well though.
Yeah, I'm leaning toward a direct link as well. The only wrinkle is the large Spice file that is downloaded with cmake. With a python script I can download the supportData
folder and download the spice file from JPL. Might be a clean solution ;-)
I'm also working on ideas to provide backwards compatibility for users for a year by creating a symbolic link from examples/supportData
to the old folder. This would have to be done in the cmake file and be platform specific. This way users have one year to upgrade their code to point to the right data folder.
Came up with a good solution I think that doesn't create any issues with current software scripts. The wheel is created without any of the large *.bsp
files. This reduces the wheel size to around 60Mb on macOS. Much more reasonable. If the user needs to run scripts that contain the spice .bsp files, and the user scripts still want to access them from within the supportData
folder as before, the wheel now creates a command line bskLargeData
that runs a python script which will pull the large Spice .bsp files from the JPL course, and then it puts them into the local BSK package installation. I linked my branch that does this. I have test on some systems and configurations but need to do more testing.
This makes the BSK pip install much leaner. If the wheel has to be installed without internet, the instructions also talk about how to install these large *.bsp filed in the local python environment package installation. This solution seems to give us the more reasonable wheel sizes and an easy way to package up and download the larger BSK data files.
Describe your use case Right now building a wheel includes the large Spice files in
supportData
folder. These are really only needed when running some scenario scripts. Instead, I propose that this folder be moved to inside the examples folder. This should reduce greatly the BSK wheel size, making it easier to host on PyPi among other things.If users want to run the examples scripts and download this folder, they would get the required files except for the large Spice files. These would have to be manually downloaded if the user is not building BSK using cmake which would download them. I think this is manageable with good documentation and warning messages if the file is not found and can't be loaded.
Describe alternatives solutions you've considered I can't think of any alternatives right now, open to ideas.
Additional context The support data is not explicitly required by any module to compile. What data to load can be set by the user and we are providing handy defaults options. I'll have to play with a test branch to see what unexpected challenges arise from this approach.