real-yfprojects opened this issue 2 years ago
The tool is just very conservative. Source builds, when they actually build, will always work. Wheel binaries depend on host packages which are unknown and may not work.
This has been discussed elsewhere but never done. A flag where you can override this, e.g. `--use-wheels-for=cryptography`, would be reasonable.
This would be awesome for installing packages like numpy, scikit-learn or pandas as binary wheels, since these are difficult to build and install from source packages because of their many compile-time dependencies.
What would the output of flatpak-pip-generator look like? In case the wheels are platform-specific, I guess the output would have to specify multiple wheels, one for each target architecture, right?
How could this output be generated? The flatpak-pip-generator script currently calls pip to resolve packages. It could be possible to resolve packages for a specific target architecture by passing the `--platform` parameter to pip.
> What would the output of flatpak-pip-generator look like? In case the wheels are platform-specific, I guess the output would have to specify multiple wheels, one for each target architecture, right?

You simply add all wheels needed to `sources` and specify their platform using `only-arches`.
> How could this output be generated? The flatpak-pip-generator script currently calls pip to resolve packages. It could be possible to resolve packages for a specific target architecture by passing the `--platform` parameter to pip.

It is also easily possible to query PyPI for a list of wheels available for a given package. In fact, this is already implemented in the script.
> What would the output of flatpak-pip-generator look like? In case the wheels are platform-specific, I guess the output would have to specify multiple wheels, one for each target architecture, right?
>
> You simply add all wheels needed to `sources` and specify their platform using `only-arches`.

sounds good! I'm trying to think this through and make it as specific as possible. can you provide an example of how you would structure the flatpak json? a separate module for each platform-specific wheel, right?
> It is also easily possible to query the pypi for a list of wheels available for a given package. In fact this is already implemented in the script.

yes, querying pypi is the easy part... resolving the right wheel filename to install is the hard part. for this, the flatpak-pip-generator script currently relies on the `pip download` command to choose the right filename.

querying is easy: for example, if I were to call `flatpak-pip-generator scikit-learn`, then flatpak-pip-generator's `get_pypi_url` method would query https://pypi.org/pypi/scikit-learn/json and fetch a lot of information. Click the link and see for yourself; the json structure lists all files for all releases.

choosing the right filename for the "best" wheel to install is hard. For example, the json structure for scikit-learn's current 1.1.1 release includes 17 files. The filenames have PEP 425 semantics, i.e., each filename specifies what python version, abi and platform it has been made for. How would we choose the "best" wheel to install amongst so many options?
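As an illustration (not part of the current script), pypa's `packaging` library can decode the PEP 425 information that is encoded in such a wheel filename:

```python
from packaging.utils import parse_wheel_filename

# one of the filenames listed in scikit-learn's 1.1.1 release json
filename = "scikit_learn-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
name, version, build, tags = parse_wheel_filename(filename)

# the compressed platform tag expands into one tag per platform variant
for tag in sorted(tags, key=str):
    print(tag.interpreter, tag.abi, tag.platform)
```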
How then would we solve the hard part, i.e., how can we choose the right filename, the best wheel to install?

We could use `pip download` to choose the "best" wheel for any given constraints. That is what flatpak-pip-generator is doing already, see https://github.com/flatpak/flatpak-builder-tools/blob/a86a1577f75c79072678f825dcbf8f8da1ff36c5/pip/flatpak-pip-generator#L228. So I guess we would have to pass `--platform` arguments to `pip download` in order to download wheels for a specific platform. And then we would have to skip the step where flatpak-pip-generator downloads source packages instead of wheels.
This is how I did it. Right now pip will choose the correct wheel when running the build process. Some of the wheels might not be needed for any build; in fact only two of them should be needed since the flatpak build server only runs on two platforms. The `x-checker-data` field is not used since the bot for that can't check platform-dependent wheels afaik.
```json
{
    "name": "python3-secretstorage",
    "buildsystem": "simple",
    "build-commands": [
        "pip3 install --verbose --exists-action=i --no-index --find-links=\"file://${PWD}\" --prefix=${FLATPAK_DEST} \"secretstorage\" --no-build-isolation"
    ],
    "sources": [
        {
            "type": "file",
            "url": "https://files.pythonhosted.org/packages/79/b2/78bd6b9705296a8030c398619c9dedaa0724199be800955a7c18a1e6a3ba/scikit_learn-1.1.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "sha256": "33cf061ed0b79d647a3e4c3f6c52c412172836718a7cd4d11c1318d083300133",
            "only-arches": ["aarch64"]
        },
        {
            "type": "file",
            "url": "https://files.pythonhosted.org/packages/43/bc/7130ffd49a1cf72659c61eb94d8f037bc5502c94866f407c0219d929e758/scikit_learn-1.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "sha256": "47464c110eaa9ed9d1fe108cb403510878c3d3a40f110618d2a19b2190a3e35c",
            "only-arches": ["x86_64"]
        },
        {
            "type": "file",
            "url": "https://files.pythonhosted.org/packages/58/be/06987c1268a5c6beea0fea7b3c25eb52839fa23693ab2f92b80721d78554/scikit_learn-1.1.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "sha256": "e851f8874398dcd50d1e174e810e9331563d189356e945b3271c0e19ee6f4d6f",
            "only-arches": ["aarch64"]
        },
        {
            "type": "file",
            "url": "https://files.pythonhosted.org/packages/72/7d/cbcad2588a4baf1661e43005a9c35a955ab38e247a943715d90a7c96e6b3/scikit_learn-1.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "sha256": "b928869072366dc138762fe0929e7dc88413f8a469aebc6a64adc10a9226180c",
            "only-arches": ["x86_64"]
        },
        {
            "type": "file",
            "url": "https://files.pythonhosted.org/packages/21/f1/08f5e313c028bfce28abc068ba5b6633ed95b767441b6e5271249ae65601/scikit_learn-1.1.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "sha256": "8ff56d07b9507fbe07ca0f4e5c8f3e171f74a429f998da03e308166251316b34",
            "only-arches": ["aarch64"]
        },
        {
            "type": "file",
            "url": "https://files.pythonhosted.org/packages/62/cb/49d4c9d3505b0dd062f49c4f573995977876cc556c658caffcfcd9043ea8/scikit_learn-1.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "sha256": "c2dad2bfc502344b869d4a3f4aa7271b2a5f4fe41f7328f404844c51612e2c58",
            "only-arches": ["x86_64"]
        }
    ]
}
```
> How then would we solve the hard part, i.e., how can we choose the right filename, the best wheel to install?

I think Option B is the way to go since that only adds code for passing the options `--platform`, `--python-version`, `--implementation` and `--abi`. However we still need to know what to pass for these options since the script should return the (almost) same results on every machine.
> This is how I did it.

great, thank you! so basically you downloaded a list of all wheels, filtered the list for the linux platform with desired architecture, and added the wheels as sources to the manifest. later on, during the build, `pip install` will pick whatever wheel it considers to be best from the available sources. that's clever, clean and easy, I like this solution!
> However we still need to know what to pass for these options

draft:

- python version: `cp3*` or `none`
- platform: `manylinux*` or `any`

I continued thinking and researching about how to filter the list of wheel filenames (from the pypi json), so we only include roughly suitable candidates as sources in the build manifest:

- proper filtering could be implemented as a `find_matching_wheels` function. It boils down to comparing two lists of tags. In our case, the first list of tags would come from `packaging.tags.sys_tags`; this is a list of tags that are compatible with the current system, in preferential order. The second list of tags would come from parsing a wheel's filename from the pypi json using `packaging.tags.parse_tag`; this is the list of tags that the wheel has been built for.
- quick and dirty filtering using regexes could be used to blacklist strings such as `macos` or `win`. should be enough to get started.
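The tag-comparison idea can be sketched as follows; `wheel_is_compatible` is a hypothetical helper name, and the sketch assumes wheel filenames without a build tag:

```python
from packaging.tags import parse_tag, sys_tags

def wheel_is_compatible(filename: str) -> bool:
    """Return True if any of the wheel's tags matches the current system."""
    # first list: tags the current system supports, in preferential order
    supported = set(sys_tags())
    # second list: tags the wheel was built for, parsed from its filename
    # (assumes "name-version-interpreter-abi-platform.whl", no build tag)
    tag_part = filename[:-len(".whl")].split("-", 2)[2]
    return not supported.isdisjoint(parse_tag(tag_part))

print(wheel_is_compatible("example_pkg-1.0-py3-none-any.whl"))  # pure-python wheels match everywhere
print(wheel_is_compatible("example_pkg-1.0-cp27-cp27m-win_amd64.whl"))
```

Note that this uses `sys_tags()` and therefore inherits the "current system" problem discussed below.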
A regex like `win|macos`?
> In our case, the first list of tags would come from `packaging.tags.sys_tags`; this is a list of tags that are compatible with the current system, in preferential order.

Why should we use the current system's version? Flatpaks are usually built on the buildbot.flathub.org server, which probably uses a different version. Or maybe the build process uses a python version bundled in the flatpak's runtime.
> The second list of tags would come from parsing a wheel's filename from the pypi json using `packaging.tags.parse_tag`;

I think `packaging.utils.parse_wheel_filename(filename)` would be the right function for that.
> A regex like `win|macos`?

yes
> Why should we use the current system version.

you are right, it cannot always be used.

- if the user does not pass a `--runtime` parameter when calling flatpak-pip-generator, then the current system's python version can be used as fallback
- if the user passes a `--runtime` argument, then we should find out which python version is installed within this runtime. This can be achieved by executing a command like `flatpak run --user org.gnome.Platform//42 -c 'python3 --version'`

I edited my above posting to better reflect this.
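The runtime query could look roughly like this; the function names are hypothetical, and the subprocess call assumes the runtime is installed locally:

```python
import subprocess

def runtime_python_version(runtime: str = "org.gnome.Platform//42") -> str:
    """Ask the flatpak runtime's shell for its python version, e.g. 'Python 3.9.9'."""
    result = subprocess.run(
        ["flatpak", "run", "--user", runtime, "-c", "python3 --version"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def parse_python_version(version_output: str) -> tuple:
    """Parse a string like 'Python 3.9.9' into a (major, minor) tuple."""
    _, version = version_output.split()
    major, minor = version.split(".")[:2]
    return int(major), int(minor)
```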
> I think `packaging.utils.parse_wheel_filename(filename)` would be the right function for that.

yes, even better, thank you! (I had overlooked this function)
Pseudocode draft

```text
# in the flatpak-pip-generator script,
# for all packages specified in the "--use-wheels-for" commandline argument:
1. the script retrieves a package's .json description by fetching it from pypi,
   by querying, e.g., https://pypi.org/pypi/scikit-learn/json
2. within the package .json, the script looks up the right release
   (with the right package version number)
3. within the release .json, the script retrieves the list of filenames
4. the script filters the list of filenames using these criteria:
   a. discard the filename if it is not a wheel,
      e.g., by matching the filename against \.whl$
   b. discard the filename if none of its PEP 425 tags can be found in the
      list of tags that are acceptable for the target runtimes
5. the filtered list of wheel filenames is included in the output of
   flatpak-pip-generator, as source files for the package
```
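The filtering in step 4 could be sketched as below; `filter_filenames` and the example tag set are illustrative, not actual flatpak-pip-generator code:

```python
from packaging.tags import parse_tag

def filter_filenames(filenames, acceptable_tags):
    """Keep wheels whose PEP 425 tags intersect the acceptable tag set (steps 4a/4b)."""
    kept = []
    for filename in filenames:
        if not filename.endswith(".whl"):
            continue  # 4a: discard sdists and other non-wheel files
        tag_part = filename[:-len(".whl")].split("-", 2)[2]
        if not acceptable_tags.isdisjoint(parse_tag(tag_part)):
            kept.append(filename)  # 4b: at least one tag is acceptable
    return kept

# tags acceptable for a hypothetical cp39/x86_64 target runtime
acceptable = set(parse_tag("cp39-cp39-manylinux_2_17_x86_64"))
files = [
    "pandas-1.4.4.tar.gz",
    "pandas-1.4.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "pandas-1.4.4-cp39-cp39-win_amd64.whl",
]
print(filter_filenames(files, acceptable))
```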
(Note: I wonder if the above logic based on pypi .json instead of downloaded wheels could also be used in the main logic of the flatpak-pip-generator script. The script currently downloads binary wheels, only to replace them with source packages later on... we could skip the initial download. But this would maybe take things too far, and make it difficult for maintainers to accept the pull request, if the change is too big. Better start small).
@TingPing Why does the `flatpak-pip-generator` script use pip to determine the packages needed? Is the sole reason resolving dependencies?

> `# in the flatpak-pip-generator script, for all packages specified in the "--use-wheels-for" commandline argument`

Maybe `--force-wheels-for`, since the script already uses wheels if they are platform independent.
> the script retrieves a package's .json description by fetching it from pypi, by querying, e.g., https://pypi.org/pypi/scikit-learn/json

Before that it should download all the packages and dependencies using pip. Then it determines which package versions are needed (and whether the packages are required in the first place).

I think for these changes it would also make sense to move more code into functions in the script and add an `if __name__ == '__main__':` section.
> @TingPing Why does the `flatpak-pip-generator` script use pip to determine the packages needed? Is the sole reason resolving dependencies?

Can you answer this @TingPing, or ping anyone that was involved in coding the `flatpak-pip-generator`?
@johannesjh I started working on this but I don't find the code and some of the choices made easy to understand. Especially the duplication of calculations bothers me.
Yes, me too. I am torn between patching what exists vs. attempting a lean rewrite.

A) Patching what exists: I guess this would mean to keep using `pip download` because it offers dependency resolution, and existing users probably rely on it. But we can avoid multiple calls to pip download; these are not necessary in my opinion. It would be great if one of the maintainers could confirm... but in my opinion, the second call to pip download, where the script says it is downloading source packages, can be omitted and merged with the first call. How can this be done? pip supports commandline options (and requirements.txt files support these options as well) that allow specifying a preference for binary vs source packages.

B) Attempting a lean rewrite: in a rewritten script, we could omit dependency resolution because other tools like `pip freeze`, `pip-compile` or `pipenv` are better at it. In other words, the script should expect a complete dependency tree with frozen package versions as input. This would reduce the script to the collection of download urls for one or multiple target platforms, and to the formatting of json/yaml output.
> Attempting a lean rewrite: in a rewritten script, we could omit dependency resolution because other tools like pip freeze, pip-compile or pip-env are better at it.

We still can use some of the code in the current script as a reference, e.g. for handling vcs dependencies. AFAIK `pip freeze` can only create a requirements file from a python environment. It isn't able to resolve dependencies. The same goes for `pip-compile`.

`pipenv graph` can give us a nice dependency tree. However, they use pipdeptree in the background, which we can use directly to reduce dependencies. The problem with these approaches is that one always needs to create a complete virtual python environment when the script is run. Ideally the script should be as short as possible, have as few dependencies as possible and be as fast as possible.
We could use pip's dependency resolver, which is a very low-level requirement for our script. The implementation of `pip download` can serve as a reference.

Since the pip API is well written, this is a valid approach. I also found pipgrip, which is the kind of tool we need, though there are some known caveats. They link resolvelib, which pip uses a fork of. If resolvelib serves our needs, we should look at whether we can use the resolvelib incorporated into pip.
I fully agree that the script should be kept small and simple. Hence the idea for a lean rewrite... and that's why I am saying we could leave dependency resolution out of scope if this makes the script simpler. To explain again: there are many solutions for how users may want to resolve and freeze their dependencies. Some of these solutions also install the packages or require them to be installed by the user (e.g., pip freeze and pipenv), others don't (e.g., pip-compile); we really don't care. In a lean rewrite, we could take a fully resolved dependency tree with frozen package versions as input. The flatpak-poetry-generator script shows how short and simple the script could be in this approach.
(Alternatively, if you feel that we should keep dependency resolution as a feature of the script... then yes, we could use a package for that instead of calling pip download)
> (Alternatively, if you feel that we should keep dependency resolution as a feature of the script... then yes, we could use a package for that instead of calling pip download)

I get your point. The script should then expect a pinned requirements file (including transitive dependencies). Still, you should consider adding an extra script for obtaining such a requirements file, since many people still maintain either no requirements file at all or one with only the top-level dependencies. Alternatively, we could document the easiest way of obtaining such a pinned requirements file in the README.
I created some code as a prototype for replacing the usage of `pip download` in situations where the `flatpak-pip-generator` script does not actually need to download a package, but only needs to find out the right download url and hash to write into the manifest.

The prototype allows to write, for example:
```python
from pprint import pprint

def main():
    # define the target platform(s)
    gnome42_x86 = PythonInterpreter(3, 9, Architecture.x86_64)
    # define requirements (could be read from requirements.txt or other formats)
    pandas = Release("pandas", "1.4.4")
    # select the best wheel from pypi, for a given target architecture:
    wheel = pandas.wheel_for(gnome42_x86)
    # alternatively, select the source package
    sdist = pandas.sdist()
    # we now have all the data needed to write a flatpak manifest
    pprint(wheel)
    pprint(sdist)
```
...which will print:

```text
Download(filename='pandas-1.4.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl',
         url='https://files.pythonhosted.org/packages/91/2e/f2e84148e71dda670b310f1f7b9a220181e5dc1fe2f9dcf6a8632412bf4e/pandas-1.4.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl',
         sha256='a981cfabf51c318a562deb4ae7deec594c07aee7cf18b4594a92c23718ec8275')
Download(filename='pandas-1.4.4.tar.gz',
         url='https://files.pythonhosted.org/packages/1a/3f/bba4f9e41fff332415cdb08063b78a53c813aba1ac02887944657bb30911/pandas-1.4.4.tar.gz',
         sha256='ab6c0d738617b675183e5f28db32b5148b694ad9bba0a40c3ea26d96b431db67')
```
...implemented in less than 200 lines of code, using just one dependency: pypa's `packaging` package, which is also used internally in pip.

A note about implementation choices: I could alternatively have used pip's internal APIs to implement the same thing, i.e., using the implementation of `pip download` as a reference, as suggested in the discussion above. But pip's developers strongly discourage using pip's internal APIs. I followed their recommendation and used the APIs of pypa's `packaging` package instead.
next steps:
I uploaded the full code in this gist: https://gist.github.com/johannesjh/2da0ffdc5458fd46b6c32dc7e84e4d30
@real-yfprojects what do you think about it?
Very nice, although I wouldn't have used OOP for a script, since the well-defined API introduces quite some unused code.
I extended the prototype, published as a new version of the gist at https://gist.github.com/johannesjh/2da0ffdc5458fd46b6c32dc7e84e4d30. It can now parse requirements.txt files, and it produces flatpak build manifests as output.
About the OOP programming style: The script has now grown to over 400 lines... and yes, the object-oriented programming style makes it rather verbose, but hopefully also well-structured and easier to maintain. When I started writing the script, I originally wanted a more straightforward procedural style. But things got complex, so I wanted strictly typed data structures, started using dataclasses, and continued to structure the program in a more object-oriented way. That's how it came to be.
About unused code: can you be more specific? If there is unused code, we can of course remove it.
example 1, simple commandline invocation:

```shell
python3 req2flatpak.py pandas==1.4.4 -t 39-linux-x86_64 39-linux-aarch64
```
example 2, invocation from a python script... this opens up many possibilities for customization if needed:
```python
from req2flatpak import FlatpakGenerator, PythonInterpreter, Arch, RequirementsTxtParser, PyPi

if __name__ == "__main__":
    # example demonstrating how to invoke req2flatpak from a python script:
    gnome42_x86 = PythonInterpreter(major=3, minor=9, arch=Arch.x86_64)
    gnome42_aarch64 = PythonInterpreter(major=3, minor=9, arch=Arch.aarch64)
    generator = FlatpakGenerator(interpreters=[gnome42_x86, gnome42_aarch64])
    reqs = RequirementsTxtParser.parse_string("""
        pandas == 1.4.4
    """)
    output = generator.buildmodule_as_json(reqs)
    print(output)
```
the examples produce the following output:

```json
{
    "name": "python3-package-installation",
    "buildsystem": "simple",
    "build-commands": [
        "pip3 install --verbose --exists-action=i --no-index --find-links=\"file://${PWD}\" --prefix=${FLATPAK_DEST} --no-build-isolation pandas"
    ],
    "sources": [
        {
            "type": "file",
            "url": "https://files.pythonhosted.org/packages/91/2e/f2e84148e71dda670b310f1f7b9a220181e5dc1fe2f9dcf6a8632412bf4e/pandas-1.4.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
            "sha256": "a981cfabf51c318a562deb4ae7deec594c07aee7cf18b4594a92c23718ec8275",
            "only-arches": ["x86_64"]
        },
        {
            "type": "file",
            "url": "https://files.pythonhosted.org/packages/3f/ea/c80181902a2c9c15f796a0c729ca730052c5d95bfdc3689ad477e15f75d1/pandas-1.4.4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
            "sha256": "9d2a7a3c1fea668d56bd91edbd5f2732e0af8feb9d2bf8d9bfacb2dea5fa9536",
            "only-arches": ["aarch64"]
        }
    ]
}
```
what is still missing are heuristics/filters/rules for customizing the choice of download packages; for example, `flatpak-pip-generator` has such a feature. My fear is that such needs will blow up the script's complexity if we start building commandline options for each and every customization need. I think it will be better to ask users to write their own python code. Default usage would not be difficult, as demonstrated in the above python code example, and advanced users could add their own customizations, e.g. in order to filter packages, to implement additional package indices, etc. So I guess it makes sense to review the script's programming API from this perspective.
i found a bug. the logic in my implementation of `_linux_platforms` is flawed... the functions from packaging.tags that I am calling rely on code that needs to run on the target machine, e.g., code that determines glibc versions. this has rather large implications for how to get a correct list of supported platform tags.
I think it makes sense to publish this as a package on pypi. The code could then be split up into multiple files so that one could still download the core functionality as a script. However this might not be the right repository for maintaining such a package.
Btw. the OOP approach still needs 20 lines fewer than the current script.
> About unused code: can you be more specific? If there is unused code, we can of course remove it.

Actually there is only a little. `Release.sdist` and the two `__eq__` methods aren't used.
> or else users of the script would have to provide the list of targeted platform tags as data when running the script. that would be rather tedious.

It would be sufficient if the user provides the musl version(s) of the target platform(s). A maximum glibc version is only needed for generating a (finite) list of compatible versions.
Big news, I published the req2flatpak script as a new project on github.
Some notes on current progress and next steps:
Code structure... I spent some time refactoring the script. I think that the code structure is now easier to understand. The data classes are mostly pure data, with almost no functionality. The behavior is implemented in a procedural way... but I still use classes to group related methods together. Programmatic use of the script now boils down to:
```python
platforms = [PlatformFactory.from_string("310-x86_64")]
requirements = RequirementsParser.parse_file("requirements.txt")
releases = PypiClient.get_releases(requirements)
downloads = {
    DownloadChooser.wheel_or_sdist(release, platform)
    for release in releases
    for platform in platforms
}
manifest = FlatpakGenerator.manifest(requirements, downloads)
```
...the above code shows how easy and straightforward it is to programmatically use the req2flatpak script, with the benefit that each of these steps can be customized if needed. A basic commandline (CLI) interface is still provided in the req2flatpak script, to make it easy to get started.
About the list of platform tags that we discussed in the above two comments: This is resolved now. I wrote a method `PlatformFactory.from_python_version_and_arch(...)` to generate a list of linux platform tags. The method is independent of the current python interpreter and system; this is its main advantage over `packaging.tags.sys_tags`. The tags returned by this method are an approximation, trying to match what packaging.tags.sys_tags would return if invoked on a linux system with cpython. The approximation worked really well: the method returns the exact same tags as if running packaging.tags.sys_tags on org.gnome.Platform//43, on both x86_64 and aarch64 architectures.
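To make this concrete, here is a rough sketch of how such system-independent tag generation can work for glibc-based targets; the function name, tag ordering and the glibc version range are assumptions for illustration, not req2flatpak's actual implementation:

```python
from packaging.tags import Tag

def approximate_linux_tags(py: str, arch: str, max_glibc_minor: int = 35) -> list:
    """Approximate the linux tags that packaging.tags.sys_tags() would return
    on a glibc-based target system, without having to run there."""
    # newest-first list of manylinux platform strings, down to manylinux_2_17,
    # plus the legacy alias and the plain linux platform
    platforms = [f"manylinux_2_{minor}_{arch}" for minor in range(max_glibc_minor, 16, -1)]
    platforms += [f"manylinux2014_{arch}", f"linux_{arch}"]
    tags = [Tag(f"cp{py}", f"cp{py}", p) for p in platforms]   # abi-specific wheels
    tags += [Tag(f"py{py[0]}", "none", p) for p in platforms]  # pure-python wheels
    return tags

tags = approximate_linux_tags("39", "x86_64")
```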
An update on my first practical experience with the new req2flatpak script: I personally started using req2flatpak in a first project, favagtk. This was successful: the req2flatpak script reads a requirements.txt file with over 60 python packages and generates a flatpak build module. Some of the packages, like scikit-learn and numpy, are notoriously difficult to install from sdist (e.g., when using flatpak-pip-generator), but req2flatpak chooses suitable wheels. This makes the package installation easy and much faster.
Retrospective and next steps: What started as a feature request and prototype has turned into a separate project that I named req2flatpak. So I guess this is a moment for saying thank you and good bye. Thank you @real-yfprojects for your constructive help and feedback. And thank you to the maintainers of flatpak-builder-tools for the prior work, and for hosting this conversation up until now. As for the future, let's stay friends! I am explicitly saying this because sometimes maintainers don't like to see diverging projects or forks... should this be the case, feel free to contact, I am open to many ways of cooperation, including the possibility of contributing req2flatpak into the popular flatpak-builder-tools project, to the benefit of many.
Great @johannesjh! I will do a complete code review soon.
Still running into this in 2024 it seems. Had a transitive dep on Pydantic2, which is a very popular library now. Unfortunately the source dist requires a rust toolchain. Having to manually edit in the appropriate wheel is not ideal.
flatpak-builder version: 1.0.10

Linux distribution and version: Ubuntu 20.04

Affected flatpak-builder tool: pip/flatpak-pip-generator

flatpak-builder tool cli args: No response

Source repository URL: No response

Flatpak-builder manifest URL: No response
Description

When generating a dependency file for the python package `cryptography` from pypi.org, the `flatpak-pip-generator` recognised that the package only supplies platform-dependent wheel distributions and therefore included the source distribution. However, the source distribution can only be built by compiling complicated rust dependencies, so building the app with flatpak-builder fails. I solved this by replacing the source distribution with all the different wheel packages available, so that the right one is always available. Why can't `flatpak-pip-generator` do something similar?

PS: Which wheel distributions do I actually need for building the flatpak on aarch64 and x86?