apache / datasketches-cpp

Core C++ Sketch Library
https://datasketches.apache.org
Apache License 2.0
223 stars 71 forks source link

[python packaging] Top-level name problems #290

Closed thatch closed 2 years ago

thatch commented 2 years ago

Hi, looking at the 3.4.0 wheel for datasketches as released, the *.dist-info/top_level.txt file contains:

datasketches
src
tests

The top-level dir "src" should not be included at all; the one file it contains is not useful. The top-level dir "tests" should be called something more unique (like "datasketches_tests") if you intend to package it in the wheel. Alternatively just keep this in the sdist?

Both of these are likely to cause conflicts with other projects.

jmalkin commented 2 years ago

Thanks for this, @thatch. We're not really python people so such suggestions are quite helpful.

We do expect to have some native python included at some point (trying to allow the use of generic python objects with container sketches, like kll or sampling, and that'll require user-specified classes for serialization or even comparators). I assume we'd want need to ensure they get included under datasketches?

thatch commented 2 years ago

I assume we'd want need to ensure they get included under datasketches?

Yep, the normal thing would be have this structure:

datasketches/__init__.py
datasketches/_datasketches.so
datasketches/kll.py

Right now you have datasketches.so and it is quite difficult to have both python and native parts of the same name. You can have the __init__.py do from ._datasketches import * so all the names are available as a way of combining them.

jmalkin commented 2 years ago

I'd experimented with the library part a bit a few months ago but we didn't have anything concrete to do with it then.

Right now we're trying to figure out why the Windows CI job started failing. Once that's done we'll be able to merge the linked PR (or maybe before if we determine it's sufficiently unrelated -- which really should be the case).

jmalkin commented 2 years ago

This is now merged, along with a few other simplifications when revisiting the files. Thanks for pointing it out!