Toloka / crowd-kit

Control the quality of your labeled data with the Python tools you already know.
https://crowd-kit.readthedocs.io/

import crowdkit [BUG] #88

Closed ahundt closed 4 months ago

ahundt commented 1 year ago

Observed behavior


import crowdkit
# ...
    mmsr = crowdkit.aggregation.classification.m_msr.MMSR(
        n_iter=10000,
        tol=1e-10,
        n_workers=len(worker_to_id),
        n_tasks=len(st2_int),
        n_labels=2,  # Assuming binary responses
        workers_mapping=worker_to_id,
        tasks_mapping=task_to_id,
        labels_mapping=label_to_id,
    )
Exception has occurred: AttributeError
module 'crowdkit' has no attribute 'aggregation'
  File "/Users/athundt/Documents/m3c/analyze_survey_results.py", line 62, in assess_worker_responses
    mmsr = crowdkit.aggregation.classification.m_msr.MMSR(
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/athundt/Documents/m3c/analyze_survey_results.py", line 120, in statistical_analysis
    worker_skills = assess_worker_responses(binary_rank_df)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/athundt/Documents/m3c/analyze_survey_results.py", line 378, in main
    aggregated_df = statistical_analysis(combined_df, args.network_models)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/athundt/Documents/m3c/analyze_survey_results.py", line 381, in <module>
    main()
AttributeError: module 'crowdkit' has no attribute 'aggregation'

bugreport.py:

import crowdkit

def test_mmsr():
    try:
        mmsr = crowdkit.aggregation.classification.m_msr.MMSR
    except AttributeError as e:
        print(f"An error occurred: {e}")
    else:
        # Report success only when the attribute lookup did not raise.
        print('it worked!')

test_mmsr()

Expected behavior

MMSR constructor to be called.

Note that this is exactly how the class path is written in the documentation, so it should work when copied: https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.m_msr.MMSR/

MMSR
crowdkit.aggregation.classification.m_msr.MMSR | [Source code](https://github.com/Toloka/crowd-kit/blob/v1.2.1/crowdkit/aggregation/classification/m_msr.py#L17)

MMSR(
    self,
    n_iter: int = 10000,
    tol: float = 1e-10,
    random_state: Optional[int] = 0,
    observation_matrix: ... = _Nothing.NOTHING,
    covariation_matrix: ... = _Nothing.NOTHING,
    n_common_tasks: ... = _Nothing.NOTHING,
    n_workers: int = 0,
    n_tasks: int = 0,
    n_labels: int = 0,
    labels_mapping: Dict[Any, int] = _Nothing.NOTHING,
    workers_mapping: Dict[Any, int] = _Nothing.NOTHING,
    tasks_mapping: Dict[Any, int] = _Nothing.NOTHING
)

The following does work, but the fully qualified path from the documentation should work too!


from crowdkit.aggregation import MMSR

def test_mmsr():
    try:
        mmsr = MMSR
    except AttributeError as e:
        print(f"An error occurred: {e}")
    else:
        # Report success only when no exception was raised.
        print('it worked!')

test_mmsr()

Thanks for giving this a look!

Python Version

3.11

Crowd-Kit Version

1.2.1

Other Packages Versions

athundt@MacBook-Pro m3c % pip freeze
aiohttp==3.8.6
aiohttp-retry==2.8.3
aiosignal==1.3.1
amqp==5.1.1
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
appdirs==1.4.4
async-timeout==4.0.3
asyncssh==2.14.0
atpublic==4.0
attrs==23.1.0
billiard==4.1.0
blinker==1.7.0
boto3==1.28.82
botocore==1.31.82
celery==5.3.4
certifi==2023.7.22
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.3.0
colorama==0.4.6
configobj==5.0.8
crowd-kit==1.2.1
cryptography==41.0.5
dictdiffer==0.9.0
diskcache==5.6.3
distro==1.8.0
docopt==0.6.2
dpath==2.1.6
dulwich==0.21.6
dvc==3.28.0
dvc-data==2.20.0
dvc-http==2.30.2
dvc-objects==1.1.0
dvc-render==0.6.0
dvc-studio-client==0.15.0
dvc-task==0.3.0
dvclive==3.2.0
entrypoints==0.4
filelock==3.13.1
Flask==3.0.0
flatten-dict==0.4.2
flufl.lock==7.1.1
frozenlist==1.4.0
fsspec==2023.10.0
funcy==2.0
gitdb==4.0.11
GitPython==3.1.40
grandalf==0.8
gto==1.5.0
huggingface-hub==0.17.3
hydra-core==1.3.2
idna==3.4
iterative-telemetry==0.0.8
itsdangerous==2.1.2
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.3.2
kombu==5.3.2
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.4
networkx==3.2.1
nltk==3.8.1
numpy==1.26.1
omegaconf==2.3.0
orjson==3.9.10
packaging==23.2
pandas==2.1.2
pathspec==0.11.2
pipreqs==0.4.13
platformdirs==3.11.0
prompt-toolkit==3.0.39
psutil==5.9.6
pycparser==2.21
pydantic==2.4.2
pydantic_core==2.10.1
pydot==1.4.2
pygit2==1.13.2
Pygments==2.16.1
pygtrie==2.5.0
pyparsing==3.1.1
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
regex==2023.10.3
requests==2.31.0
rich==13.6.0
ruamel.yaml==0.18.5
ruamel.yaml.clib==0.2.8
s3transfer==0.7.0
safetensors==0.4.0
scikit-learn==1.3.2
scipy==1.11.3
scmrepo==1.4.1
semver==3.0.2
shortuuid==1.0.11
shtab==1.6.4
six==1.16.0
smmap==5.0.1
sqltrie==0.8.0
sympy==1.12
tabulate==0.9.0
threadpoolctl==3.2.0
tokenizers==0.14.1
tomlkit==0.12.2
torch==2.1.0
tqdm==4.66.1
transformers==4.35.0
typer==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.0.7
vine==5.0.0
voluptuous==0.13.1
wcwidth==0.2.9
Werkzeug==3.0.1
yarg==0.1.9
yarl==1.9.2
zc.lockfile==3.0.post1

Example code

import crowdkit

def test_mmsr():
    try:
        mmsr = crowdkit.aggregation.classification.m_msr.MMSR
    except AttributeError as e:
        print(f"An error occurred: {e}")

test_mmsr()

Relevant log output

An error occurred: module 'crowdkit' has no attribute 'aggregation'

pilot7747 commented 4 months ago

I don’t think it’s a bug. It’s intended behavior. Submodules inside aggregation exist only to organize the code for development.
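
For readers wondering why the attribute lookup fails in the first place: in Python, a subpackage becomes an attribute of its parent only after something imports it. The error message suggests that crowdkit's top-level `__init__` does not import `aggregation`, though that is an inference from the traceback, not something confirmed in this thread. The sketch below reproduces the general mechanism with a throwaway package; `demo_pkg`, `sub`, and `VALUE` are hypothetical names and not part of Crowd-Kit.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package (hypothetical name: demo_pkg) whose __init__.py
# is empty, so it does not import its own submodule.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "sub.py").write_text("VALUE = 42\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg

# The submodule has not been imported yet, so demo_pkg.sub would raise
# AttributeError at this point.
before = hasattr(demo_pkg, "sub")

# Importing the submodule binds it as an attribute on the parent package.
importlib.import_module("demo_pkg.sub")
after = hasattr(demo_pkg, "sub")

print(before, after)  # False True
```

By the same rule, since `from crowdkit.aggregation import MMSR` works as reported above, writing `import crowdkit.aggregation` first should make `crowdkit.aggregation.MMSR` resolve as well. Whether the deeper `classification.m_msr` path is reachable depends on what `crowdkit.aggregation` itself imports, which this thread does not show.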