NVIDIA-Merlin / Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
https://nvidia-merlin.github.io/Transformers4Rec/main
Apache License 2.0

[BUG] `cannot pickle 'mappingproxy' object` when using `TabularFeatures` `create_categorical` #727

Open denadai2 opened 1 year ago

denadai2 commented 1 year ago

Bug description

I hit a bug when creating a schema programmatically and passing it to `TabularFeatures.from_schema`. Can you help me with this?

Thanks.

Steps/Code to reproduce bug

import merlin_standard_lib as msl
from merlin_standard_lib import Schema
from transformers4rec.torch.features.tabular import TabularFeatures

features_schema = Schema(
    [msl.ColumnSchema.create_categorical("language", num_items=149)]
)
a = TabularFeatures.from_schema(features_schema)

This raises `TypeError: cannot pickle 'mappingproxy' object`, coming from:

│ /home/mdenadai/miniconda3/envs/gnn/lib/python3.9/site-packages/transformers4rec/torch/features/t │
│ abular.py:175 in from_schema                                                                     │
│                                                                                                  │
│   172 │   │   │   │   │   **kwargs,                                                              │
│   173 │   │   │   │   )                                                                          │
│   174 │   │   │   else:                                                                          │
│ ❱ 175 │   │   │   │   maybe_continuous_module = cls.CONTINUOUS_MODULE_CLASS.from_schema(         │
│   176 │   │   │   │   │   schema, tags=continuous_tags, **kwargs                                 │
│   177 │   │   │   │   )                                                                          │
│   178 │   │   if categorical_tags:                                                               │
│                                                                                                  │
│ /home/mdenadai/miniconda3/envs/gnn/lib/python3.9/site-packages/transformers4rec/torch/tabular/ba │
│ se.py:190 in from_schema                                                                         │
│                                                                                                  │
│   187 │   │   -------                                                                            │
│   188 │   │   Optional[TabularModule]                                                            │
│   189 │   │   """                                                                                │
│ ❱ 190 │   │   schema_copy = deepcopy(schema)                                                     │
│   191 │   │   if tags:                                                                           │
│   192 │   │   │   schema_copy = schema_copy.select_by_tag(tags)

This happens even when I just do:

from copy import deepcopy

import merlin_standard_lib as msl
from merlin_standard_lib import Schema
from transformers4rec.torch.features.tabular import TabularFeatures

deepcopy(Schema([msl.ColumnSchema.create_categorical("language", num_items=149)]))
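
The same kind of error can be reproduced without Merlin at all, which suggests the failure comes from `deepcopy` reaching a `mappingproxy` somewhere inside the schema object rather than from `TabularFeatures` itself. A minimal sketch, assuming (this is not confirmed from the library source) that some attribute reachable from the schema holds a `mappingproxy`; the `Holder` class and `try_deepcopy` helper are hypothetical names used only for illustration:

```python
import types
from copy import deepcopy


class Holder:
    """Hypothetical stand-in for an object that stores a mappingproxy."""

    def __init__(self):
        # A read-only view over a dict; a class's __dict__ has this same type.
        self.view = types.MappingProxyType({"min": 0, "max": 149})


def try_deepcopy(obj):
    """Return None if deepcopy succeeds, otherwise the TypeError message."""
    try:
        deepcopy(obj)
    except TypeError as exc:
        return str(exc)
    return None


# deepcopy recurses into the instance attributes and fails on the proxy.
message = try_deepcopy(Holder())
print(message)
```

If this prints a message mentioning `mappingproxy`, the traceback above is consistent with a plain `deepcopy` limitation rather than anything specific to the schema contents.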

Environment details

denadai2 commented 1 year ago

It seems that if I remove `int_domain` from `ColumnSchema`, everything can be copied:

class ColumnSchema(Feature):
    @classmethod
    def create_categorical(
        cls,
        name: str,
        num_items: int,
        shape: Optional[Union[Tuple[int, ...], List[int]]] = None,
        value_count: Optional[Union[ValueCount, ValueCountList]] = None,
        min_index: int = 0,
        tags: Optional[TagsType] = None,
        **kwargs,
    ) -> "ColumnSchema":
        _tags: List[str] = [t.value for t in TagSet(tags or [])]

        extra = _parse_shape_and_value_count(shape, value_count)
        int_domain = IntDomain(name=name, min=min_index, max=num_items, is_categorical=True)
        _tags = list(set(_tags + [Tags.CATEGORICAL.value]))
        extra["type"] = FeatureType.INT

        return cls(name=name, int_domain=int_domain, **extra, **kwargs).with_tags(_tags)
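
One way to narrow down which field blocks the copy, instead of removing fields by hand, is to try `deepcopy` on each instance attribute separately. This is only a diagnostic sketch: `uncopyable_attributes` and `FakeColumn` are hypothetical names, and it assumes the object exposes its fields through `vars()`, which may not hold for every schema class:

```python
import types
from copy import deepcopy


def uncopyable_attributes(obj):
    """Map attribute name -> error text for attributes that deepcopy rejects."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            deepcopy(value)
        except Exception as exc:  # broad on purpose: this is a diagnostic
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


class FakeColumn:
    """Hypothetical object with one copyable and one uncopyable attribute."""

    def __init__(self):
        self.name = "language"
        # Illustrative only: not a claim about what ColumnSchema stores.
        self.problem_field = types.MappingProxyType({"max": 149})


report = uncopyable_attributes(FakeColumn())
for attribute, error in report.items():
    print(attribute, "->", error)
```

Running the helper on the real `ColumnSchema` instance would show whether `int_domain` is the only attribute involved.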

denadai2 commented 1 year ago

This is solved with betterproto 2.0, maybe because of https://github.com/danielgtaylor/python-betterproto/pull/339. However, upgrading creates a dependency clash:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
merlin-core 23.6.0 requires betterproto<2.0.0, but you have betterproto 2.0.0b6 which is incompatible.
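
When experimenting with the pin, it can help to confirm which versions are actually installed in the environment. A small sketch using only the standard library; `major_version` and `installed_major` are hypothetical helpers, and the parsing is deliberately naive (it only reads the leading integer of the version string):

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '2.0.0b6'."""
    return int(version_string.split(".")[0])


def installed_major(package):
    """Return the installed major version of a package, or None if absent."""
    try:
        return major_version(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


# Package names taken from the pip message above.
for package in ("betterproto", "merlin-core"):
    print(package, installed_major(package))
```

A `betterproto` major version of 2 alongside an installed `merlin-core` would match the conflict pip reports.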

EvenOldridge commented 10 months ago

Thanks for the detailed bug report and the fix.

You can try updating the dependencies in requirements.txt; there's a reasonable chance that it'll work. We're unfortunately not able to update our containers at this time but if you can test that it's working we'd love a PR with your solution.