chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

Saving corpus with custom user tags "unhashable" list #252

Closed nyejon closed 4 years ago

nyejon commented 5 years ago

steps to reproduce

Hi, the save method of the corpus calls:

user_datas.append(doc.user_data)

With custom attributes, serialization fails because the output tries to use a tuple as a dictionary key:

('._.', 'is_area_unit_of_measure', 32, None): True

'user_datas': [{'textacy': {'meta': {'offer_id': '1'}}}, {'textacy': {'meta': {'offer_id': '1'}}}, {'textacy': {'meta': {'offer_id': '2'}}}, {'textacy': {'meta': {'offer_id': '2', 'text_file_processor': 'tika'}}}, {'textacy': {'meta': {'offer_id': '3'}}}, {'textacy': {'meta': {'offer_id': '3'}}}, {('._.', 'which_area_unit_of_measure', 32, None): 'SQF', ('._.', 'is_area_unit_of_measure', 32, None): True, ('._.', 'which_area_unit_of_measure', 38, None): 'SQF', ('._.', 'is_area_unit_of_measure', 38, None): True, 'textacy': {'meta': {'offer_id': '3'}}

bdewilde commented 5 years ago

Hi @nyejon, I think I need more information. Dictionaries can have tuples as keys; something like {(1, 2): "a", (3, 4): "b"} is perfectly valid. Could you provide a full code example to reproduce?
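
For reference, here is a minimal sketch of how a tuple key can be valid in memory and still lead to an "unhashable list" error after a round trip. It assumes the serializer stores sequences as arrays and returns them as lists. The standard-library json module is used only as a stand-in for that behavior, and try_as_key is a hypothetical helper; none of this is textacy's or spaCy's actual save path.

```python
import json

# Tuple keys are hashable, so this dict is valid in memory.
user_data = {("._.", "is_area_unit_of_measure", 32, None): True}

# Assumption: the serializer writes sequences as arrays and reads them back
# as lists. json stands in for that behavior here; it is not textacy's code.
key = next(iter(user_data))
restored_key = json.loads(json.dumps(key))  # the tuple comes back as a list


def try_as_key(candidate):
    """Hypothetical helper: return None if candidate works as a dict key,
    otherwise the TypeError message."""
    try:
        {candidate: True}
    except TypeError as exc:
        return str(exc)
    return None


error_for_tuple = try_as_key(key)           # None: tuples are hashable
error_for_list = try_as_key(restored_key)   # message mentions "unhashable"

print(error_for_tuple)
print(error_for_list)
```

If this is what happens on load, converting each restored key back with tuple(...) before rebuilding the dictionary would avoid the error, though whether that applies here depends on the full reproduction.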