anhaidgroup / deepmatcher

Python package for performing Entity and Text Matching using Deep Learning.
BSD 3-Clause "New" or "Revised" License

Value error #24

Open staniPetrox opened 5 years ago

staniPetrox commented 5 years ago

I get a ValueError. I tried uninstalling and reinstalling several packages, but nothing worked :( I prepared the datasets as they are supposed to be, including the 'left' and 'right' prefixes, id, label and so on.


train, validation, test = dm.data.process(
    path='/home/censored/quora-question-pairs',
    train='train.csv',
    validation='validation.csv',
    test='test.csv')

Here is the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-989ecb797b84> in <module>()
      4     train='train.csv',
      5     validation='validation.csv',
----> 6     test='test.csv')

/home/censored/.local/lib/python3.6/site-packages/deepmatcher/data/process.py in process(path, train, validation, test, unlabeled, cache, check_cached_data, auto_rebuild_cache, tokenize, lowercase, embeddings, embeddings_cache_path, ignore_columns, include_lengths, id_attr, label_attr, left_prefix, right_prefix, use_magellan_convention, pca)
    195 
    196     _maybe_download_nltk_data()
--> 197     _check_header(header, id_attr, left_prefix, right_prefix, label_attr, ignore_columns)
    198     fields = _make_fields(header, id_attr, label_attr, ignore_columns, lowercase,
    199                           tokenize, include_lengths)

/home/censored/.local/lib/python3.6/site-packages/deepmatcher/data/process.py in _check_header(header, id_attr, left_prefix, right_prefix, label_attr, ignore_columns)
     32         if attr not in (id_attr, label_attr) and attr not in ignore_columns:
     33             if not attr.startswith(left_prefix) and not attr.startswith(right_prefix):
---> 34                 raise ValueError('Attribute ' + attr + ' is not a left or a right table '
     35                                  'column, not a label or id and is not ignored. Not sure '
     36                                  'what it is...')

ValueError: Attribute  is not a left or a right table column, not a label or id and is not ignored. Not sure what it is...

How can I avoid this?

belerico commented 5 years ago

Something in your dataset header is not recognized. As far as I'm aware this isn't documented, but the data must be comma-separated. Maybe I'll work something out so that one can specify which separator deepmatcher should use.
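To see which header entry trips the check, one option is to read the first row of the CSV and apply the same test that appears in the traceback. This is only a rough sketch: `find_unrecognized_columns` and `read_header` are hypothetical helper names, the default values for `id_attr`, `label_attr`, `left_prefix` and `right_prefix` are assumed, and the sample data is made up.

```python
import csv
import io


def find_unrecognized_columns(header, id_attr="id", label_attr="label",
                              left_prefix="left_", right_prefix="right_",
                              ignore_columns=()):
    """Return header entries that the check shown in the traceback would reject.

    The default argument values are assumptions, not taken from the library.
    """
    bad = []
    for attr in header:
        # id, label and explicitly ignored columns are always accepted.
        if attr in (id_attr, label_attr) or attr in ignore_columns:
            continue
        # Everything else must carry the left or the right table prefix.
        if not attr.startswith(left_prefix) and not attr.startswith(right_prefix):
            bad.append(attr)
    return bad


def read_header(csv_file, delimiter=","):
    """Read only the first row of an already opened CSV file."""
    return next(csv.reader(csv_file, delimiter=delimiter))


# Made-up sample: the leading comma produces an empty column name.
sample = io.StringIO(",id,left_question,right_question,label\n0,1,a,b,0\n")
header = read_header(sample)
print(find_unrecognized_columns(header))  # prints ['']
```

An empty string in the result matches the blank attribute name in the error message above. A file that uses another separator, such as `;`, would instead show up as a single long header entry containing the whole first line.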

jakubriegel commented 3 years ago

Hi, I've encountered the same problem. You may have an empty column name in your CSV. In my case, the fix was to explicitly disable the index when saving the DataFrame to CSV:

 data.to_csv(name, index=False)
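For anyone wondering why the index matters, here is a small sketch of the effect, assuming pandas is installed. The DataFrame contents and the `header_line` helper are made up for illustration. It writes to an in-memory buffer instead of a file, so nothing on disk is touched.

```python
import io

import pandas as pd

# Made-up data with the column layout described in the question.
data = pd.DataFrame({
    "id": [1],
    "left_question": ["a"],
    "right_question": ["b"],
    "label": [0],
})


def header_line(df, **to_csv_kwargs):
    """Return the first line that DataFrame.to_csv would write."""
    buf = io.StringIO()
    df.to_csv(buf, **to_csv_kwargs)
    return buf.getvalue().splitlines()[0]


# By default the unnamed index is written as an extra first column,
# so the header starts with an empty name.
with_index = header_line(data)
# With index=False only the real columns are written.
without_index = header_line(data, index=False)

print(repr(with_index))
print(repr(without_index))
```

The first header starts with a comma, which means an empty column name, and that is the blank attribute in the ValueError. The second header contains only the named columns.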