anhaidgroup / deepmatcher

Python package for performing Entity and Text Matching using Deep Learning.
BSD 3-Clause "New" or "Revised" License
555 stars, 129 forks

Datasets #30

Open zhengyang-wang opened 5 years ago

zhengyang-wang commented 5 years ago

The train/test/validation.csv files downloaded from https://github.com/anhaidgroup/deepmatcher/blob/master/Datasets.md differ from the CSV files under examples/sample_data.

Also, it is not mentioned where the ids in the train/test/validation.csv files under examples/sample_data come from.

Could you kindly clarify these?

jweckschmied commented 4 years ago

I have the same problem. It seems the train/test/valid datasets have been reduced to contain only the tuple ids, not the tuples themselves, so they need to be joined with tableA and tableB to get the correct format. Is there a predefined Magellan function for this, or do we need to do the join manually?

sidharthms commented 4 years ago

Please see this issue: https://github.com/anhaidgroup/deepmatcher/issues/51

It links to two Colab notebooks with code showing how to join the data for two kinds of datasets.
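
For readers who only need the general idea, here is a minimal pandas sketch of the join discussed above. It is not taken from the linked notebooks. It assumes the pair files have `ltable_id`, `rtable_id`, and `label` columns, that tableA and tableB each have a unique `id` column, and that the desired output uses `left_`/`right_` attribute prefixes plus a per-pair `id`. The helper name `join_pairs` and the sample rows are hypothetical, so adjust the column names to whatever your download actually contains.

```python
import pandas as pd


def join_pairs(pairs, table_a, table_b,
               left_key="ltable_id", right_key="rtable_id", id_col="id"):
    """Expand (left id, right id, label) rows into full tuple pairs.

    Column names are assumptions about the downloaded files, not a
    documented deepmatcher or Magellan API.
    """
    # Prefix every attribute so left and right columns do not collide.
    left = table_a.add_prefix("left_")
    right = table_b.add_prefix("right_")

    # Left joins keep the row order of `pairs`; validate="m:1" raises an
    # error if a table contains duplicate ids.
    merged = (
        pairs
        .merge(left, left_on=left_key, right_on="left_" + id_col,
               how="left", validate="m:1")
        .merge(right, left_on=right_key, right_on="right_" + id_col,
               how="left", validate="m:1")
    )

    # The raw key columns duplicate left_id / right_id, so drop them.
    merged = merged.drop(columns=[left_key, right_key])

    # Give each pair its own id if the pair file did not provide one.
    if "id" not in merged.columns:
        merged.insert(0, "id", range(len(merged)))
    return merged


# Tiny made-up tables standing in for tableA.csv, tableB.csv and train.csv.
table_a = pd.DataFrame({"id": [0, 1], "title": ["iphone 6", "galaxy s5"]})
table_b = pd.DataFrame({"id": [0, 1], "title": ["apple iphone 6", "nokia 3310"]})
pairs = pd.DataFrame({"ltable_id": [0, 0, 1],
                      "rtable_id": [0, 1, 1],
                      "label": [1, 0, 0]})

train = join_pairs(pairs, table_a, table_b)
print(train)
```

With real data you would load the three files with `pd.read_csv`, call the helper once each for the train, validation, and test pair files, and write each result out with `to_csv(..., index=False)`.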



Ricca-xie commented 3 years ago

I have the same problem with the text datasets, which contain only ids and labels. Do I need to manually join them with tableA and tableB?