cdqa-suite / cdQA

⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.
https://cdqa-suite.github.io/cdQA-website/
Apache License 2.0

Try BERT prediction on sample file #33

Closed. fmikaelian closed this issue 5 years ago

fmikaelian commented 5 years ago

Ideas: https://github.com/fmikaelian/cdQA/issues/7#issuecomment-464351904

fmikaelian commented 5 years ago

The function read_squad_examples() takes a JSON file as input. The file is parsed with json.load(reader)["data"], which yields a list of dicts in SQuAD format.
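As a minimal sketch of that loading step, the snippet below writes a tiny SQuAD-style file to a temporary directory and reads its "data" list back the same way. The sample content and the helper name load_squad_data are assumptions made for illustration; they are not part of cdQA or of read_squad_examples() itself.

```python
import json
import os
import tempfile

# Illustrative SQuAD-style payload (made-up content for this sketch).
sample = {
    "version": "1.1",
    "data": [
        {
            "title": "Example article",
            "paragraphs": [
                {
                    "context": "Paris is the capital of France.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is the capital of France?",
                            "answers": [],
                        }
                    ],
                }
            ],
        }
    ],
}


def load_squad_data(path):
    """Hypothetical helper: return the list of article dicts under "data"."""
    with open(path, "r", encoding="utf-8") as reader:
        return json.load(reader)["data"]


# Write the sample to disk, then load it back the way the reader would.
with tempfile.TemporaryDirectory() as tmp_dir:
    input_file = os.path.join(tmp_dir, "sample.json")
    with open(input_file, "w", encoding="utf-8") as writer:
        json.dump(sample, writer)
    input_data = load_squad_data(input_file)

print(type(input_data).__name__, len(input_data))
print(input_data[0]["paragraphs"][0]["qas"][0]["question"])
```

Since the result is an ordinary list of dicts, the same structure could in principle be built in memory at prediction time instead of being read from disk.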

We could therefore do the following at prediction time:

fmikaelian commented 5 years ago

Actually, the converter we built does something similar to this.

fmikaelian commented 5 years ago

import uuid

from tqdm import tqdm


def generate_squad_examples(question, article_indices, metadata):
    """Build SQuAD-like examples pairing the question with each paragraph."""
    squad_examples = []

    # Keep only the articles selected for this question.
    metadata_sliced = metadata.loc[article_indices]

    for index, row in tqdm(metadata_sliced.iterrows()):
        temp = {'title': row['title'],
                'paragraphs': []}

        for paragraph in row['paragraphs']:
            temp['paragraphs'].append({'context': paragraph,
                                       'qas': [],
                                       'question': question,
                                       'id': str(uuid.uuid1())})

        # Append each article once, after all of its paragraphs are collected.
        squad_examples.append(temp)

    return squad_examples

Then we can call this in the example:

squad_examples = generate_squad_examples(question='Who is the creator of Artificial Intelligence?',
                               article_indices=article_indices,
                               metadata=df)

Outputs:

[{'title': 'Artificial Intelligence: more revolutionary than the Internet!',
  'paragraphs': [{'context': 'BNP Paribas launches the prototype AGORA, first online community for corporate clients',
    'qas': [],
    'question': 'Who is the creator of Artificial Intelligence?',
    'id': 'bec64330-3b40-11e9-8dad-0242ac110012'},
   {'context': 'Artificial Intelligence has progressed at lightning speed in recent years. Machines are now able to beat humans in Go matches, understand natural language, reason and learn. As a result, software and robots have something to offer in every field to make business more productive, profitable and innovative. Chronicle of a revolution foretold.',
    'qas': [],
    'question': 'Who is the creator of Artificial Intelligence?',
    'id': 'bec6701c-3b40-11e9-8dad-0242ac110012'},
   {'context': 'Artificial Intelligence refers to a set of technologies – machine learning, deep learning, language processing, etc. – that share one common feature in that they rely on a computer system capable of analyzing, understanding, learning and discovering connections between things, facts and events as well as manipulating concepts. It should come as no surprise that machines have acquired these extraordinary abilities. Just like flying cars, autonomous and hyper-intelligent humanoid robots have been a major part of science fiction for decades.',
    'qas': [],
    'question': 'Who is the creator of Artificial Intelligence?',
    'id': 'bec67102-3b40-11e9-8dad-0242ac110012'},
   {'context': '“Artificial Intelligence is a word that has been around for 60 years, but which ultimately refers to nothing more than software. Machines are very good at performing repetitive tasks and can help humans work more efficiently. But they cannot take their own initiatives and can only make progress by interacting with people”, explains Edouard d’Archimbaud, manager of the Data Science & Artificial Intelligence Lab at BNP Paribas CIB. ',
    'qas': [],
    'question': 'Who is the creator of Artificial Intelligence?',
    'id': 'bec67238-3b40-11e9-8dad-0242ac110012'}]},
 {'title': 'Sugiyama to lead Japan in France Fed Cup clash (AFP)',
  'paragraphs': [{'context': 'Machine learning, deep learning, artificial intelligence—Julien Dinh, Senior Research Lead at...',
    'qas': [],
    'question': 'Who is the creator of Artificial Intelligence?',
    'id': 'bec68e6c-3b40-11e9-8dad-0242ac110012'}]}]

fmikaelian commented 5 years ago

Question: Who is the creator of Artificial Intelligence?

Calling predictions = model.predict(X=(test_examples, test_features)) returns:

(OrderedDict([('2398202a-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
              ('239828b8-41b4-11e9-beaa-796013f1ec43',
               'Chronicle of a revolution'),
              ('2398294e-41b4-11e9-beaa-796013f1ec43',
               'machine learning, deep learning, language processing, etc.'),
              ('23983056-41b4-11e9-beaa-796013f1ec43', 'Edouard d’Archimbaud'),
              ('2398309c-41b4-11e9-beaa-796013f1ec43', 'AI'),
              ('239830e2-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
              ('23983128-41b4-11e9-beaa-796013f1ec43', 'Marvin Lee Minsky'),
              ('23983164-41b4-11e9-beaa-796013f1ec43',
               'Artificial Intelligence is in fact likely to surpass humans in performing tasks that require reasoning and learning.'),
              ('239831a0-41b4-11e9-beaa-796013f1ec43', 'Watson'),
              ('239831e6-41b4-11e9-beaa-796013f1ec43', 'Google'),
              ('2398322c-41b4-11e9-beaa-796013f1ec43', 'Accenture'),
              ('23983268-41b4-11e9-beaa-796013f1ec43', 'AI'),
              ('239832a4-41b4-11e9-beaa-796013f1ec43', 'Partnership on AI'),
              ('239832e0-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
              ('23983326-41b4-11e9-beaa-796013f1ec43', 'Edouard d’Archimbaud'),
              ('23983362-41b4-11e9-beaa-796013f1ec43', 'data scientists'),
              ('2398339e-41b4-11e9-beaa-796013f1ec43', 'Edouard d’Archimbaud'),
              ('239833e4-41b4-11e9-beaa-796013f1ec43',
               'AI system’s ability to learn “by example” or “by experience”.'),
              ('23983420-41b4-11e9-beaa-796013f1ec43',
               'Deep learning is a learning technology that uses artificial neural networks, which approximate human learning to process “raw data”.'),
              ('2398345c-41b4-11e9-beaa-796013f1ec43', 'Alan Turing'),
              ('23983498-41b4-11e9-beaa-796013f1ec43', 'TEDxParis'),
              ('239834d4-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
              ('23983510-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
              ('23983a60-41b4-11e9-beaa-796013f1ec43', 'change management'),
              ('23983ad8-41b4-11e9-beaa-796013f1ec43', 'BNP Paribas'),
              ('23983b1e-41b4-11e9-beaa-796013f1ec43', 'Julien Dinh'),
              ('23983f92-41b4-11e9-beaa-796013f1ec43', 'Julien Dinh')]),
 OrderedDict(),
 OrderedDict())
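The first element of the returned tuple maps each paragraph-level question id to the answer span extracted from that paragraph; the other two elements are empty here. Below is a small sketch of how such a mapping could be inspected, using a shortened copy of the output above. The variable names are assumptions, and tallying answers is only an exploratory aid, not a ranking method taken from cdQA.

```python
from collections import Counter, OrderedDict

# Shortened copy of the first OrderedDict shown above (id -> predicted answer).
all_predictions = OrderedDict([
    ("2398202a-41b4-11e9-beaa-796013f1ec43", "BNP Paribas"),
    ("23983128-41b4-11e9-beaa-796013f1ec43", "Marvin Lee Minsky"),
    ("2398345c-41b4-11e9-beaa-796013f1ec43", "Alan Turing"),
    ("239834d4-41b4-11e9-beaa-796013f1ec43", "BNP Paribas"),
])

# Look up the answer extracted from one specific paragraph.
target_id = "23983128-41b4-11e9-beaa-796013f1ec43"
print(all_predictions[target_id])

# Tally how often each answer string was predicted across paragraphs.
answer_counts = Counter(all_predictions.values())
print(answer_counts.most_common(1))
```

Because every paragraph yields its own answer, a frequent string is not necessarily the right one, so some form of scoring across paragraphs would still be needed to pick a final answer.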

The ground truth is Marvin Lee Minsky, which the model did extract from the context with id 23983128-41b4-11e9-beaa-796013f1ec43:

{'context': 'One of the creators of Artificial Intelligence, Marvin Lee Minsky, notably defines it as “the construction of computer programs that engage in tasks that are, for now, more satisfactorily accomplished by humans because they require high-level mental processes”. ',
    'qas': [{'answers': [],
      'question': 'Who is the creator of Artificial Intelligence?',
      'id': '23983128-41b4-11e9-beaa-796013f1ec43'}]},
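The context above nests the question and its id inside 'qas', whereas the generator posted earlier in this thread left 'qas' empty and put both fields at paragraph level. A hedged sketch of a builder that produces the nested layout follows. It takes plain Python dicts rather than the DataFrame used in the thread, and the function name build_squad_examples and the sample article are assumptions made for illustration.

```python
import uuid


def build_squad_examples(question, articles):
    """Hypothetical helper: wrap each paragraph in a SQuAD-style entry.

    `articles` is a list of dicts with 'title' and 'paragraphs' (list of str).
    """
    squad_examples = []
    for article in articles:
        entry = {"title": article["title"], "paragraphs": []}
        for paragraph in article["paragraphs"]:
            entry["paragraphs"].append({
                "context": paragraph,
                # Question and id live inside 'qas'; no answers are known
                # at prediction time, so the list stays empty.
                "qas": [{
                    "answers": [],
                    "question": question,
                    "id": str(uuid.uuid1()),
                }],
            })
        # Append once per article, after all of its paragraphs are collected.
        squad_examples.append(entry)
    return squad_examples


# Made-up input for the sketch.
articles = [{
    "title": "Illustrative article",
    "paragraphs": ["First paragraph.", "Second paragraph."],
}]
examples = build_squad_examples(
    question="Who is the creator of Artificial Intelligence?",
    articles=articles,
)
print(len(examples), len(examples[0]["paragraphs"]))
```

Giving every paragraph its own id is what lets each prediction be traced back to the context it came from.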