castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.
http://pyserini.io/
Apache License 2.0

Dense retrieval: query encoder #293

Closed lintool closed 3 years ago

lintool commented 3 years ago

Currently, we use pre-encoded queries. We'll need to add the query encoder into the codebase.

MXueguang commented 3 years ago

Are we going to use transformers in this case?

lintool commented 3 years ago

I think so... so it'll involve pulling in a new dependency... and it might clash with pygaggle, so we have to be careful in the design...

MXueguang commented 3 years ago

@lintool do you think it's better to put the QueryEncoder inside SimpleDenseSearcher, i.e., as a field? Then the dense searcher would take raw text as input rather than an embedding.

lintool commented 3 years ago

Yes, I think the QueryEncoder should take text as input.

Thoughts?

MXueguang commented 3 years ago

Sounds good, I'll do the refactor after we merge the hybrid feature.

lintool commented 3 years ago

do you think it's better to put the QueryEncoder inside SimpleDenseSearcher, i.e., as a field?

To be clear, SimpleDenseSearcher should take a QueryEncoder in its constructor or something along those lines. I assume that's what you meant by "field".
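
Something along these lines, as a minimal sketch (class and method signatures here are illustrative, not the final pyserini API):

```python
import numpy as np

class QueryEncoder:
    """Illustrative interface: raw query text in, dense vector out."""
    def encode(self, query: str) -> np.ndarray:
        raise NotImplementedError

class SimpleDenseSearcher:
    """Takes a QueryEncoder in its constructor, so search() accepts raw text."""
    def __init__(self, index, query_encoder: QueryEncoder):
        self.index = index
        self.query_encoder = query_encoder

    def search(self, query: str, k: int = 10):
        emb = self.query_encoder.encode(query)  # raw text -> embedding
        return self.index.search(emb, k)        # delegate to the ANN index
```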

MXueguang commented 3 years ago

yep

MXueguang commented 3 years ago

I created a PyTorch HuggingFace model by copying out only the Student weights from the TensorFlow checkpoint.

4.0K    config.json
421M    pytorch_model.bin
228K    vocab.txt

Then I tried to replicate the query embedding results but got some inconsistencies.

Basically, I encode the query with the code below:

from pyserini.dsearch import SimpleDenseSearcher
from transformers import BertTokenizer, BertModel
import numpy as np

tokenizer = BertTokenizer.from_pretrained('checkpoint')
model = BertModel.from_pretrained('checkpoint')
inputs = tokenizer("[Q] treating tension headaches without medication", max_length=36, truncation=True, return_tensors="pt")
outputs = model(**inputs)
embeddings = outputs.last_hidden_state.detach().numpy()
# average over tokens, skipping the first 4 positions ([CLS] + the '[ Q ]' prefix)
emb_model = np.average(embeddings[:, 4:, :], axis=-2).flatten()

and compared against the pre-encoded query for the same query text:

searcher = SimpleDenseSearcher.from_prebuilt_index('msmarco-passage-tct_colbert-hnsw')
result = searcher.search(emb_model)
for hit in result:
    print(f'{hit.docid} {hit.score}')

gives:

7494174 67.39310455322266
3035474 67.37532806396484
7494171 67.3394546508789
1512633 67.23931121826172
1512631 67.21798706054688
7823694 67.0272216796875
5758256 66.99702453613281
5109791 66.98359680175781
1033373 66.9701156616211
7494170 66.94258117675781

but should be:

7494174 71.77847290039062
7494170 71.55615997314453
3035474 71.55179595947266
5109791 71.44254302978516
1512633 71.31373596191406
1512635 71.2982177734375
4019845 71.25587463378906
5758256 71.24925994873047
1966060 71.21368408203125
7494171 71.1937255859375

I tried to replicate the dev set experiment with the current version; it gives MRR@10: 0.2808313435211703, which is 5 points lower than expected.

I guess I missed some critical steps during encoding? Any ideas? @justram

justram commented 3 years ago

Not sure if this query [mask] trick makes the difference: https://github.com/castorini/tct_colbert/blob/aa974ea1540f520d094eef5f3dd68f91f6e45f91/tfrecord_generation/convert_collection_to_tfrecord.py#L82

Could you help check the embedding value of qid 300674, which should be the first one in queries.dev.small00.tf?

You might need something like the following code:

from transformers import BertTokenizerFast

TOKENIZER = BertTokenizerFast.from_pretrained('bert-base-uncased')

def encode_line(tokenizer, line, max_len, pad_to_max_length=True, return_tensors='pt'):
    # add prefix
    line = '[Q] ' + line
    return tokenizer(
        [line],
        max_length=max_len,
        padding='max_length' if pad_to_max_length else None,
        truncation=True,
        return_tensors=return_tensors,
        add_special_tokens=True
    )

class QueryTokenizer(object):
    def __init__(self, maxlen):

        self.tok = TOKENIZER
        self.maxlen = maxlen

        self.mask_token, self.mask_token_id = self.tok.mask_token, self.tok.mask_token_id

    def encode(self, line):
        obj = encode_line(self.tok, line, self.maxlen)
        ids, mask = obj['input_ids'], obj['attention_mask']

        ids[ids == 0]   = self.mask_token_id
        mask[mask == 0] = 1
        return ids, mask

MXueguang commented 3 years ago

I added the [mask] padding. Now qid 300674, "how many years did william bradford serve as governor of plymouth colony?", is tokenized as:

{'input_ids': tensor([[  101,  1031,  1053,  1033,  2129,  2116,  2086,  2106,  2520,  9999,
                          3710,  2004,  3099,  1997, 10221,  5701,  1029,   103,   103,   103,
           103,   103,   103,   103,   103,   103,   103,   103,   103,   103,
           103,   103,   103,   103,   103,   103]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])}

and I called filter(embedding, mask, effective_num, dim, batch_size, seq_length, pooling, normalize) as filter(embeddings[:, 4:, :], inputs.attention_mask[:, 4:], 1, 768, 1, 32, 'average', False)

which gives embedding:

array([-1.75190523e-01,  5.90902986e-03,  2.05224659e-02, -1.83683969e-02,
       -5.41765951e-02, -1.35883778e-01, -1.25182113e-02,  3.77857126e-02,
       -1.32575288e-01,  2.72800978e-02,  2.25816980e-01,  7.20024332e-02,
        2.15916671e-02, -8.22655633e-02,  6.97932839e-02,  1.04443870e-01,
...

Still different from the pre-encoded query:

array([-2.21599162e-01, -3.42423022e-02,  1.05002429e-04, -2.97325272e-02,
       -4.77858707e-02,  1.82806794e-02,  9.37984288e-02,  5.97975031e-02,
       -1.37813792e-01,  1.61621459e-02,  1.39984101e-01,  9.60447490e-02,
       -3.02275587e-02, -5.04416153e-02,  7.75075853e-02,  7.90106803e-02,
        4.24404703e-02, -1.90475762e-01, -3.71277243e-01,  9.50178429e-02,

But the retrieval scores seem to be at the same level

4309131 72.03425598144531
2495763 71.92425537109375
7067032 71.80663299560547
2495759 71.69503784179688
4917597 71.6061019897461
3289523 71.58354187011719
4107182 71.51595306396484
5111832 71.49307250976562
2495755 71.32588195800781
7067037 71.14918518066406

v.s

4309131 72.05290222167969
2495763 71.93959045410156
3289523 71.82308959960938
2495759 71.78435516357422
7067032 71.76979064941406
5111832 71.59669494628906
4917597 71.57402801513672
4107182 71.43971252441406
2495755 71.42699432373047
4882206 71.3467025756836

(expected)

MRR@10 improved to 0.30415904852867504 (0.33395142584254184 expected).

jacklin64 commented 3 years ago

I guess the problem comes from token_type_ids and attention_mask: token_type_ids should be 36 zeros and attention_mask should be 36 ones, so that all the [mask] tokens are considered in attention.
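
In code, the fixed inputs would look something like this (a numpy sketch with toy token ids, assuming max query length 36 and [MASK] id 103 as in the dump above):

```python
import numpy as np

max_len = 36
mask_token_id = 103  # [MASK] in the bert-base-uncased vocab

# toy ids: [CLS] + 5 real tokens, padded with 0 ([PAD]) up to max_len
input_ids = np.array([101, 1031, 1053, 1033, 2129, 2116] + [0] * 30)

# TCT-ColBERT query trick: turn the [PAD] padding into [MASK] ...
input_ids[input_ids == 0] = mask_token_id
# ... and attend over ALL positions, [MASK] padding included
attention_mask = np.ones(max_len, dtype=np.int64)   # 36 ones
token_type_ids = np.zeros(max_len, dtype=np.int64)  # 36 zeros
```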

MXueguang commented 3 years ago

yeah, the issue is the attention mask. Thanks @jacklin64 @justram !

It works now. Not exactly the same, but acceptable? @lintool

#####################
MRR @10: 0.33401674853322283
QueriesRanked: 6980
#####################

justram commented 3 years ago

Not sure where the discrepancy comes from… I also tried to replicate the tf2torch conversion myself.

The qid 300674 embedding is still different from the tfrecord. My result:

array([[-0.22220942, -0.03498323,  0.00060885, -0.03020154, -0.04775231,
         0.01880883,  0.09368997,  0.0592266 , -0.13803245,  0.01656925, ...

expected

array([-2.21599162e-01, -3.42423022e-02,  1.05002429e-04, -2.97325272e-02,
       -4.77858707e-02,  1.82806794e-02,  9.37984288e-02,  5.97975031e-02,
       -1.37813792e-01,  1.61621459e-02,  1.39984101e-01,  9.60447490e-02,
       -3.02275587e-02, -5.04416153e-02,  7.75075853e-02,  7.90106803e-02,
        4.24404703e-02, -1.90475762e-01, -3.71277243e-01,  9.50178429e-02,

lintool commented 3 years ago

We should get to the bottom of this... maybe computing the L1/L2 norm of the difference is a good way to quantify it.
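
For example (a quick numpy sketch; `diff_norms` is just an illustrative helper, and the placeholder vectors stand in for the two real embeddings):

```python
import numpy as np

def diff_norms(a, b):
    """Quantify how far apart two embedding vectors are."""
    return {
        'l1': float(np.linalg.norm(a - b, ord=1)),
        'l2': float(np.linalg.norm(a - b)),
        'cosine': float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))),
    }

# placeholder 768-dim vectors standing in for the two embeddings
rng = np.random.default_rng(0)
a = rng.standard_normal(768)
b = a + 1e-3 * rng.standard_normal(768)
print(diff_norms(a, b))
```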

Could it be that we're using a different model?

MXueguang commented 3 years ago

Brute-force index with the current query encoder:

#####################
MRR @10: 0.3345758288988936
QueriesRanked: 6980
#####################

I think we can fix the model to the pytorch ckpt that I am currently using? It can replicate the result in the paper.

btw, on the first query, qid 300674:

np.linalg.norm(emb_model)                 # 8.577903
np.linalg.norm(emb_expected)              # 8.578611
np.linalg.norm(emb_model - emb_expected)  # 0.014969775

lintool commented 3 years ago

We should try re-encoding the entire corpus first, and also compare.

MXueguang commented 3 years ago

yeah, sure

jacklin64 commented 3 years ago

Here is the output (2 queries and 2 passages) from tensorflow using CPU, GPU, and TPU. The CPU (and GPU) output is closer to the pytorch results but still a little different. Even the CPU and GPU outputs show a slight difference.

Query ID 300674 CPU

array([[-2.22209334e-01, -3.49832065e-02,  6.08918490e-04,
        -3.02015804e-02, -4.77522649e-02,  1.88088529e-02,
         9.36899483e-02,  5.92265390e-02, -1.38032436e-01,
         1.65693369e-02,  1.40189260e-01,  9.59874988e-02,
        -3.03749200e-02, -5.04284017e-02,  7.77714476e-02,

GPU

array([[-2.22209364e-01, -3.49832475e-02,  6.08877279e-04,
        -3.02015394e-02, -4.77522761e-02,  1.88087765e-02,
         9.36899781e-02,  5.92266060e-02, -1.38032436e-01,
         1.65693555e-02,  1.40189305e-01,  9.59875733e-02,
        -3.03748455e-02, -5.04283682e-02,  7.77713358e-02,

TPU

array([[-2.21599162e-01, -3.42423022e-02,  1.05002429e-04,
        -2.97325272e-02, -4.77858707e-02,  1.82806794e-02,
         9.37984288e-02,  5.97975031e-02, -1.37813792e-01,
         1.61621459e-02,  1.39984101e-01,  9.60447490e-02,
        -3.02275587e-02, -5.04416153e-02,  7.75075853e-02,

###########################################################
Query ID 1048585 CPU

array([[ 1.40851974e-01,  1.13944449e-01,  1.70791000e-01,
        -1.37432396e-01,  1.46705985e-01, -4.13956121e-02,
         5.36604971e-02, -1.25348002e-01,  8.20396692e-02,
         1.57915369e-01,  2.36823931e-01,  9.04484242e-02,
        -4.91077602e-02, -3.68169732e-02,  2.30795704e-05,

GPU

array([[ 1.40851870e-01,  1.13944434e-01,  1.70790941e-01,
        -1.37432396e-01,  1.46706104e-01, -4.13955264e-02,
         5.36604822e-02, -1.25348061e-01,  8.20396468e-02,
         1.57915413e-01,  2.36823887e-01,  9.04483199e-02,
        -4.91077229e-02, -3.68170477e-02,  2.31320737e-05,

TPU

array([[ 1.41459003e-01,  1.14993259e-01,  1.70063660e-01,
        -1.38316974e-01,  1.46020904e-01, -4.13241461e-02,
         5.38205728e-02, -1.25002190e-01,  8.23443234e-02,
         1.58267155e-01,  2.37596169e-01,  9.02558714e-02,
        -4.89272773e-02, -3.71671468e-02, -4.57838178e-04,

###########################################################
Passage ID 0 CPU

array([[ 4.21095192e-02,  2.53889531e-01,  1.13958478e-01,
         1.21460408e-01,  9.13939923e-02,  1.35991201e-02,
         3.40743028e-02,  2.44839825e-02,  1.56429991e-01,
         6.68095145e-03, -1.28998440e-02,  1.54412776e-01,
        -1.63023472e-02,  7.20897317e-02,  4.41433676e-02,

GPU

array([[ 4.21096459e-02,  2.53889471e-01,  1.13958262e-01,
         1.21460438e-01,  9.13939178e-02,  1.35992384e-02,
         3.40742543e-02,  2.44839080e-02,  1.56429932e-01,
         6.68090675e-03, -1.29000992e-02,  1.54412627e-01,
        -1.63024031e-02,  7.20898062e-02,  4.41434346e-02,

TPU

array([[ 4.25058864e-02,  2.53913999e-01,  1.14481531e-01,
         1.21287435e-01,  9.19595137e-02,  1.41218072e-02,
         3.52010913e-02,  2.38498207e-02,  1.56589925e-01,
         7.59175606e-03, -1.20817674e-02,  1.55637801e-01,
        -1.62439942e-02,  7.24414140e-02,  4.47482839e-02,

###########################################################
Passage ID 1 CPU

array([[ 4.62073162e-02,  2.48640284e-01,  2.30242908e-02,
         8.39197040e-02, -1.39881531e-02, -1.21955693e-01,
        -7.00245379e-03, -1.69559106e-01,  1.10873491e-01,
         4.93726619e-02,  1.73211680e-03,  4.15351801e-02,
         7.48274149e-03,  5.42241596e-02,  2.23560855e-02,

GPU

array([[ 4.62072790e-02,  2.48640373e-01,  2.30242610e-02,
         8.39197263e-02, -1.39881885e-02, -1.21955812e-01,
        -7.00260838e-03, -1.69559166e-01,  1.10873580e-01,
         4.93727401e-02,  1.73225382e-03,  4.15351763e-02,
         7.48270331e-03,  5.42240776e-02,  2.23561116e-02,

TPU

array([[ 4.69539650e-02,  2.49433205e-01,  2.27699503e-02,
         8.40771273e-02, -1.29647544e-02, -1.21869184e-01,
        -6.67904271e-03, -1.70053929e-01,  1.11627072e-01,
         4.99090105e-02,  2.97701266e-03,  4.21033092e-02,
         7.65242986e-03,  5.49290814e-02,  2.30459962e-02,

lintool commented 3 years ago

Just a clarification - this is all with TF? What's the output in the original TF records you have?

jacklin64 commented 3 years ago

Just a clarification - this is all with TF? What's the output in the original TF records you have?

Yes, all with tensorflow, run on Colab. And the output of our original TF records is from TPU, which is the same as the values from TPU shown above.

lintool commented 3 years ago

Okay, so even fixing on TF, the current finding is that TPU, GPU, and CPU all give different results, but GPU and CPU are closer to each other than to TPU.

I think the next step is to compare {TF, PT} x {GPU, CPU}.

MXueguang commented 3 years ago

On my pytorch ckpt:

Query ID 300674: CPU

array([-2.31344327e-01, -2.45393813e-02, -1.39921568e-02, -5.03446907e-02,
       -2.64322311e-02,  1.95507500e-02,  7.33775198e-02,  5.17740734e-02,
       -1.66313067e-01,  3.96123007e-02,  1.24101795e-01,  9.27411541e-02,
       -2.94032749e-02, -7.32878745e-02,  5.78740276e-02,  8.69655386e-02

GPU

array([-2.31344238e-01, -2.45393701e-02, -1.39921838e-02, -5.03447540e-02,
       -2.64322013e-02,  1.95507258e-02,  7.33775571e-02,  5.17740846e-02,
       -1.66313037e-01,  3.96123119e-02,  1.24101713e-01,  9.27412659e-02,
       -2.94032507e-02, -7.32878745e-02,  5.78740202e-02,  8.69655311e-02,

Query ID 1048585: CPU:

array([ 9.30291116e-02,  9.94897932e-02,  1.62419379e-01, -1.46537602e-01,
        1.50920659e-01, -3.44233140e-02,  3.81368771e-02, -1.07739970e-01,
        7.32866675e-02,  1.34483233e-01,  1.90967619e-01,  5.98020628e-02,
       -1.63173322e-02, -2.93840822e-02,  3.49366292e-02,  6.94796219e-02,

GPU

array([ 9.30290297e-02,  9.94897485e-02,  1.62419394e-01, -1.46537513e-01,
        1.50920600e-01, -3.44232805e-02,  3.81368995e-02, -1.07739896e-01,
        7.32867122e-02,  1.34483173e-01,  1.90967634e-01,  5.98020926e-02,
       -1.63172111e-02, -2.93840617e-02,  3.49366069e-02,  6.94796443e-02,

Passage ID 0: CPU

array([ 4.49005663e-02,  2.45377406e-01,  9.64531973e-02,  1.23718321e-01,
        9.54078138e-02,  1.98404808e-02,  3.39223593e-02,  4.87163402e-02,
        1.29827216e-01, -3.19381356e-02, -7.04530701e-02,  1.53840944e-01,
       -2.06778273e-02,  8.43182430e-02,  3.50538827e-02, -2.74501517e-02,

GPU

array([ 4.49005365e-02,  2.45377421e-01,  9.64531302e-02,  1.23718493e-01,
        9.54076722e-02,  1.98404528e-02,  3.39224152e-02,  4.87164333e-02,
        1.29827201e-01, -3.19382623e-02, -7.04529658e-02,  1.53840959e-01,
       -2.06778273e-02,  8.43182504e-02,  3.50538753e-02, -2.74501573e-02,

Passage ID 1: CPU

array([ 6.55296594e-02,  2.43776590e-01, -5.95190329e-03,  8.59482586e-02,
       -1.49264690e-02, -1.30048946e-01,  4.08926886e-03, -1.10674083e-01,
        9.42917243e-02,  2.77481750e-02, -7.21220672e-02,  1.47481682e-02,
       -1.59648005e-02,  8.82957354e-02,  1.47465672e-02, -9.32754055e-02,

GPU

array([ 6.55295923e-02,  2.43776560e-01, -5.95200621e-03,  8.59481692e-02,
       -1.49264541e-02, -1.30048871e-01,  4.08925395e-03, -1.10674068e-01,
        9.42917615e-02,  2.77482625e-02, -7.21219778e-02,  1.47481235e-02,
       -1.59647409e-02,  8.82957205e-02,  1.47466967e-02, -9.32754874e-02,

Calculated the inner dot products.

On CPU:
Query300674 dot Passage0 = 64.14206
Query300674 dot Passage1 = 64.524
Query1048585 dot Passage0 = 64.43243
Query1048585 dot Passage1 = 64.1066

On GPU:
Query300674 dot Passage0 = 64.14205
Query300674 dot Passage1 = 64.523994
Query1048585 dot Passage0 = 64.43243
Query1048585 dot Passage1 = 64.1066

lintool commented 3 years ago

@MXueguang can you compute L1/L2 norm between the CPU and GPU versions? They seem pretty close...

[edit]

I've just put the PT and TF versions side by side... and the differences seem pretty big...

jacklin64 commented 3 years ago

Here is the L2 norm from TF (CPU / GPU):

Query ID 300674: 8.577903 / 8.577904
Query ID 1048585: 8.556613 / 8.556613
Passage ID 0: 8.644435 / 8.644434
Passage ID 1: 8.584829 / 8.584828

MXueguang commented 3 years ago

[EDITED]

I was adding a [SEP] to the end of the query by mistake.

Now the L2 norms:

Query ID 300674: 8.577903
Query ID 1048585: 8.556613
Passage ID 0: 8.644435
Passage ID 1: 8.584829

Query 300674

array([-2.22209334e-01, -3.49831954e-02,  6.08811155e-04, -3.02015822e-02,
       -4.77522500e-02,  1.88088380e-02,  9.36900005e-02,  5.92266023e-02,
       -1.38032496e-01,  1.65692419e-02,  1.40189275e-01,  9.59874913e-02,

Passage 0:

[ 4.21095341e-02,  2.53889561e-01,  1.13958478e-01,  1.21460438e-01,
        9.13939625e-02,  1.35990903e-02,  3.40743586e-02,  2.44839918e-02,
        1.56430036e-01,  6.68090070e-03, -1.28998691e-02,  1.54412732e-01,
       -1.63023490e-02,  7.20897168e-02,  4.41434570e-02, -6.86354116e-02,

jacklin64 commented 3 years ago

Ah, in tensorflow all input sequences are automatically padded to a fixed length (here I use length 154 for passages). So for input_ids, token_type_ids and attention_mask, you can pad the remaining space with 0 up to length 154. But when average pooling, average only over the effective tokens. For example, in passage 0 the effective token range is only [4:58].
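
In other words, pad to the fixed length but average only over the effective tokens. A rough sketch of that pooling (function name is illustrative):

```python
import numpy as np

def masked_average(embeddings, attention_mask):
    """Average token embeddings over effective (non-pad) positions only.

    embeddings:     (batch, seq_len, dim) hidden states, padded to a fixed length
    attention_mask: (batch, seq_len), 1 for effective tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(embeddings.dtype)  # (b, s, 1)
    summed = (embeddings * mask).sum(axis=1)  # padding contributes zero
    counts = mask.sum(axis=1)                 # number of effective tokens
    return summed / counts
```

So for a passage padded to length 154, only the positions in the effective range would carry a 1 in the mask.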

MXueguang commented 3 years ago

Ah, in tensorflow all input sequences are automatically padded to a fixed length (here I use length 154 for passages). So for input_ids, token_type_ids and attention_mask, you can pad the remaining space with 0 up to length 154. But when average pooling, average only over the effective tokens. For example, in passage 0 the effective token range is only [4:58].

My bad, there was a typo in the raw text of my passage. It works now.

I updated the comment above with the new result. Seems with & without padding gives the same result.

MXueguang commented 3 years ago

Here is the output (2 queries and 2 passages) from tensorflow vs. pytorch using CPU, GPU, and TPU, summarizing the above comments. @lintool @jacklin64 @justram tl;dr: {TF, PT} x {CPU, GPU} are all slightly different, but close enough.

Query ID 300674 CPU Tensorflow

array([[-2.22209334e-01, -3.49832065e-02,  6.08918490e-04,
        -3.02015804e-02, -4.77522649e-02,  1.88088529e-02,
         9.36899483e-02,  5.92265390e-02, -1.38032436e-01,
         1.65693369e-02,  1.40189260e-01,  9.59874988e-02,
        -3.03749200e-02, -5.04284017e-02,  7.77714476e-02,

CPU Pytorch

array([-2.22209334e-01, -3.49831954e-02,  6.08811155e-04, -3.02015822e-02,
       -4.77522500e-02,  1.88088380e-02,  9.36900005e-02,  5.92266023e-02,
       -1.38032496e-01,  1.65692419e-02,  1.40189275e-01,  9.59874913e-02,
       -3.03749200e-02, -5.04283383e-02,  7.77713582e-02,  7.87514448e-02,

GPU Tensorflow

array([[-2.22209364e-01, -3.49832475e-02,  6.08877279e-04,
        -3.02015394e-02, -4.77522761e-02,  1.88087765e-02,
         9.36899781e-02,  5.92266060e-02, -1.38032436e-01,
         1.65693555e-02,  1.40189305e-01,  9.59875733e-02,
        -3.03748455e-02, -5.04283682e-02,  7.77713358e-02,

GPU Pytorch:

array([-2.22209319e-01, -3.49832997e-02,  6.08861912e-04, -3.02015822e-02,
       -4.77523468e-02,  1.88088305e-02,  9.36898738e-02,  5.92265837e-02,
       -1.38032585e-01,  1.65693909e-02,  1.40189290e-01,  9.59875509e-02,
       -3.03747654e-02, -5.04283383e-02,  7.77713507e-02,  7.87514225e-02,

TPU Tensorflow

array([[-2.21599162e-01, -3.42423022e-02,  1.05002429e-04,
        -2.97325272e-02, -4.77858707e-02,  1.82806794e-02,
         9.37984288e-02,  5.97975031e-02, -1.37813792e-01,
         1.61621459e-02,  1.39984101e-01,  9.60447490e-02,
        -3.02275587e-02, -5.04416153e-02,  7.75075853e-02,

###########################################################
Query ID 1048585 CPU Tensorflow

array([[ 1.40851974e-01,  1.13944449e-01,  1.70791000e-01,
        -1.37432396e-01,  1.46705985e-01, -4.13956121e-02,
         5.36604971e-02, -1.25348002e-01,  8.20396692e-02,
         1.57915369e-01,  2.36823931e-01,  9.04484242e-02,
        -4.91077602e-02, -3.68169732e-02,  2.30795704e-05,

CPU Pytorch

array([ 1.40851945e-01,  1.13944381e-01,  1.70790926e-01, -1.37432411e-01,
        1.46706089e-01, -4.13956121e-02,  5.36604300e-02, -1.25347987e-01,
        8.20396692e-02,  1.57915398e-01,  2.36824036e-01,  9.04483125e-02,
       -4.91077565e-02, -3.68170701e-02,  2.30876030e-05,  5.74826375e-02,

GPU Tensorflow

array([[ 1.40851870e-01,  1.13944434e-01,  1.70790941e-01,
        -1.37432396e-01,  1.46706104e-01, -4.13955264e-02,
         5.36604822e-02, -1.25348061e-01,  8.20396468e-02,
         1.57915413e-01,  2.36823887e-01,  9.04483199e-02,
        -4.91077229e-02, -3.68170477e-02,  2.31320737e-05,

GPU Pytorch

array([ 1.40852094e-01,  1.13944359e-01,  1.70790955e-01, -1.37432501e-01,
        1.46706223e-01, -4.13955823e-02,  5.36604077e-02, -1.25348091e-01,
        8.20396543e-02,  1.57915443e-01,  2.36823946e-01,  9.04483423e-02,
       -4.91077118e-02, -3.68169807e-02,  2.30657752e-05,  5.74826263e-02,

TPU Tensorflow

array([[ 1.41459003e-01,  1.14993259e-01,  1.70063660e-01,
        -1.38316974e-01,  1.46020904e-01, -4.13241461e-02,
         5.38205728e-02, -1.25002190e-01,  8.23443234e-02,
         1.58267155e-01,  2.37596169e-01,  9.02558714e-02,
        -4.89272773e-02, -3.71671468e-02, -4.57838178e-04,

###########################################################
Passage ID 0 CPU Tensorflow

array([[ 4.21095192e-02,  2.53889531e-01,  1.13958478e-01,
         1.21460408e-01,  9.13939923e-02,  1.35991201e-02,
         3.40743028e-02,  2.44839825e-02,  1.56429991e-01,
         6.68095145e-03, -1.28998440e-02,  1.54412776e-01,
        -1.63023472e-02,  7.20897317e-02,  4.41433676e-02,

CPU Pytorch

array([ 4.21095341e-02,  2.53889561e-01,  1.13958478e-01,  1.21460438e-01,
        9.13939625e-02,  1.35990903e-02,  3.40743586e-02,  2.44839918e-02,
        1.56430036e-01,  6.68090070e-03, -1.28998691e-02,  1.54412732e-01,
       -1.63023490e-02,  7.20897168e-02,  4.41434570e-02, -6.86354116e-02,

GPU Tensorflow

array([[ 4.21096459e-02,  2.53889471e-01,  1.13958262e-01,
         1.21460438e-01,  9.13939178e-02,  1.35992384e-02,
         3.40742543e-02,  2.44839080e-02,  1.56429932e-01,
         6.68090675e-03, -1.29000992e-02,  1.54412627e-01,
        -1.63024031e-02,  7.20898062e-02,  4.41434346e-02,

GPU Pytorch

array([ 4.21096273e-02,  2.53889561e-01,  1.13958441e-01,  1.21460430e-01,
        9.13938582e-02,  1.35991341e-02,  3.40742506e-02,  2.44839005e-02,
        1.56430006e-01,  6.68094214e-03, -1.28998552e-02,  1.54412776e-01,
       -1.63023714e-02,  7.20897466e-02,  4.41434197e-02, -6.86353818e-02,

TPU Tensorflow

array([[ 4.25058864e-02,  2.53913999e-01,  1.14481531e-01,
         1.21287435e-01,  9.19595137e-02,  1.41218072e-02,
         3.52010913e-02,  2.38498207e-02,  1.56589925e-01,
         7.59175606e-03, -1.20817674e-02,  1.55637801e-01,
        -1.62439942e-02,  7.24414140e-02,  4.47482839e-02,

###########################################################
Passage ID 1 CPU Tensorflow

array([[ 4.62073162e-02,  2.48640284e-01,  2.30242908e-02,
         8.39197040e-02, -1.39881531e-02, -1.21955693e-01,
        -7.00245379e-03, -1.69559106e-01,  1.10873491e-01,
         4.93726619e-02,  1.73211680e-03,  4.15351801e-02,
         7.48274149e-03,  5.42241596e-02,  2.23560855e-02,

CPU Pytorch

array([ 4.62073162e-02,  2.48640284e-01,  2.30242647e-02,  8.39197934e-02,
       -1.39881019e-02, -1.21955633e-01, -7.00239837e-03, -1.69559121e-01,
        1.10873431e-01,  4.93726991e-02,  1.73210527e-03,  4.15351912e-02,
        7.48274848e-03,  5.42241931e-02,  2.23560054e-02, -1.37238979e-01,

GPU Tensorflow

array([[ 4.62072790e-02,  2.48640373e-01,  2.30242610e-02,
         8.39197263e-02, -1.39881885e-02, -1.21955812e-01,
        -7.00260838e-03, -1.69559166e-01,  1.10873580e-01,
         4.93727401e-02,  1.73225382e-03,  4.15351763e-02,
         7.48270331e-03,  5.42240776e-02,  2.23561116e-02,

GPU Pytorch

array([ 4.62072007e-02,  2.48640373e-01,  2.30243169e-02,  8.39198008e-02,
       -1.39881410e-02, -1.21955633e-01, -7.00248778e-03, -1.69559121e-01,
        1.10873431e-01,  4.93726619e-02,  1.73217733e-03,  4.15351950e-02,
        7.48273544e-03,  5.42241447e-02,  2.23560985e-02, -1.37238979e-01,

TPU Tensorflow

array([[ 4.69539650e-02,  2.49433205e-01,  2.27699503e-02,
         8.40771273e-02, -1.29647544e-02, -1.21869184e-01,
        -6.67904271e-03, -1.70053929e-01,  1.11627072e-01,
         4.99090105e-02,  2.97701266e-03,  4.21033092e-02,
         7.65242986e-03,  5.49290814e-02,  2.30459962e-02,
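
The "close enough" claim can be spot-checked numerically, e.g. with np.allclose on the dims quoted above (values copied from the qid 300674 CPU Tensorflow / CPU Pytorch arrays):

```python
import numpy as np

# first four dims of qid 300674, copied from the arrays above
tf_cpu = np.array([-2.22209334e-01, -3.49832065e-02, 6.08918490e-04, -3.02015804e-02])
pt_cpu = np.array([-2.22209334e-01, -3.49831954e-02, 6.08811155e-04, -3.02015822e-02])

# agree to within ~1e-6, i.e. ordinary float32 noise
assert np.allclose(tf_cpu, pt_cpu, rtol=0, atol=1e-6)
```
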
lintool commented 3 years ago

Closing this issue, continuing discussion in #303.