Closed: jeromemassot closed this issue 3 years ago
Hi Jerome!
I understand that you are talking about the Tapas Tokenizer in huggingface. Does it make sense to raise the issue there?
I am closing this issue but feel free to reopen if I am missing something.
Indeed, sorry for the confusion. Thanks, Thomas
Hi Google Research team,
I am seeing some very strange behavior (at least to me, a computer science newbie) when the ingested table has been resampled with the pd.DataFrame.sample() method.
In the following block of code, the rows iterator returns corrupted rows for my table. I have checked iterrows() outside the Tapas Tokenizer and the rows returned are correct. But inside the Tokenizer, the rows are sometimes fine and sometimes Cell objects corresponding to the wrong rows!
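For illustration only (this is not the original snippet or the tokenizer's code, just my guess at the kind of pattern involved): pd.DataFrame.sample() keeps the original index labels, so code that mixes those labels with positional access can read the wrong row. The toy table and the iloc misuse below are assumptions on my side:

```python
import pandas as pd

# A toy table; the columns and values are made up for illustration.
table = pd.DataFrame({
    "Player": ["Alice", "Bob", "Carol"],
    "Score": ["10", "25", "7"],
})

# sample() shuffles the rows but keeps the original index labels.
shuffled = table.sample(frac=1.0, random_state=0)
print(shuffled.index.tolist())  # e.g. [2, 0, 1] instead of [0, 1, 2]

# iterrows() by itself is fine: each label stays paired with its own values.
for label, row in shuffled.iterrows():
    print(label, row["Player"], row["Score"])

# But anything that treats the old label as a position reads the wrong row.
for label, row in shuffled.iterrows():
    print(label, shuffled.iloc[label]["Player"], "vs", row["Player"])
```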
The direct result in my case is a crash in the normalize_for_match() method: AttributeError: 'Cell' object has no attribute 'lower', which is expected since several rows in the table now contain Cell objects instead of str. I cannot see why the rows iterator suddenly returns corrupted data, for both types and values.
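In case it helps others, the workaround I would try (an assumption on my side, not a confirmed fix) is to reset the index and force plain string cells after sampling, before passing the table to the tokenizer; prepare_table below is a hypothetical helper:

```python
import pandas as pd

def prepare_table(table: pd.DataFrame, frac: float = 1.0) -> pd.DataFrame:
    """Resample a table and make it safer to hand to the tokenizer (hypothetical helper)."""
    sampled = table.sample(frac=frac, random_state=0)
    # Drop the shuffled index labels so positional lookups line up again.
    sampled = sampled.reset_index(drop=True)
    # Force every cell to str so normalize_for_match() never sees a non-str object.
    return sampled.astype(str)
```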
Thanks
Best regards
Jerome