Closed PonteIneptique closed 5 years ago
Hi. I am sure this is because of an empty field in your data. Probably a lemma has length 0. Make sure this is not the case (all lines should have the same number of fields).
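One way to run that check is to scan the file for lines with a missing or empty field. This is only a minimal sketch: it assumes tab-separated data with one token per line and blank lines between sentences, and `find_bad_lines` is a hypothetical helper, not part of pie.

```python
def find_bad_lines(lines, sep="\t"):
    """Return (line_num, reason) pairs for lines with empty or missing fields.

    Assumption: one token per line, fields separated by `sep`, blank lines
    between sentences. The expected field count is taken from the first
    non-blank line (for example a header line).
    """
    expected = None
    problems = []
    for line_num, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # sentence boundary, not a data line
        fields = line.split(sep)
        if expected is None:
            expected = len(fields)
        if len(fields) != expected:
            reason = "expected {} fields, got {}".format(expected, len(fields))
            problems.append((line_num, reason))
        elif any(not field.strip() for field in fields):
            problems.append((line_num, "empty field"))
    return problems
```

Calling it as `find_bad_lines(open(path, encoding="utf-8"))` should list the 1-based line numbers worth inspecting by hand.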
Yup, sorry for this. Indeed some of the data were empty. Somehow this was not caught by previous versions or I was running it badly before.
For some reason, some of my files were corrupted at some point. Sorry for not seeing that before opening the issue.
So, I dug a little further: the LineParser seems to issue an empty sentence ([], {'lemma': [], 'pos': [], 'morph': []}),
which I am actively trying to track down in my data (so far without success...)
So my regex failed and I actually had double empty lines in some places.
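For anyone hunting the same problem, doubled blank lines can be located directly instead of with a regex. This is a rough sketch, assuming the lines passed in are the data lines only (no header) and that a blank line ends a sentence; `find_empty_sentences` is a hypothetical name, not something pie provides.

```python
def find_empty_sentences(lines):
    """Return 1-based numbers of blank lines that would close an empty sentence.

    A blank line closes an empty sentence when it directly follows another
    blank line, or when it is the very first line of the input.
    """
    bad = []
    previous_blank = True  # a blank first line also produces an empty sentence
    for line_num, raw in enumerate(lines, start=1):
        blank = not raw.strip()
        if blank and previous_blank:
            bad.append(line_num)
        previous_blank = blank
    return bad
```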
I wonder if you'd be interested in warning people in this kind of situation, telling them that line Z is causing trouble when a sentence is empty? I did that with:
    # break
    if not line:
        if len(parser.inp) == 0:
            print("Line {} is breaking everything".format(line_num))
        yield parser.inp, parser.tasks
        parser.reset()
        continue
which was pretty useful. Some sanity check here would probably be cool (like yielding only if len(parser.inp) :) ). You know, for people with poorly formatted data... ;)
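The "yield only if the sentence is non-empty" idea could look roughly like the sketch below. It is a standalone stand-in rather than pie's actual LineParser: `read_sentences` and its tab-separated, one-token-per-line input are assumptions made for illustration.

```python
import warnings


def read_sentences(lines, sep="\t"):
    """Yield one list of field lists per sentence, skipping empty sentences."""
    sentence = []
    for line_num, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            else:
                # Blank line with nothing collected: warn instead of yielding
                warnings.warn(
                    "Line {}: blank line with no preceding tokens, "
                    "skipping empty sentence".format(line_num)
                )
            continue
        sentence.append(line.split(sep))
    if sentence:  # flush the last sentence if there is no trailing blank line
        yield sentence
```

Warning rather than raising keeps training runnable on slightly messy files while still pointing at the offending line number.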
Try removing the first empty line. I should add a check at line breaks for whether the sentence is empty.
In French: "Les grands esprits se rencontrent" (roughly, "great minds think alike"), which is a pretty pedantic thing to say now that I think about it ;)
Hey there, when I train with the latest version on my old dataset, I run into this issue quite quickly:
I have had the same issue with morph, and I reduced the tasks as much as possible to make sure the issue was not in my attempt at configuring other tasks. Let me know if you need anything else.