StephennFernandes closed 2 years ago
@hwchung27 could you please help me out here
@hwchung27 do you think this is occurring because some of my samples are very short, so the span corruption is masking the entire sentence?
Or is it a case where my samples have a blank line in them?
@craffel @adarob hey guys, sorry to bother you on your busy schedule, but could you please tell me what could be the cause of this error?
I initially thought this error could be because my dataset has blank-line samples, or perhaps has samples with only 5-6 words that the span corruption task couldn't cover. But even after filtering my dataset to keep only sentences above 10 words, I still face the same error.
could you please help me out on this.
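For reference, the filtering described above could be sketched roughly as follows. This is only an illustration, not the actual preprocessing used in this thread: the function names and the `min_words` threshold are hypothetical, and "above 10 words" is interpreted here as "at least `min_words` whitespace-separated tokens".

```python
from typing import Iterable, List


def keep_sample(line: str, min_words: int = 10) -> bool:
    """Return True if the line has at least `min_words` whitespace-separated tokens.

    Blank or whitespace-only lines split into an empty list, so they are
    always rejected when min_words >= 1.
    """
    return len(line.split()) >= min_words


def filter_samples(lines: Iterable[str], min_words: int = 10) -> List[str]:
    """Strip surrounding whitespace and drop blank or too-short lines."""
    return [line.strip() for line in lines if keep_sample(line, min_words)]


if __name__ == "__main__":
    raw = ["", "   \n", "one two three", "a b c d e f g h i j\n"]
    # Only the last line has 10 words, so it is the only one kept.
    print(filter_samples(raw, min_words=10))
```

Since the error persists after this kind of filtering, sample length alone is probably not the cause.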
I don't have time to go through your code, but this appears to be saying the value of 'input' is None. I would suggest running the steps of the pipeline before tokenize in a notebook (e.g., Colab) and seeing if your features look as you expect.
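A minimal sketch of that kind of check, written in plain Python so it can run in any notebook. It assumes you can get the examples produced by the steps before tokenize as an iterable of dicts (for instance by iterating over the dataset as NumPy values). The helper name `find_feature_problems` and the feature names `text` and `targets` are hypothetical, not part of seqio or T5X.

```python
from typing import Any, Iterable, List, Mapping, Sequence, Tuple


def find_feature_problems(
    examples: Iterable[Mapping[str, Any]],
    required: Sequence[str] = (),
    max_examples: int = 100,
) -> List[Tuple[int, str, str]]:
    """Scan the first `max_examples` examples for missing or None features.

    Returns (example_index, feature_name, problem) tuples, where problem is
    either "missing" (a required key is absent) or "is None".
    """
    problems: List[Tuple[int, str, str]] = []
    for index, example in enumerate(examples):
        if index >= max_examples:
            break
        # A required feature that is absent often points to a key-name mismatch.
        for name in required:
            if name not in example:
                problems.append((index, name, "missing"))
        # A feature that is present but None cannot be converted to a tensor.
        for name, value in example.items():
            if value is None:
                problems.append((index, name, "is None"))
    return problems


if __name__ == "__main__":
    records = [
        {"text": "hello world", "targets": "x"},
        {"text": None, "targets": "y"},
        {"targets": "z"},
    ]
    for problem in find_feature_problems(records, required=("text", "targets")):
        print(problem)
```

If this reports a feature as missing or None before tokenization, the problem is likely in how the source or an early preprocessor names or fills that feature, rather than in the text itself.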
Closing this since it's not an actual T5X bug. Feel free to reopen a Discussion topic to solicit help in debugging your issue.
Raised a discussion here. @adarob @hwchung27, it would really mean a lot if you could help me out on it.
Upon running a seqio mixture on mT5 and ByT5, I get an error stating:
ValueError: None values not supported
I am currently using a seqio mixture that I define in my task.py file, with the default mT5 tokenizer gs://t5-data/vocabs/mc4.250000.100extra/sentencepiece.model and extra_ids=0.
This is how my task.py file looks:
I use the ciil_mix_3 mixture in my .gin file. This is how my .gin file looks:
The following is the entire stack trace:
I also tried the same setup with ByT5, and the same error occurs. The following is the error from the ByT5 run: