Closed Xtremely-Doped-Punk closed 1 year ago
I got that error too. Any help?
Yeah, this happens when you are using the hdfs2k or hdfs100k dataset: the in-script test size in this project was set for the original entire 11 GB of HDFS data, so you have to manually change it to a smaller value. I have completely rewritten those scripts and it works for me now.
Can I refer to the code you rewrote?
In bert_pytorch/dataset/sample.py, under this function definition: def generate_train_valid(...)

You need to check the value of the min_len parameter passed into it; you can change that in the training configuration itself. If you still get the same problem, try this before the line in the sample.py script that raises the error:
```python
while True:
    logkey_seq_pairs, time_seq_pairs = try_generate_seqences(...)  # same as in the original script
    if len(logkey_seq_pairs) < 10 or len(time_seq_pairs) < 10:  # initially given config value is 10
        min_len = int(min_len / 2)
    else:
        break
```
You could add a print statement in between to see what's going on.
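To make the retry pattern above concrete, here is a minimal self-contained sketch. The generator here (`generate_pairs`) is a hypothetical stand-in for the sequence-generation code in sample.py, and `min_samples=10` mirrors the config value mentioned above; this is an illustration of the halving loop, not the repository's actual code.

```python
def generate_pairs(sequences, min_len):
    """Hypothetical stand-in: keep only sequences with at least min_len events."""
    kept = [s for s in sequences if len(s) >= min_len]
    return kept, kept  # logkey pairs and time pairs, kept in sync


def generate_with_fallback(sequences, min_len, min_samples=10):
    """Halve min_len until enough sequence pairs are produced."""
    while True:
        logkey_seq_pairs, time_seq_pairs = generate_pairs(sequences, min_len)
        if len(logkey_seq_pairs) < min_samples or len(time_seq_pairs) < min_samples:
            if min_len <= 1:  # nothing left to relax; give up
                raise ValueError("dataset too small even at min_len=1")
            min_len = int(min_len / 2)
            print(f"too few pairs, retrying with min_len={min_len}")
        else:
            return logkey_seq_pairs, time_seq_pairs, min_len


# Usage: 15 short sequences; min_len=10 yields nothing, so the loop
# relaxes min_len (10 -> 5 -> 2) until all 15 sequences qualify.
seqs = [list(range(3)) for _ in range(15)]
pairs, _, final_min_len = generate_with_fallback(seqs, min_len=10)
print(len(pairs), final_min_len)  # 15 2
```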
Thank you so much
python logbert.py train

produces this error output:
After investigating the packages a bit, there seems to be a problem with how test_size is assigned in sample.py under bert_pytorch/dataset/...

The test_size parameter should be given as a float value between 0 and 1, but this code passes the absolute number of test samples itself as the argument to train_test_split().
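The distinction can be shown with a small pure-Python illustration of the convention train_test_split follows (a float in (0, 1) is a fraction of the data, an int is an absolute sample count, and an absolute count larger than the dataset is invalid). The helper name `resolve_n_test` and the sample counts are made up for illustration; this is not sklearn's implementation.

```python
def resolve_n_test(n_samples, test_size):
    """Turn a test_size value into a concrete number of test samples."""
    if isinstance(test_size, float):
        if not 0.0 < test_size < 1.0:
            raise ValueError("float test_size must be in (0, 1)")
        return int(n_samples * test_size)
    if isinstance(test_size, int):
        if test_size >= n_samples:  # the failure mode hit in this issue
            raise ValueError(
                f"test_size={test_size} exceeds n_samples={n_samples}"
            )
        return test_size
    raise TypeError("test_size must be float or int")


print(resolve_n_test(2000, 0.3))  # fraction of a small dataset: 600
print(resolve_n_test(2000, 500))  # absolute count that still fits: 500
try:
    # A count sized for the full dataset breaks on a small one.
    resolve_n_test(2000, 100000)
except ValueError as e:
    print("error:", e)
```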
The part of the code that causes this problem (marked with a ## symbol nearby):
I think this should solve the problem (I don't know the exact workings of the package, but I think this could be a minor fix and just wanted to make sure it's correct...):
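One possible fix along these lines can be sketched as follows. This is an illustrative sketch, not the actual patch from the repository: it converts a configured absolute test-sample count into a fraction before the split, falling back to an assumed default fraction (0.3, purely illustrative) when the count is too large for a small dataset like hdfs2k or hdfs100k.

```python
def safe_test_size(n_samples, configured_test_count):
    """Return a test_size value that is always valid for the dataset size."""
    if configured_test_count >= n_samples:
        # Configured count was sized for the full 11 GB dataset;
        # fall back to a fractional split on small datasets.
        return 0.3  # assumed default fraction, purely illustrative
    return configured_test_count / n_samples  # fraction in (0, 1)


print(safe_test_size(2000, 500))     # 0.25
print(safe_test_size(2000, 100000))  # 0.3 fallback
```

The resulting value could then be passed as test_size to train_test_split() in place of the raw sample count.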