I am working with documents that are considerably longer than the short ones the current version is tailored towards. Currently, the code pads all sequences to the length of the longest one, which in my case can be quite long.
Is there a best practice for dealing with this? I assume the best approach is to split the documents into sequences myself first, but how long should those sequences be, and should they overlap or stand on their own?
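To make the question concrete, here is a minimal sketch of what I mean by building sequences myself. It assumes each document is already tokenized into a list; the function name `split_into_windows` and the `max_len` and `overlap` parameters are placeholders I made up, not part of the library.

```python
from typing import List, Sequence


def split_into_windows(
    tokens: Sequence[str], max_len: int, overlap: int = 0
) -> List[List[str]]:
    """Split one tokenized document into windows of at most max_len tokens.

    Hypothetical helper: consecutive windows share `overlap` tokens,
    and overlap=0 gives windows that stand on their own.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    stride = max_len - overlap  # how far each new window advances
    windows: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        windows.append(list(tokens[start:start + max_len]))
        # Stop once a window reaches the end of the document,
        # so no trailing window is fully contained in the previous one.
        if start + max_len >= len(tokens):
            break
    return windows


doc = "the quick brown fox jumps over the lazy dog again".split()
standalone = split_into_windows(doc, max_len=4)              # no shared tokens
overlapping = split_into_windows(doc, max_len=4, overlap=2)  # 2 shared tokens
```

With this, padding would only ever go up to `max_len` rather than the longest document, but I do not know which values of `max_len` and `overlap` are sensible, which is what I am asking about.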
Looking forward to any pointers. Thanks.