Closed StefanKennedy closed 5 years ago
Remember, the 320 limit is there because BERT has a limit of 320 characters, not words. However, this seems to count words?
That is actually news to me. There's no BERT on this dataset anyway; this limit is more a limitation of the CNNs.
Since some of our models require a max review length due to memory constraints, this PR adds a function to the feature_extraction.py script that allows us to specify the max sequence length of the dataset we read. We should obtain benchmarks using the same dataset, so the same reviews should be filtered out even for experiments that do not have memory constraints.
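For illustration, the filtering described above could look roughly like the sketch below. This is not the code from the PR: the function name `filter_by_max_length`, the constant `MAX_SEQUENCE_LENGTH`, and the choice to count whitespace-separated words (rather than characters or model tokens) are all assumptions made for the example.

```python
from typing import Iterable, List

# Limit discussed in this thread; counting it in words is an assumption.
MAX_SEQUENCE_LENGTH = 320


def filter_by_max_length(
    reviews: Iterable[str], max_length: int = MAX_SEQUENCE_LENGTH
) -> List[str]:
    """Keep only reviews with at most `max_length` whitespace-separated words.

    Hypothetical helper: the real function in feature_extraction.py may
    tokenize differently or truncate instead of dropping reviews.
    """
    return [review for review in reviews if len(review.split()) <= max_length]


# Example: the 400-word review is dropped, the short ones are kept.
reviews = ["short review", "word " * 400, "another short one"]
kept = filter_by_max_length(reviews)
```

Applying the same filter in every experiment, including those without memory constraints, keeps the benchmarks comparable because all models see the same set of reviews.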