Open LoganALJones opened 7 months ago
Is training with 1024 or 2048 sequence length feasible using this method?