ctlllll / SGConv


2d filters #2

alexkreimer opened this issue 2 years ago

alexkreimer commented 2 years ago

Hey, nice work!

I was wondering: it seems that in image tasks you convert the features to 1D and then apply the filter. Would it be possible to create 2D filters using the same idea? Did you try that?

ctlllll commented 2 years ago

Hi Alex,

Thanks for your interest! Extending SGConv to 2D filters is a good idea, but we didn't try it because the paper focuses on long-sequence modeling, and using 2D filters would defeat the purpose of the long-range benchmarks (which flatten images precisely to test 1D long-range dependencies). I think you could definitely apply the idea to standard 2D filters to build better vision models.
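
For illustration, here is a minimal sketch of one possible 2D analogue: small sub-kernels upsampled to progressively larger spatial extents and down-weighted by a decaying factor, echoing the 1D multiscale construction. The scale count, overlay-at-origin layout, decay factor, and bilinear upsampling are assumptions for the sketch, not the authors' design:

```python
import torch
import torch.nn.functional as F

def sgconv2d_kernel(h, w, num_scales=4, base=4, decay=0.5):
    # Hypothetical 2D analogue of the multiscale kernel: each scale is a small
    # sub-kernel (standing in for a learned parameter), bilinearly upsampled to
    # cover a larger region and down-weighted by decay**s.
    kernel = torch.zeros(h, w)
    for s in range(num_scales):
        size = base * (2 ** s)
        sub = torch.randn(1, 1, base, base)        # learned in a real model
        up = F.interpolate(sub, size=(min(size, h), min(size, w)),
                           mode="bilinear", align_corners=False)[0, 0]
        kernel[: up.shape[0], : up.shape[1]] += (decay ** s) * up
    return kernel

def global_conv2d(x, kernel):
    # Global (circular) 2D convolution via FFT; x: (B, C, H, W), kernel: (H, W).
    H, W = x.shape[-2:]
    return torch.fft.irfft2(
        torch.fft.rfft2(x) * torch.fft.rfft2(kernel, s=(H, W)), s=(H, W))

y = global_conv2d(torch.randn(2, 3, 32, 32), sgconv2d_kernel(32, 32))
```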

Tylersuard commented 2 years ago

@ctlllll What is the longest sequence you are able to model using your method? Great work btw!

leeyeehoo commented 2 years ago

@Tylersuard Hi Tyler, in the experiments we were able to test on the Long Range Arena, which includes Pathfinder-X with 128×128 images flattened into sequences of length 16,384, and on the Speech Commands dataset with a sequence length of 16,000.
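
For concreteness, a minimal sketch of the flattening involved (shapes only, not the repo's data pipeline):

```python
import torch

# Pathfinder-X: 128x128 grayscale images become length-16,384 pixel sequences.
images = torch.randn(8, 1, 128, 128)        # (B, C, H, W)
seq = images.flatten(2).transpose(1, 2)     # (B, 16384, C): one pixel per step
assert seq.shape == (8, 128 * 128, 1)
```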

Tylersuard commented 2 years ago

Great! I think the global conv filter is a brilliant idea. Hypothetically, what is the maximum sequence length you could handle? For example, would 200k (as in Enformer) or even 500M be possible?

leeyeehoo commented 2 years ago

@Tylersuard Hi Tyler, we can't give a specific "maximum" length that SGConv can process. In practice, if the task fits in GPU memory on your server, it's worth a try. Also, thank you for pointing me to the Enformer paper, which I hadn't seen before :)
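
As a rough way to judge whether a sequence fits, one fp32 activation tensor of shape (B, L, D) costs B·L·D·4 bytes, and FFT buffers plus autograd multiply that by a small constant. A hypothetical back-of-envelope helper (not from the repo):

```python
def activation_gib(batch, length, dim, bytes_per_elem=4):
    """GiB for one fp32 activation tensor of shape (batch, length, dim)."""
    return batch * length * dim * bytes_per_elem / 2**30

# A single copy of a 1M-token, dim-256 activation:
print(activation_gib(1, 2**20, 256))  # 1.0 GiB per copy
```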

Tylersuard commented 2 years ago

You're welcome! To me, the most exciting part is its ability to take really long input sequences at little additional compute cost. I tried to run your repo last night on a premium GPU with my custom dataset of many 200k-character sequences, but it looks like you haven't released the code yet. If you give me access, I will run some experiments, try to find the maximum input length, and report my results back to you.

ctlllll commented 2 years ago

@Tylersuard Hi Tyler, we just pushed standalone SGConv code, so you can try it now! We ran a sequence of 1M tokens with model dimension 256, and it cost ~20 GB of GPU memory per layer; I think it will work well in your case :) Thanks for pushing us to make things more approachable; we had been too lazy to clean up the code before...
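
For reference, the core operation behind this kind of model is a depthwise global convolution computed with FFTs, which is what keeps the cost near-linear in sequence length. A minimal illustrative sketch (not the repo's actual API):

```python
import torch

def fft_global_conv(x, k):
    """Depthwise global convolution via FFT.
    x: (B, L, D) input sequence; k: (D, L), one length-L kernel per channel.
    Zero-padding to 2L turns the circular convolution into a linear one.
    Time is O(L log L); memory is a few (B, D, L)-sized buffers."""
    B, L, D = x.shape
    x_f = torch.fft.rfft(x.transpose(1, 2), n=2 * L)  # (B, D, L + 1)
    k_f = torch.fft.rfft(k, n=2 * L)                  # (D, L + 1)
    y = torch.fft.irfft(x_f * k_f, n=2 * L)[..., :L]  # keep the causal part
    return y.transpose(1, 2)                          # back to (B, L, D)

x = torch.randn(1, 2**18, 256)      # 256k tokens; at 2**20 one fp32 copy is ~1 GiB
k = torch.randn(256, 2**18) * 1e-3  # stands in for the learned SGConv kernel
y = fft_global_conv(x, k)
```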

Tylersuard commented 2 years ago

@ctlllll Huzzah! Thank you very much.

Tylersuard commented 2 years ago

This might be a noob question, but how would I go about using this to generate text, or to solve one of the long-document problems?

leeyeehoo commented 2 years ago

Hi Tyler, our language modeling experiments are based on this repository: https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/LanguageModeling/Transformer-XL. You can refer to it for more information; it gives the general idea of how to use transformer-style models for language modeling.
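
At a high level, generation with any causal sequence model is just repeated next-token prediction. A minimal greedy-decoding sketch, assuming a hypothetical `model` that maps token ids (B, L) to logits (B, L, vocab); note that unlike transformers with a KV cache, a naive loop for a convolutional model re-runs the full sequence each step:

```python
import torch

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens):
    """Greedy decoding: append the argmax token one step at a time."""
    ids = prompt_ids
    for _ in range(max_new_tokens):
        logits = model(ids)                               # (B, L, vocab)
        next_id = logits[:, -1].argmax(-1, keepdim=True)  # (B, 1)
        ids = torch.cat([ids, next_id], dim=1)
    return ids
```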

Tylersuard commented 2 years ago

@ctlllll I was able to run 2 million tokens with model dimension 256 :) I am trying to get ahold of an 80 GB A100 to push it even further. [screenshot: 2022-11-07 (4)]