Hello! I have been following your tutorial, and although I got it to work with my data, I first had to filter down the number of features, otherwise memory usage would skyrocket. I have a dataset with a little over 1,000 samples but over 50k features. Even on an instance with 124 GB of RAM, I can only get the dataset to run when I keep the top 1,000 features with the highest variance. Is there a different way to run this using all the data? If not, is there a more appropriate way to run this in batches?
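For reference, this is roughly how I'm pre-filtering the features at the moment (a minimal sketch with NumPy; the function name and the matrix `X` are just placeholders for my own data):

```python
import numpy as np

def top_k_by_variance(X, k=1000):
    """Keep only the k columns (features) with the highest variance.

    X is assumed to be a samples-by-features numeric array
    (~1,000 x 50,000 in my case).
    """
    variances = X.var(axis=0)          # per-feature variance
    top_idx = np.argsort(variances)[-k:]  # indices of the k most variable features
    return X[:, top_idx], top_idx

# X_filtered, kept_columns = top_k_by_variance(X, k=1000)
```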
Thank you for your time!