-
Feature request for the Gradient Low-Rank Projection (GaLore) optimizer.
The GaLore optimizer projects gradients onto a low-rank subspace to dramatically reduce memory usage. The arXiv paper is [here](https://arxiv.…
-
**TODO**
**Lee, Smooth Manifolds book: Ch. 1 to 5**
https://math.berkeley.edu/~jchaidez/materials/reu/lee_smooth_manifolds.pdf
https://drive.google.com/file/d/1nbJWFy5zzx0dQ1QCjO--dADladWO0yIL/vie…
-
Hey, I really like your work on rank collapse, and I am trying to understand how the Dirichlet energy and rank_diff are calculated for each layer of each GNN.
From looking at the code, I under…
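
For reference, one common definition of the Dirichlet energy of node features on a graph is the sum of squared feature differences across edges. The sketch below assumes that unnormalized form and an undirected edge list. It is not taken from the repository being asked about, which may normalize by node degree or by the feature norm, and the function name `dirichlet_energy` is hypothetical.

```python
def dirichlet_energy(features, edges):
    """Sum of squared feature differences over undirected edges.

    features: list of per-node feature vectors (lists of floats).
    edges: list of (i, j) node-index pairs, each undirected edge listed once.
    Equals trace(X^T L X) for the unnormalized graph Laplacian L.
    """
    energy = 0.0
    for i, j in edges:
        # Squared Euclidean distance between the two endpoint features.
        energy += sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
    return energy


# Path graph 0 - 1 - 2 with 2-dimensional node features (made-up values).
X = [[1.0, 0.0], [0.0, 0.0], [0.0, 2.0]]
E = [(0, 1), (1, 2)]
print(dirichlet_energy(X, E))  # 5.0
```

The energy is zero exactly when every pair of connected nodes carries identical features, which is why it is often tracked per layer as an oversmoothing indicator.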
-
There are several tabular MDP algorithms such as MBIE-EB, E3, and Q-learning with bonus. We currently only support the latter. It would be great to include more. These could be useful as black boxes fo…
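
To make the currently supported variant concrete, here is a minimal sketch of tabular Q-learning with a count-based exploration bonus. The bonus form `beta / sqrt(n)`, the class name `BonusQLearner`, and every hyperparameter default are assumptions for illustration, not this library's API.

```python
import math
from collections import defaultdict


class BonusQLearner:
    """Tabular Q-learning with an assumed count-based bonus beta / sqrt(n)."""

    def __init__(self, n_actions, alpha=0.5, gamma=0.9, beta=1.0):
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.beta = beta    # bonus scale
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.counts = defaultdict(int)  # visit counts per (state, action)

    def act(self, state):
        # Greedy action; exploration comes from the bonus, not from randomness.
        values = self.q[state]
        return max(range(self.n_actions), key=lambda a: values[a])

    def update(self, state, action, reward, next_state, done):
        self.counts[(state, action)] += 1
        bonus = self.beta / math.sqrt(self.counts[(state, action)])
        target = reward + bonus
        if not done:
            target += self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


agent = BonusQLearner(n_actions=2)
agent.update(state=0, action=1, reward=1.0, next_state=1, done=False)
print(agent.q[0])  # [0.0, 1.0]
```

The bonus shrinks as a state-action pair is visited more often, so rarely tried actions look temporarily more attractive to the greedy policy.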
-
Example from IEEE RAL: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9677911
Idea: iteratively solve 2 SDPs:
1) Minimize with fixed objective
2) Reshape objective to get lower rank solutio…
-
Currently the `C` matrix is allocated as dense and updated in-place using rank-1 updates. This is fine when the data is not large (see #8) or when the rank is close to the size of the data, but otherw…
-
indexer(NT) has --low and --high parameters:
``
--low= skip entries from the anagram file shorter than 'low' characters. (default = 5)
--high= skip entries from the anagram file longer than 'hi…
-
Is there any way to calculate these two functions?
Thanks for your help.
-
- https://arxiv.org/abs/2106.09685
- 2021
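The dominant paradigm in natural language processing is large-scale pre-training on general-domain data followed by adaptation to specific tasks or domains.
However, as pre-training scales up, conventional fine-tuning, which retrains all of the model's parameters, becomes difficult.
Taking GPT-3 175B as an example, with its 175B parameters, fine-tun…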
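
The linked paper (LoRA) responds to this by freezing the pre-trained weight `W0` and training only a low-rank update, so the effective weight is `W0 + (alpha / r) * B @ A`. The sketch below illustrates that formula in plain Python on tiny matrices. The helper names `matmul`, `lora_weight`, and `trainable_params` are hypothetical, and the numbers are made up.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w0, b, a, alpha):
    """Return W0 + (alpha / r) * B @ A, where r is the rank (rows of A)."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # d x k update built from d x r and r x k factors
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


def trainable_params(d, k, r):
    """Parameter counts: (full fine-tuning of a d x k weight, low-rank factors)."""
    return d * k, r * (d + k)


W0 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # frozen 3 x 3 weight
B = [[1], [0], [0]]                      # 3 x 1 factor
A = [[0, 2, 0]]                          # 1 x 3 factor
print(lora_weight(W0, B, A, alpha=1))
print(trainable_params(4096, 4096, 8))   # (16777216, 65536)
```

When `B` is all zeros, as at the start of training in the paper, the adapted weight equals `W0`, so training begins from the pre-trained model's behavior.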
-
Currently the `rank_feature` query on the `rank_features` field type supports only three functions: log, sigmoid, and saturation.
Consider adding additional functions of `cosineSimilarity` and `dotProduct` only …
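
For context, the three existing functions can be written out as follows. The formulas are the ones given in the Elasticsearch `rank_feature` query documentation as far as I can tell (saturation `S / (S + pivot)`, log `log(scaling_factor + S)`, sigmoid `S^exp / (S^exp + pivot^exp)`), so treat this as a sketch to check against the docs. The Python function names are hypothetical, and `s` stands for the stored feature value.

```python
import math


def saturation(s, pivot):
    """Score rises toward 1 as s grows; equals 0.5 when s == pivot."""
    return s / (s + pivot)


def log_score(s, scaling_factor):
    """Unbounded logarithmic score of the feature value."""
    return math.log(scaling_factor + s)


def sigmoid(s, pivot, exponent):
    """Like saturation, with an exponent controlling the steepness."""
    return s ** exponent / (s ** exponent + pivot ** exponent)


print(saturation(8, 8))       # 0.5
print(log_score(0, 1))        # 0.0
print(sigmoid(7, 7, 0.6))     # 0.5
```

All three take a single stored value per feature, which is the contrast with `cosineSimilarity` and `dotProduct`, since those need a query-side vector as well.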