zhiyil1230 closed this issue 1 year ago.
Some assorted thoughts on GOLI after a few weeks of developing.
One thing that would, in my opinion, be useful is for users to understand the design space of all possible models that can be built with GOLI, and specifically how the config file defines that design space. I think a good way to achieve this is with a diagram that explains it visually: showing how a list of input tensors is processed by the pre_nn, pre_nn_edges, pe_encoders, gnn, post_nn, task heads, and metrics. This would make the connections between these modules clear.
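To make the idea concrete, here is a hypothetical sketch of how a config file might map onto that module pipeline. The section names follow the modules listed above, but the exact keys, nesting, and defaults are assumptions for illustration, not the real schema:

```yaml
architecture:
  pre_nn:            # MLP applied to node features before the GNN
    hidden_dims: [64, 64]
  pre_nn_edges:      # MLP applied to edge features before the GNN
    hidden_dims: [32]
  pe_encoders:       # positional-encoding encoders merged into node features
    encoders: [laplacian_pe]
  gnn:               # the message-passing trunk
    layer_type: gcn
    depth: 4
  post_nn:           # MLP applied to the embedding after the GNN
    hidden_dims: [64]
task_heads:          # one small MLP per task
  homo: {out_dim: 1}
metrics:             # metrics computed per task head
  homo: [mae, pearsonr]
```

A diagram could then be drawn section by section from such a file, top to bottom.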
Random thoughts:
- `FCLayer` over plain `nn.Linear`?
- `pip install -e .` and living inside it

I'm removing the ticket from this sprint, but we can leave it open to collect more feedback in the future.
Related issues:

- General documentation #181
- `FullGraphMultiTaskNetwork`, as a drawing #247 (Usability)
For the other comments:
> Having configs only in YAML means you don't have IDE tooling when you edit them, so it would be nice if it were possible to have a Python object which could be used for configuration
You can pass a dictionary, you just have to create your own `main` file.
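A minimal sketch of what that could look like: build the config as a nested dict in Python and pass it to your own entry point. `main` here is a hypothetical stand-in for whatever function normally consumes the parsed YAML; the YAML loader would produce an identical nested dict, so everything downstream is unchanged.

```python
# Build the same nested structure the YAML file would parse into.
config = {
    "architecture": {"gnn": {"layer_type": "gcn", "depth": 4}},
    "task_heads": {"homo": {"out_dim": 1}},
}

def main(cfg: dict):
    """Hypothetical entry point: consumes the config dict exactly as the
    YAML-driven main would, since yaml.safe_load also returns a dict."""
    gnn_depth = cfg["architecture"]["gnn"]["depth"]
    return gnn_depth
```

Calling `main(config)` then follows the same code path as the YAML route.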
> Baking general good practice like norms and dropout into each layer is a great idea but makes it harder to customise some things (for example, you can't choose between "pre-" and "post-" layer norm)
I don't think it's necessary to have normalization before a layer. In the case of the `MLP` and `FeedForwardNN`, there's the `first_normalization` option.
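To illustrate how a `first_normalization` flag covers the pre-layer placement, here is a toy MLP, not the library's actual class: one optional normalization before the first layer, then per-layer post-norms. The structure is an assumption for illustration only; the flag name comes from the thread.

```python
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    """Illustrative stand-in for an MLP with a `first_normalization`
    option: normalize once before the first layer (pre-norm placement),
    then use per-layer post-norms for the rest of the stack."""
    def __init__(self, in_dim, hidden_dim, depth, first_normalization=True):
        super().__init__()
        self.first_norm = nn.LayerNorm(in_dim) if first_normalization else nn.Identity()
        layers, dim = [], in_dim
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden_dim), nn.LayerNorm(hidden_dim), nn.ReLU()]
            dim = hidden_dim
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(self.first_norm(x))

x = torch.randn(8, 16)
y = TinyMLP(16, 32, depth=2).forward(x)
```

With `first_normalization=True`, every layer in the stack ends up with a normalization directly in front of it.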
> I'm not convinced there's an advantage to using `FCLayer` over plain `nn.Linear`?
`FCLayer` has many more options: norm, dropout, activation. In some cases, it is used without any of these and simply replicates `Linear`. The main reason it is used is that it defaults to the `mup` initialization instead of the PyTorch one.
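As a rough sketch of that idea (not the real `FCLayer`, whose exact sub-module ordering and mup initialization are omitted here), the value is bundling the common post-linear operations behind one interface:

```python
import torch
import torch.nn as nn

class FCLayerSketch(nn.Module):
    """Rough sketch of what an FCLayer-style block adds over nn.Linear:
    optional normalization, activation, and dropout bundled together.
    (The real class also defaults to mup-style initialization rather
    than PyTorch's; that part is not reproduced here.)"""
    def __init__(self, in_dim, out_dim, activation=nn.ReLU(),
                 normalization=None, dropout=0.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.norm = normalization if normalization is not None else nn.Identity()
        self.activation = activation if activation is not None else nn.Identity()
        self.dropout = nn.Dropout(dropout) if dropout > 0 else nn.Identity()

    def forward(self, x):
        return self.dropout(self.activation(self.norm(self.linear(x))))

# With everything switched off, the block reduces to a plain Linear.
plain = FCLayerSketch(16, 8, activation=None)
full = FCLayerSketch(16, 8, normalization=nn.LayerNorm(8), dropout=0.1)
```

The "replicates `Linear`" case is just all options left at `Identity`.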
> some work needs to be done to convert it to a "library" in the sense of it being a set of components you can use on its own
This is already done internally. We have a pipeline in place to package it, and we will do it once we reach a stable version.
> You can pass a dictionary, you just have to create your own `main` file
The problem for me has more to do with discoverability of the features of the library. It's easier to work out what you need to provide for, e.g., a `dataclass` than for a dictionary that needs to have some particular keys.
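A small sketch of the discoverability argument, using a hypothetical typed config (these field names are made up for illustration): the IDE can autocomplete the fields, and a misspelled key fails loudly at construction time rather than silently downstream.

```python
from dataclasses import dataclass, field

@dataclass
class GNNConfig:
    """Hypothetical typed config: fields are discoverable by the IDE,
    and unknown keyword arguments raise a TypeError immediately."""
    layer_type: str = "gcn"
    depth: int = 4
    hidden_dims: list = field(default_factory=lambda: [64, 64])

cfg = GNNConfig(depth=6)
# A raw dict offers no such check: {"dept": 6} would only fail
# much later, wherever the code first looks up the correct key.
```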
> Some work needs to be done to convert it to a "library" in the sense of it being a set of components you can use on its own
Where is this pipeline? Can we try it out and try to find out what might be missing?
> I don't think it's necessary to have normalization before a layer. In the case of the `MLP` and `FeedForwardNN`, there's the `first_normalization` option.
This is the specific paper I was thinking of, but I'll admit I haven't read it in enough detail to know if its ideas are useful in this context: https://arxiv.org/abs/2002.04745
Answers to @callumm-graphcore
> The problem for me has more to do with discoverability of the features of the library. It's easier to work out what you need to provide for e.g. a dataclass than a dictionary that needs to have some particular keys.
I understand your point, but there are many advantages to a YAML / dictionary. For example, with `lightning.Trainer`, the YAML lets you use any feature available within the Trainer; if you used a `dataclass`, you would have to do a signature check against such classes.

Of course, it would be great if we could transition to dataclasses at some point to get IDE suggestions, but it would require lots of work.
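A sketch of why the dict route is flexible here: the trainer section of the config is splatted directly into the constructor, so any new argument the Trainer gains works without touching library code. `FakeTrainer` is a stand-in for `lightning.Trainer`, whose real signature is much larger:

```python
class FakeTrainer:
    """Stand-in for lightning.Trainer: accepts a few known arguments
    and, like the real class, many others we don't enumerate here."""
    def __init__(self, max_epochs=10, precision="32", **kwargs):
        self.max_epochs = max_epochs
        self.precision = precision
        self.extra = kwargs  # anything else the user put in the YAML

# The "trainer" section parsed from YAML is passed through unchanged.
config = {"trainer": {"max_epochs": 50, "precision": "16-mixed"}}
trainer = FakeTrainer(**config["trainer"])
```

With a dataclass-based config, every one of those arguments would have to be mirrored (or introspected) in the config class.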
> Where is this pipeline? Can we try it out and try to find out what might be missing?
@hadim perhaps you can answer this one? He's asking for the pipeline to package Goli into a library on pip / conda.
For the normalization before the layer:

> This is the specific paper I was thinking of, but I'll admit I haven't read it in enough detail to know if its ideas are useful in this context: https://arxiv.org/abs/2002.04745
Since this paper is specific to the Transformer layer, this change only needs to happen inside GPS. But again, we're using a hybrid architecture, so our model is far less sensitive to Transformer stability issues.
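For reference, the distinction the paper draws can be sketched with a toy residual block (a simplification of a real Transformer sublayer, for illustration only): post-LN normalizes after the residual add, as in the original Transformer, while pre-LN (arXiv:2002.04745) normalizes the input of the sublayer inside the residual branch.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Toy residual block contrasting the two LayerNorm placements."""
    def __init__(self, dim, pre_ln=True):
        super().__init__()
        self.pre_ln = pre_ln
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Linear(dim, dim)  # stand-in for the sublayer

    def forward(self, x):
        if self.pre_ln:
            return x + self.ff(self.norm(x))   # pre-LN: norm inside the branch
        return self.norm(x + self.ff(x))       # post-LN: norm after the add

x = torch.randn(2, 8)
y_pre = Block(8, pre_ln=True).forward(x)
y_post = Block(8, pre_ln=False).forward(x)
```

Only the pre-LN variant keeps an identity path from input to output, which is the paper's argument for its better training stability without warmup.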
As long as the repo is pip-compatible, it should be fine for conda packaging as well. I'll try to package the newest version with conda and let you know.
When are you planning to remove dgl? It would be best to not have it for packaging (especially for the open source release since dgl is not on conda forge yet).
@hadim

> When are you planning to remove dgl?

Soon. Definitely before release.

Note that the IPU stuff requires pip; `poptorch` is still not available on conda.
A quick update here: I will close this issue, as it simply regroups many other issues.