mpariente opened 3 years ago
Good idea. I actually prefer the reference file solution because it is the most flexible (eg. you decide when to update the reference) and is probably much much easier from an environment/dependency point of view... I don't fancy maintaining installation processes of ancient versions of Asteroid and its dependencies :)
Maybe we can simply add a reference to each model export automatically? So the exported model ships an example along with the weights. (Maybe even add a README etc...) Or we can have the references in a separate folder/repo in the Asteroid codebase.
How Huggingface is doing it: https://github.com/huggingface/transformers/blob/master/tests/test_modeling_gpt2.py#L480 Looks like they do it using a reference shipped with the code base, so not tied to a pretrained model per se. I wonder how they do it for user-uploaded models though...
> Maybe we can simply add a reference to each model export automatically? So the exported model ships an example along with the weights. (Maybe even add a README etc...) Or we can have the references in a separate folder/repo in the Asteroid codebase.
I thought about shipping examples with model sharing. We can't upload examples that come from wsj0 though. So random examples probably. All the Zenodo records can have a v2 and we could add a mix/estimates folder. That should weight too much.
> How Huggingface is doing it: https://github.com/huggingface/transformers/blob/master/tests/test_modeling_gpt2.py#L480 Looks like they do it using a reference shipped with the code base, so not tied to a pretrained model per se. I wonder how they do it for user-uploaded models though...
It was on my todo-list to check how they do it. They can afford writing inputs/outputs by hand, but for us it's not feasible.
> That should weight too much
Typo?
Didn’t think about the license issue, good point :( Otherwise I’d have suggested simply extracting an example automatically during evaluation.
> They can afford writing input/output by hand, but for us, it's not feasible.
Why not? We can’t put the entire input/output in the tests but some proxy (a few samples only, SI-SDR improvement value, ...). We can even autogenerate the test code and print to console so that people just have to copy and paste to a unit test. (I still think shipping examples with the models, and not in the code base, is more flexible and powerful.)
> Typo?
Yes. I meant "It shouldn't weight too much".
Seems cumbersome, let's ship examples in the models' tarballs.
From the LDC User agreement.
> User and User’s Research Group may include limited excerpts from the LDC Databases in articles, reports and other documents describing the results of User’s non-commercial linguistic education, research and technology development.
If we always select the same sample, we can share a real one.
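One possible way to "always select the same sample", sketched with invented file names: sort the candidates and take a fixed index, so the choice is deterministic across uploads.

```python
# Sketch of deterministically picking the same "real" example per
# dataset split: sort the candidate file names and take a fixed index,
# so every upload ships the identical excerpt. File names are made up.

def pick_reference_sample(filenames, index=0):
    """Deterministically select one file from a dataset split."""
    if not filenames:
        raise ValueError("no candidate files")
    return sorted(filenames)[index]

files = ["mix_0042.wav", "mix_0007.wav", "mix_0199.wav"]
print(pick_reference_sample(files))  # always "mix_0007.wav"
```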
So we agreed to add the test files in each record. I'm now wondering where in the record.
Should it be (1) saved in a different file (that we would then have to download separately), or (2) saved as a tensor in the model file itself? Adding it in the model file itself is the most practical for now: a `tests` key with `mixture` and `estimates`.
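For illustration, option (2) could produce a checkpoint dict along these lines. The field names besides `tests` are guesses at the layout, and plain lists stand in for the torch tensors a real checkpoint would hold.

```python
# Minimal sketch of the saved dict under option (2): the checkpoint
# gains a "tests" key next to the usual model fields.
checkpoint = {
    "model_name": "ConvTasNet",           # hypothetical field
    "model_args": {"sample_rate": 8000},  # hypothetical field
    "state_dict": {},                     # weights omitted in this sketch
    "tests": {
        "mixture": [0.01, -0.02, 0.03],    # stand-in for a 1x8000 tensor
        "estimates": [[0.0, 0.01, -0.01]], # stand-in for the model output
    },
}
print(sorted(checkpoint["tests"]))  # ['estimates', 'mixture']
```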
OTOH, if we save in different files, we either have to upload several files, or zip/tar the whole thing and upload it (the upload/download code changes either way). The good thing about that approach is that anybody can listen to the input/output examples (but that requires meaningful examples). And we'll be able to use this approach to upload convergence plots or other things in the future.
In ESPNet's model zoo, they upload a zip that contains several things, which is quite cool. This is what it looks like:
Last important thing to have in mind: when the files are modified, a new version of the record is made. So, if we decide that (2) is not good enough later in time and decide to change it, records will still have a version as (2) that will work with the corresponding asteroid version. All of this to say, backward compat is not a real problem here.
While we're at it, I can also add the sample_rate.
```python
import torch
from pathlib import Path

from asteroid import available_models
from asteroid.models.base_models import BaseEncoderMaskerDecoder
from asteroid.utils.hub_utils import cached_download, SR_HASHTABLE

for m_path in available_models():
    # Load dict and model
    f = cached_download(m_path)
    dic = torch.load(f)
    dic["model_args"]["sample_rate"] = SR_HASHTABLE.get(m_path)
    model = BaseEncoderMaskerDecoder.from_pretrained(dic)
    # Predict on a random mixture
    u = torch.randn(1, 8000)
    v = model(u)
    # Add the new "tests" field
    dic["tests"] = dict(mixture=u, estimates=v)
    save_path = Path(f).parent.parent / "_".join(m_path.split("/")) / "model.pth"
    save_path.parent.mkdir(parents=True, exist_ok=True)
    torch.save(dic, save_path)
```
This works. I'll "just" have to ask the uploaders to update their records and we'd be mostly done. However, we can't listen to the separated results, and the mixture/estimates are senseless. How do you feel about that @jonashaag @mhu-coder @JorisCos?
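The kind of compat check this enables could look like the following sketch. A toy lambda stands in for the model, and a pure-Python `close` helper replaces what would be `torch.allclose` on real tensors.

```python
# Hedged sketch of the backward-compat check the shipped "tests" field
# enables: re-run the model on the stored mixture and compare against
# the stored estimates.

def close(a, b, tol=1e-5):
    """Elementwise closeness for flat lists (torch.allclose stand-in)."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))

def check_backward_compat(model_fn, tests):
    """Return True if model_fn reproduces the stored estimates."""
    new_estimates = model_fn(tests["mixture"])
    return close(new_estimates, tests["estimates"])

# Toy stand-in "model": scales the input by 0.5.
tests = {"mixture": [1.0, 2.0, 4.0], "estimates": [0.5, 1.0, 2.0]}
print(check_backward_compat(lambda x: [v * 0.5 for v in x], tests))  # True
```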
I don't think I'll take time to do (1) in the very near future, and I feel like going forward without compatibility tests is quite dangerous.
Actually, (1) and (2) are not necessarily incompatible. We could put senseless I/O data in the model file for the sole purpose of testing backward compatibility, and keep the meaningful data (audio, plots, etc.) in separate files. The whole thing could then be tarred/zipped.
I wonder though, is a single data point sufficient to check backward compatibility? I mean, do we want to make sure that the shape at the output of a model, for input data of different shapes, will not change even when the code for said model changes? I am asking because, while working on #293, I realized that we broke this aspect when we added jit compatibility. If we test for various input shapes, `dic["tests"]` could be a dictionary of mixture/estimate pairs.
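A small sketch of that multi-shape layout, with invented keys and plain lists standing in for tensors:

```python
# "tests" maps a shape label to a mixture/estimate pair, so the compat
# check can cover several input lengths. Keys and layout are guesses.
tests = {
    "len_8000": {"mixture": [0.0] * 8000, "estimates": [[0.0] * 8000]},
    "len_16000": {"mixture": [0.0] * 16000, "estimates": [[0.0] * 16000]},
}

for name, pair in tests.items():
    # Each stored estimate should keep its mixture's length.
    assert len(pair["estimates"][0]) == len(pair["mixture"]), name
print(len(tests))  # 2
```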
(Oops.)
Maybe naive question: Why are we using Zenodo in the first place? I guess as soon as we start having revisions of models and multiple files etc. we will start implementing a tiny version control system... so couldn't we just use Git to track the pretrained models? Then we could also add as many examples and test files and scripts etc. as we want. (This is obviously inspired by what Huggingface is doing :))
> Maybe naive question: Why are we using Zenodo in the first place? I guess as soon as we start having revisions of models and multiple files etc. we will start implementing a tiny version control system...
Good question :thinking: ... If we end up version controlling models, we could have a go at git-lfs.
That said, Zenodo has the advantage of a browser-based GUI, while git-lfs requires users to install a client on their machine if they want to access the data. Plus, there's a 1 GB limit on free storage on GitHub.
> Why are we using Zenodo in the first place?
Because I didn't find a better alternative for free. It has a versioning system, each model can be cited, models can be browsed. It was the closest I could get to huggingface's model hub (which is great) with my limited time and resources. But if there is a better way, I'd be happy to switch.
> I guess as soon as we start having revisions of models and multiple files etc. we will start implementing a tiny version control system... so couldn't we just use Git to track the pretrained models?
I'm not sure I understand what you mean; Zenodo comes with versioning. If I update the files in the record, I just need to update the URL in `URL_HASHTABLE` and that's it. (But the old URL still points to the old model.)
Hmmm... probably Zenodo is good enough. Using Git might be overkill. You don’t update the model every day, probably only every few months at most.
I think uploading zip files to Zenodo is the best option, even if it’s a bit more involved to implement. I can invest some time in the next few days to get a better understanding of what needs to be done, if you’d like me to :) Dealing with zip files in Python is pretty easy, but we'd have to adapt all of the code for downloading, uploading and testing.
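For reference, packing and inspecting such an archive with the standard `zipfile` module is indeed straightforward. The member names below are invented, not the actual Asteroid record layout.

```python
# Sketch: pack model weights, audio examples and a README into an
# in-memory zip, then read the member list back.
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("model.pth", b"fake-weights")
    zf.writestr("examples/mixture.wav", b"fake-audio")
    zf.writestr("examples/estimate_1.wav", b"fake-audio")
    zf.writestr("README.md", "# Pretrained model\n")

buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    names = sorted(zf.namelist())
print(names)
```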
I don't see the limitation of Zenodo for our purpose. Yes, live demo would be AMAZING, but that's out of reach for now.
Also, updating models in place is confusing. I've seen this in challenges when the baseline model is updated without notice and suddenly people are very confused. I'd rather have an explicit v2 in the name if the weights are changed, until we have clear model comparison tools.
Please, do invest time if you feel like it :-) In situations like these, I really want to get rid of the recipes... The code for downloading/uploading is fine, but updating the recipes is boring.
Let's agree: what should we put in the .zip?
> Please, do invest time if you feel like it
Of course I didn’t get around to doing it 🙄 probably also not this week.
🚀 Feature
Include version-compatibility tests for pretrained models (e.g. models exported with asteroid 0.3.0 still work as expected in 0.3.3).
Motivation
We can break things between versions and still pass CI; this is not OK. For example #255 (solved by #258).
Questions & remarks