Improve default filters and make the whole process reproducible

bmcfee commented 2 years ago

Issue #75 raised some questions about the pre-packaged default filters that we ship with, and whether they could be improved. (I expect the answer is "yes".)

Previously, the filter parameters were selected by Gaussian process hyperparameter optimization (#8), as implemented in this gist: https://gist.github.com/4aa4c959bb0d310e3f12cdedf91d7661

The above notebook worked well enough given the constraints and tools of the time, but I did have to dredge it out of an old laptop. Properly, this functionality should be included in the repository and be fully reproducible (with RNG seeds and all). Doing this will make it easier to improve the filters going forward, and would also make it possible to experiment with building a larger parameter search into the process.

If we reimplement this, it probably makes sense to discuss the window design objective (a little ad hoc at the moment) and look into more modern tooling for GP search (e.g., hyperopt).
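As a minimal sketch of what "reproducible (with RNG seeds and all)" could mean in practice, here is a plain seeded random search standing in for the GP search. It is not the gist's method: the objective `design_cost` and the search bounds are hypothetical placeholders, not resampy's window design objective.

```python
import math
import random


def design_cost(beta, rolloff):
    """HYPOTHETICAL placeholder objective (lower is better).

    Stands in for a real window design objective; it is not resampy's.
    """
    return (beta - 12.0) ** 2 / 100.0 + (rolloff - 0.95) ** 2


def seeded_search(n_trials=200, seed=0):
    """Random search over (beta, rolloff) driven by a private, seeded RNG."""
    rng = random.Random(seed)  # independent of the global random state
    best_cost, best_params = math.inf, None
    for _ in range(n_trials):
        # Hypothetical search bounds for the Kaiser beta and the rolloff.
        params = {"beta": rng.uniform(1.0, 20.0), "rolloff": rng.uniform(0.5, 1.0)}
        cost = design_cost(**params)
        if cost < best_cost:
            best_cost, best_params = cost, params
    return best_cost, best_params
```

Because the RNG is a private `random.Random(seed)`, rerunning with the same seed replays the same trials and returns the same winner. Tools like optuna expose the same idea through seeded samplers (e.g., `optuna.samplers.TPESampler(seed=...)`).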

avalentino commented 2 years ago

Dear @bmcfee, I'm currently working on packaging resampy for Debian (hope you don't mind). According to Debian policy, it should be possible to regenerate the data file containing the filter(s) during the build process. For this reason, resolving this issue would be the ideal solution.

Do you plan to implement it in the near future? Do you already have a date in mind for the v0.3.0 release? If so, I will wait for the next release before submitting the upload request. Otherwise, I will need to figure out a workaround that keeps the package compliant with Debian policy.

bmcfee commented 2 years ago

> I'm currently working on packaging resampy for Debian (hope you don't mind).

Not at all - thanks for putting in the effort!

> Do you plan to implement it in the near future? Do you already have a date in mind for the v0.3.0 release? If so, I will wait for the next release before submitting the upload request. Otherwise, I will need to figure out a workaround that keeps the package compliant with Debian policy.

I think so, yes. I took a bit of time this afternoon to prototype a newer version of the parameter solver using optuna. (The previous version used https://github.com/craffel/simple_spearmint/ which was never properly packaged.) As it currently stands, it reliably produces filter parameters that are pretty close to what the old version did. I want to experiment with it a bit more to see if I can bring the noise down following the thread in #75, but I think this will be doable for the 0.3 release.

avalentino commented 2 years ago

Thanks for your quick reply. Unfortunately, optuna is not currently available in Debian, so I'm not sure the new script would solve my problem.

Would it be possible to have a copy of the data saved in text (.txt) format in the repo? That could help.

bmcfee commented 2 years ago

> Unfortunately, optuna is not currently available in Debian, so I'm not sure the new script would solve my problem.

Is that strictly necessary, though? It wouldn't be a runtime dependency.

> Would it be possible to have a copy of the data saved in text (.txt) format in the repo?

Is .npz not sufficient for this?

avalentino commented 2 years ago

> > Unfortunately, optuna is not currently available in Debian, so I'm not sure the new script would solve my problem.
>
> Is that strictly necessary, though? It wouldn't be a runtime dependency.

The idea is that the Debian package should be rebuilt entirely from source in a Debian environment, without any access to the internet. Optuna not being available in Debian is a blocker in this sense. Of course, one could also create a Debian package for optuna, but that would require more effort.

> > Would it be possible to have a copy of the data saved in text (.txt) format in the repo?
>
> Is .npz not sufficient for this?

I fear it is not. I will check the policy again and discuss it with the Debian developers.

bmcfee commented 2 years ago

> The idea is that the Debian package should be rebuilt entirely from source in a Debian environment, without any access to the internet.

I think we still satisfy that requirement if the data is provided. The packaged filter coefficients are just a cache of something you could compute directly from an explicit parametrization. While I agree that it would be great in principle to have this all end-to-end, it seems like way overkill IMO. Debian wouldn't require this for something like icons or audio excerpts, right? What makes this any different?

> > Is .npz not sufficient for this?
>
> I fear it is not. I will check the policy again and discuss it with the Debian developers.

That also seems weird to me. It's an open format, and generally preferable to a text-based encoding (which may be lossy via float<->decimal conversion).
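To make the float<->decimal point concrete: a text dump is lossless only if enough significant digits are written. A minimal stdlib sketch; `roundtrips` is a hypothetical helper, and `0.1 + 0.2` is just a convenient value that needs 17 significant digits.

```python
def roundtrips(x, fmt):
    """True if writing x as text with fmt and parsing it back gives exactly x."""
    return float(fmt % x) == x


coeff = 0.1 + 0.2  # 0.30000000000000004 as an IEEE 754 double

print(roundtrips(coeff, "%.8g"))    # False: "0.3" parses to a different double
print(roundtrips(coeff, "%.17g"))   # True: 17 significant digits always suffice
print(float(repr(coeff)) == coeff)  # True: repr() emits a round-tripping string
```

A binary format like .npz stores the raw float64 bits, so the question of how many digits to write never comes up.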

avalentino commented 2 years ago

> > The idea is that the Debian package should be rebuilt entirely from source in a Debian environment, without any access to the internet.
>
> I think we still satisfy that requirement if the data is provided. The packaged filter coefficients are just a cache of something you could compute directly from an explicit parametrization. While I agree that it would be great in principle to have this all end-to-end, it seems like way overkill IMO. Debian wouldn't require this for something like icons or audio excerpts, right? What makes this any different?

Sorry, just so I understand: is it something I can compute using the resampy.filters.sinc_window function? If so, it is probably just a matter of documenting the parameters somewhere, e.g., in a dedicated README in the data folder.

bmcfee commented 2 years ago

> Sorry, just so I understand: is it something I can compute using the resampy.filters.sinc_window function?

Basically yes. "kaiser_best" and "kaiser_fast" are cached versions of filters constructed by sinc_window. The concern of this issue is the code which selects which parametrization (beta, rolloff, maybe other parameters) should be cached, and this only needs to happen once (in 2016 :grimacing:). There is no runtime dependency, or even build-time dependency on this parameter optimization whatsoever.
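In symbols, a sketch of that parametrization, assuming `sinc_window` multiplies a rolloff-scaled sinc by the right half of a symmetric taper window (here a Kaiser window with shape parameter $\beta$), sampled at $2^p$ points per zero crossing out to $Z$ zero crossings. The notation is ours, not resampy's:

```latex
h[k] = \rho \,\operatorname{sinc}\!\left(\rho \, \frac{k}{2^{p}}\right) w_\beta[k],
\qquad k = 0, 1, \dots, N, \qquad N = Z \cdot 2^{p},

w_\beta[k] = \frac{I_0\!\left(\beta \sqrt{1 - (k/N)^2}\right)}{I_0(\beta)},
\qquad \operatorname{sinc}(x) = \frac{\sin(\pi x)}{\pi x}, \quad \operatorname{sinc}(0) = 1.
```

Here $\rho$ is `rolloff`, $p$ is `precision`, $Z$ is `num_zeros`, and $I_0$ is the zeroth-order modified Bessel function of the first kind. The half window is fully determined by $(Z, p, \beta, \rho)$, which is why a cached filter can be regenerated from those four numbers.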

avalentino commented 2 years ago

> > Sorry, just so I understand: is it something I can compute using the resampy.filters.sinc_window function?
>
> Basically yes. "kaiser_best" and "kaiser_fast" are cached versions of filters constructed by sinc_window. The concern of this issue is the code which selects which parametrization (beta, rolloff, maybe other parameters) should be cached, and this only needs to happen once (in 2016 :grimacing:). There is no runtime dependency, or even build-time dependency on this parameter optimization whatsoever.

OK, do you have the parameters used to generate "kaiser_best" and "kaiser_fast" in resampy v0.2.2? If I understand correctly, the parameters documented in #98 are the new ones, right?

Having the parameters would completely solve my problem with the Debian packaging, because I could generate the binary files during the build process with a very simple script.
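A minimal sketch of such a build-time script, assuming NumPy and assuming the cached filter is a rolloff-scaled sinc tapered by the right half of a Kaiser window, in the style of `sinc_window`. The helper names are hypothetical, the .npz key names are an assumption modeled on resampy's data files, and no actual filter parameters are filled in.

```python
import numpy as np


def kaiser_half_window(num_zeros, precision, beta, rolloff):
    """Right half of a Kaiser-tapered, rolloff-scaled sinc filter.

    A sketch modeled on the structure of resampy.filters.sinc_window with a
    Kaiser taper; it is not resampy's own code.
    """
    num_bits = 2 ** precision  # samples per zero crossing
    n = num_bits * num_zeros   # index of the last tap
    # Sample the scaled sinc from 0 out to num_zeros zero crossings (n + 1 taps).
    t = np.linspace(0, num_zeros, num=n + 1, endpoint=True)
    sinc_win = rolloff * np.sinc(rolloff * t)
    # Keep the center tap and right half of a symmetric (2n + 1)-point Kaiser window.
    taper = np.kaiser(2 * n + 1, beta)[n:]
    return taper * sinc_win, num_bits, rolloff


def write_filter(file, num_zeros, precision, beta, rolloff):
    """Regenerate one cached filter and save it as .npz (hypothetical helper).

    The key names are an assumption modeled on resampy's data files;
    check them against resampy.filters before relying on this.
    """
    half_window, num_bits, rolloff = kaiser_half_window(num_zeros, precision, beta, rolloff)
    np.savez(file, half_window=half_window, precision=num_bits, rolloff=rolloff)
```

A build step would call something like `write_filter("kaiser_fast.npz", num_zeros=..., precision=..., beta=..., rolloff=...)` with the documented values for each filter, and since `np.savez` writes raw float64 bits, rerunning it reproduces the file contents exactly.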

avalentino commented 2 years ago

> OK, do you have the parameters used to generate "kaiser_best" and "kaiser_fast" in resampy v0.2.2?

Sorry, I just realized that the parameters are already documented in the (current) docstring. Only the precision is probably missing, but I can retrieve it anyway.

bmcfee commented 2 years ago

> Only the precision is probably missing, but I can retrieve it anyway.

Yeah, sorry about that - the open PR #98 documents this more fully. The precision values are stored in the data files, though, so all the information is there.

avalentino commented 2 years ago

Thanks a lot, @bmcfee! The package is now ready. It should hopefully go into the main archive in a couple of weeks.

bmcfee commented 2 years ago

Very cool - thanks!

I also plan to have the 0.3.0 release done soon, and the upgrade process should be pretty easy.

avalentino commented 2 years ago

Yes, after the first upload to the Debian archive, I should be able to update to new versions very quickly.

bmcfee / resampy

Improve default filters and make the whole process reproducible #96