Closed by ZhiyuanChen 1 month ago
I made a draft (incomplete) entry, but I am not sure this is a "tool" as much as it is a training resource for applying machine learning to nucleotide and protein sequence data, (quite) analogous to ProteomicsML for mass spectrometry-based proteomics data. Perhaps this could be registered as training material in TeSS rather than as a tool in bio.tools? If so, the bio.tools entry can be removed.
The Apache Parquet format should be added to EDAM, if it hasn't already.
Thank you for your quick response!
We do provide many resources (models and datasets), but at its core MultiMolecule is a framework (a tool) for users who want to run machine learning models on their own data.
As the project is still at an early stage, we currently focus on pre-training: we provide pre-training datasets for those who design their own networks, and existing pre-trained models so they can compare their methods against the current SOTA.
We are also working on the fine-tuning part. At this stage, most people who have a GPU can fine-tune a model (from the Hugging Face community or from the pre-trained models we provide) on their own data with a single command.
We have done a lot of work to let the framework recognise the user's dataset automatically (so that users do not need to prepare a data file in a specific layout).
We have almost completed this part and are waiting for user feedback to guide further improvements.
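To illustrate what automatic dataset recognition can mean in practice, here is a minimal sketch that guesses which column of a table holds nucleotide sequences. It is not MultiMolecule's actual implementation or API: the function names `looks_like_sequence` and `infer_columns`, the `min_len` threshold, and the example column names are all hypothetical and chosen for illustration only.

```python
import csv
import io

# IUPAC nucleotide codes covering DNA and RNA (an assumption of this sketch).
NUCLEOTIDES = set("ACGTUNRYKMSWBDHV")


def looks_like_sequence(values, min_len=8):
    """Return True if every non-empty value is a plausible nucleotide string."""
    cleaned = [v.strip().upper() for v in values if v.strip()]
    return bool(cleaned) and all(
        len(v) >= min_len and set(v) <= NUCLEOTIDES for v in cleaned
    )


def infer_columns(rows):
    """Split column names into sequence columns and all remaining columns."""
    columns = list(rows[0].keys())
    sequence_cols = [
        c for c in columns if looks_like_sequence([r[c] for r in rows])
    ]
    other_cols = [c for c in columns if c not in sequence_cols]
    return sequence_cols, other_cols


# Hypothetical user file with arbitrary column names.
example = """id,rna,half_life
a1,AUGGCUACGUAGCUAG,3.2
a2,GGCUAGCUAGGAUCCA,1.7
"""
rows = list(csv.DictReader(io.StringIO(example)))
seq_cols, other_cols = infer_columns(rows)
print(seq_cols, other_cols)
```

A content-based heuristic like this lets the user keep their own column names; a real framework would likely combine it with name-based hints and explicit overrides.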
I hope that, in the next stage, we can provide pre-defined pipelines and fine-tuned models, so that everyone can apply machine learning models and run inference on their own data (without needing a GPU) in one line of code.
I assume this can be considered solved.
Hi there,
Thank you for this wonderful registry.
Recently I have been developing MultiMolecule.
MultiMolecule is designed to be a deep learning toolkit for molecular biology.
Our goal is to make deep learning methods accessible to everyone in the community.
Currently, our library includes many pre-trained deep learning models for RNA, along with the datasets used to train them. The pre-trained weights and datasets are also available on our 🤗 Hub in a unified format for easier access.
We are working on adding the training scripts so that everyone can train / fine-tune their own machine learning models in a few clicks.