genaforvena / skiffs

Modular Language Model Architecture that makes Bobeobi Sing The Zangezi

Research Proposal: Exploring NELL Project with Smaller Models #2

Open genaforvena opened 8 months ago

genaforvena commented 8 months ago

Inspirations:

I am interested in exploring the NELL project with smaller models, specifically those with fewer than a billion parameters. I believe this could improve efficiency without a significant trade-off in performance.

Here are some points I would like to investigate:

  1. Feasibility: Assess the feasibility of using smaller models in the NELL project. This includes understanding the computational resources required and the potential impact on performance.

  2. Implementation: If feasible, plan the implementation details. This includes selecting suitable models and determining how to integrate them into the existing project infrastructure.

  3. Evaluation: Define metrics to evaluate the success of this endeavor. This could include measures of efficiency (e.g., speed, resource usage) and effectiveness (e.g., model performance on relevant tasks); a rough measurement sketch follows this list.

  4. Comparison with Larger Models: Compare the results with those obtained using larger models. This will help us understand the trade-offs involved and guide future work.
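
For point 3, here is a rough sketch of how the efficiency side could be measured (tokens per second on CPU). The model name and prompts below are placeholders, not anything decided for the project:

```python
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "distilgpt2"  # placeholder: any sub-billion-parameter causal LM
PROMPTS = ["Paris is the capital of", "Water boils at a temperature of"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

generated_tokens = 0
start = time.perf_counter()
for prompt in PROMPTS:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    # Count only the newly generated tokens, not the prompt.
    generated_tokens += outputs.shape[1] - inputs.input_ids.shape[1]
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
elapsed = time.perf_counter() - start

print(f"{generated_tokens / elapsed:.1f} tokens/s over {len(PROMPTS)} prompts")
```

The same loop could be rerun with a larger checkpoint for point 4, keeping the prompts fixed so the timings are comparable.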

genaforvena commented 8 months ago

https://arxiv.org/abs/1511.02799 — this one might also be interesting.

Ran some random midnight code, will see the results tomorrow, but I picked the wrong models (ones that only generate continuations). Tomorrow I need to find good instruct-based models.
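
For reference, the difference I tripped over, as a rough sketch: a plain checkpoint just continues the text, while an instruct one expects a chat-formatted prompt. The model name here is a placeholder, not a decision:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "HuggingFaceTB/SmolLM-135M-Instruct"  # placeholder small instruct model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Instruct checkpoints usually ship a chat template; continuation-only ones usually don't.
print("has chat template:", tokenizer.chat_template is not None)

messages = [{"role": "user", "content": "Summarize this paragraph: ..."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```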

genaforvena commented 8 months ago

Have to revise the paragraph detection mechanism (it looks to be broken) and adjust the inputs to the form each model can receive. Ilya, get yourself together: the tokenizers should be tested. Don't waste your time.
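
Note to self: something like this should cover both checks, a trivial paragraph splitter plus a tokenizer round-trip per model. The split rule and model names are assumptions, not the current code:

```python
import re

from transformers import AutoTokenizer

def split_paragraphs(text: str) -> list[str]:
    # Treat one or more blank lines as a paragraph boundary.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

sample = "First paragraph.\n\nSecond paragraph,\nstill the same one.\n\n\nThird."
assert split_paragraphs(sample) == [
    "First paragraph.",
    "Second paragraph,\nstill the same one.",
    "Third.",
]

# Tokenizer round-trip: encode and decode each paragraph and eyeball the result.
for name in ["distilgpt2", "google/flan-t5-small"]:  # placeholder model names
    tok = AutoTokenizer.from_pretrained(name)
    for para in split_paragraphs(sample):
        ids = tok(para).input_ids
        print(name, len(ids), repr(tok.decode(ids, skip_special_tokens=True)))
```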

genaforvena commented 8 months ago

https://aclanthology.org/2021.findings-acl.254.pdf

genaforvena commented 8 months ago

https://arxiv.org/abs/2310.01267 — the "Cooperative Graph Neural Networks" paper. Worth reading, even though seemingly everyone concentrates on training models. I don't have the compute for that, so the idea is that carefully tuned prompting could replace fine-tuning for some range of tasks. That would mean basically no compute is needed. Million-parameter models perform quite well on my old 8 GB RAM machine.
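
To make the prompting-instead-of-fine-tuning idea concrete, a toy sketch with a frozen small model and a few-shot prompt; the task, examples, and model name are all made up for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "distilgpt2"  # placeholder: any small model that fits in 8 GB RAM

# A hand-written few-shot prompt stands in for fine-tuning; nothing is trained.
FEW_SHOT = (
    "Review: The food was wonderful. Sentiment: positive\n"
    "Review: Terrible service, never again. Sentiment: negative\n"
    "Review: The music made the evening. Sentiment:"
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

inputs = tokenizer(FEW_SHOT, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=3, do_sample=False)
# Print only the continuation, i.e. the model's "answer" to the last example.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```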