AI4Finance-Foundation / FinGPT

FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.
https://ai4finance.org
MIT License

Implement relative importing of data #162

Closed WilliamGazeley closed 6 months ago

WilliamGazeley commented 7 months ago

Currently, benchmarking a model is difficult unless FinGPT is the working directory. I've updated the imports so that the benchmark functions load their data from files inside the FinGPT directory; this way, benchmarks can be run from anywhere by simply importing fingpt as a library:

# pip install git+https://github.com/AI4Finance-Foundation/FinGPT.git
from dataclasses import dataclass

from fingpt.FinGPT_Benchmark.benchmarks.fpb import test_fpb
from transformers import AutoTokenizer, AutoModelForCausalLM

# Settings passed to the benchmark function
@dataclass
class Args:
    batch_size: int = 8
    max_length: int = 512
    instruct_template: str = "default"

model_name = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name).to('cuda')
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Reuse EOS as the pad token so batches can be padded
tokenizer.pad_token = tokenizer.eos_token

test_fpb(Args(), model, tokenizer)

llk010502 commented 7 months ago

Thanks for pointing out the issue. It is true that, with the current repo layout, benchmarks have to be run from inside the local FinGPT directory.

An easier way to import the benchmark modules without setting the working directory could be:

  1. Add an "__init__.py" file to "FinGPT/fingpt/FinGPT_Benchmark/benchmarks"; you can then import the module after installing the package with "pip install git+https://github.com/YOURGIT/FinGPT.git@YOURBRANCH".
  2. Since the "sentiment_templates.txt" file will not be downloaded with the package, either download the file locally or build a function to load it from a remote location.
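
The second option could be sketched roughly as follows. This is only an illustration: the function name `load_templates` and its `remote_url` parameter are hypothetical and not part of FinGPT, and the actual remote location of "sentiment_templates.txt" would have to be supplied by the caller.

```python
from pathlib import Path
from typing import Optional
from urllib.request import urlopen


def load_templates(local_path: str, remote_url: Optional[str] = None) -> str:
    """Return the template text from local_path, else fetch it from remote_url.

    Hypothetical helper: neither the name nor the parameters come from FinGPT.
    """
    path = Path(local_path)
    if path.is_file():
        # Prefer the local copy when it exists
        return path.read_text(encoding="utf-8")
    if remote_url is None:
        raise FileNotFoundError(f"{local_path} not found and no remote_url given")
    # Fall back to downloading the file, then cache it locally for next time
    with urlopen(remote_url) as response:
        text = response.read().decode("utf-8")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return text
```

With a helper of this shape, the first call downloads and caches the file, and later calls read the cached copy without touching the network.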

WilliamGazeley commented 7 months ago

Thanks for considering this PR, I'm trying to give something back to the community.

I've taken your advice and added an __init__.py file to "fingpt/FinGPT_Benchmark/benchmarks", and also updated fingpt/FinGPT_Benchmark/__init__.py to include the dataset downloading function.

For the sentiment_templates.txt file I have a cleaner solution: I've included a MANIFEST.in file, which causes the templates file to be included when the package is installed. I've tested the following and it works great:

# pip install git+https://github.com/YOURGIT/FinGPT.git@YOURBRANCH
from dataclasses import dataclass

from fingpt.FinGPT_Benchmark import benchmarks, download_datasets
from transformers import AutoTokenizer, AutoModelForCausalLM

# Fetch the benchmark datasets before running any benchmark
download_datasets()

# Settings passed to the benchmark function
@dataclass
class Args:
    batch_size: int = 8
    max_length: int = 512
    instruct_template: str = "default"

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
model = AutoModelForCausalLM.from_pretrained(model_name).to('cuda')
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

benchmarks.fpb.test_fpb(Args(), model, tokenizer)
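
Shipping the templates file with the package helps only if the benchmark code looks for it relative to its own module, not the current working directory. Below is a minimal sketch of that lookup, assuming the file sits next to the module that reads it. `resolve_data_path` is a hypothetical helper for illustration, not a FinGPT function.

```python
from pathlib import Path


def resolve_data_path(module_file: str, relative: str) -> Path:
    """Resolve `relative` against the directory containing `module_file`.

    Hypothetical helper for illustration; pass `__file__` as module_file.
    """
    return Path(module_file).resolve().parent / relative


# Inside a module one would typically write:
#     templates = resolve_data_path(__file__, "sentiment_templates.txt")
# which yields the same path no matter where the script was launched from.
```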

In my testing, I found that the following libraries were not automatically installed with pip install git+https://github.com/YOURGIT/FinGPT.git@YOURBRANCH:

Should they be added to the requirements.txt?

llk010502 commented 7 months ago

Thanks for your update. The updated version seems good!

WilliamGazeley commented 6 months ago

Is this pull request still under review?