DachengLi1 / LongChat

Official repository for LongChat and LongEval
Apache License 2.0

why not reuse fschat code? #16

Closed lucasjinreal closed 1 year ago

lucasjinreal commented 1 year ago

Hi, I'm just wondering why you don't directly import fschat for the common code. Are there any important modifications here, for example to the conversation handling?

DachengLi1 commented 1 year ago

@lucasjinreal Hi, yes, we will refactor the code to reuse more of fschat. This is because we wrote the LongChat code first and then integrated it into FastChat!

lucasjinreal commented 1 year ago

@DachengLi1 May I ask when we will see longchat available in fschat?

DachengLi1 commented 1 year ago

@lucasjinreal LongChat is already available in FastChat :P.

lucasjinreal commented 1 year ago

@DachengLi1 Is training also supported now?

DachengLi1 commented 1 year ago

@lucasjinreal Not yet. The training code/script is in this repo. But honestly it is not much different from train_mem.py in FastChat, except that you need a monkey patch, and that is already supported in FastChat's load_model. We currently don't plan to specifically support training in FastChat, but if you really want to, adapting the script in this repo to FastChat should take very little time (around 30 minutes).
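For readers unfamiliar with what the "condensing" monkey patch does, here is a minimal, stdlib-only sketch of the underlying idea as described for LongChat (function names here are illustrative, not the repo's actual API): rotary position indices are divided by a ratio so that a longer sequence is squeezed into the position range the base model was trained on.

```python
# Hypothetical sketch of position "condensing": divide each token's position
# index by a ratio so a long sequence reuses the original trained range.

def condensed_positions(seq_len, ratio):
    """Map token indices 0..seq_len-1 into a condensed position range."""
    return [i / ratio for i in range(seq_len)]

# With ratio=8, a 16k-token sequence stays inside the original 2048-position
# range of the base LLaMA model.
positions = condensed_positions(16384, 8.0)
assert max(positions) < 2048
```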

lucasjinreal commented 1 year ago

@DachengLi1 From what I saw, it just needs some modifications to the data preprocessing? I have some original work modified from fschat's train_mem; can I migrate it to longchat-compatible training with minimal edits?

DachengLi1 commented 1 year ago

@lucasjinreal I don't think I even modified the data processing. I think you need to (1) make sure load_model applies the condensing monkey patch, and (2) use flash attention.
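Step (1) follows the usual monkey-patch pattern: replace a method on the model class before any instance is constructed, so every instance picks up the patched behavior. A toy, stdlib-only sketch of that pattern (the class and function names are made up for illustration; LongChat's actual patch targets the LLaMA rotary embedding in transformers):

```python
# Illustrative monkey-patch pattern: swap in a condensed-position method on
# the class *before* instantiation, so all instances use the replacement.

class RotaryEmbedding:
    def positions(self, seq_len):
        return list(range(seq_len))

def replace_with_condense(ratio):
    def condensed(self, seq_len):
        return [i / ratio for i in range(seq_len)]
    RotaryEmbedding.positions = condensed  # patch applied class-wide

replace_with_condense(8.0)          # apply the patch first...
emb = RotaryEmbedding()             # ...then build the "model"
assert emb.positions(4) == [0.0, 0.125, 0.25, 0.375]
```

This ordering is why the patch belongs in (or before) load_model: patching after the model is built would be too late for modules that cache the original method.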

lucasjinreal commented 1 year ago

@DachengLi1 Is flash attn necessary? The V100 does not seem to support flash attention.
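For context on the V100 question: flash-attn builds target newer GPU architectures, and the V100 (compute capability 7.0) is generally not supported, so on such hardware one would fall back to the standard attention implementation. A hedged, stdlib-only sketch of that selection logic (the function name and the Ampere/SM 8.0 cutoff are simplifying assumptions; the exact cutoff depends on the flash-attn version):

```python
# Illustrative fallback: choose an attention implementation based on the
# GPU's compute capability. Assumes SM 8.0 (Ampere) as the flash-attn
# cutoff, which is a safe simplification but not version-exact.

def pick_attention_impl(compute_capability):
    major, minor = compute_capability
    return "flash" if (major, minor) >= (8, 0) else "eager"

assert pick_attention_impl((7, 0)) == "eager"  # V100: standard attention
assert pick_attention_impl((8, 0)) == "flash"  # A100: flash attention
```

In practice the capability tuple would come from something like torch.cuda.get_device_capability(), with the standard attention path used whenever flash-attn is unavailable.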