arcee-ai / mergekit

Tools for merging pretrained large language models.
GNU Lesser General Public License v3.0

MoE model gets worse results after finetuning #217

Open oymzysmwe224 opened 8 months ago

oymzysmwe224 commented 8 months ago

I attempted to merge four Yi-34B models using the MoE branch of mergekit, with each token activating 2 experts. The four models, listed in the YAML config below, are all based on Yi34B-base, were trained on different SFT data, and rank high on the Open LLM Leaderboard.

When I tested the merged model (called MoE-v1) directly, it performed better than Yi34B_sftv5_base_epoch2 on most benchmarks. However, when I continued finetuning this MoE model to make it even stronger, the results were completely counterintuitive. I tried finetuning on ShareGPT data as well as on in-house data. The experimental setup was a learning rate of 1e-5 and 2 epochs of training, and I tried runs both with and without the auxiliary load-balancing loss.
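
For readers unfamiliar with the auxiliary load-balancing loss mentioned above, here is a minimal pure-Python sketch. It assumes a Switch-Transformer-style formulation (number of experts times the sum over experts of "fraction of routing slots assigned" times "mean router probability"); the issue does not say which formula the training code actually used, and the function names are illustrative only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def load_balancing_loss(router_logits, top_k=2):
    """router_logits: one list of per-expert logits for each token.

    Assumed formulation (not taken from the issue):
        loss = num_experts * sum_i f_i * P_i
    f_i = share of (token, slot) assignments that went to expert i
    P_i = mean router probability given to expert i
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    assigned = [0.0] * num_experts
    mean_prob = [0.0] * num_experts
    for logits in router_logits:
        probs = softmax(logits)
        top = sorted(range(num_experts), key=lambda i: probs[i], reverse=True)[:top_k]
        for i in top:
            assigned[i] += 1.0 / (num_tokens * top_k)
        for i, p in enumerate(probs):
            mean_prob[i] += p / num_tokens
    return num_experts * sum(f * p for f, p in zip(assigned, mean_prob))

# Four tokens, each preferring a different pair of the four experts.
balanced_logits = [
    [2.0 if e in (t % 4, (t + 1) % 4) else 0.0 for e in range(4)]
    for t in range(4)
]
# Four tokens that all strongly prefer experts 0 and 1.
collapsed_logits = [[10.0, 10.0, -10.0, -10.0] for _ in range(4)]

balanced = load_balancing_loss(balanced_logits)
collapsed = load_balancing_loss(collapsed_logits)
```

Under this formulation the loss is smallest when tokens spread evenly over the experts and grows as routing collapses onto a few of them, so `collapsed` comes out larger than `balanced`.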

All of the resulting models, though, were worse than the original MoE-v1, and the more training data I used, the worse the performance became. Almost every benchmark declined; only HumanEval showed some improvement. In subjective testing, I found:

Below are some examples. I noticed a document mentioning that it is best to merge models with similar capabilities: https://docs.google.com/document/d/1_vOftBnrk9NRk5h10UqrfJ5CDih9KBKL61yvrZtVWPE/edit?pli=1 I'm not sure whether my problem comes from the significant differences between these models, which might have caused the experts to interfere with each other during further finetuning.

I don't know whether you have had similar experiences or problems, and I would appreciate any help or guidance. The YAML file I used for merging is as follows. For each expert's positive prompts, I selected several training samples from that model's training set.

base_model: /mnt/bn/vgraph/biz_llm/biz_llm/arnold_output/Yi34B_sftv5_base/Yi34B_sftv5_base_epoch2
gate_mode: hidden # one of "hidden", "cheap_embed", or "random"
dtype: bfloat16 # output dtype (float32, float16, or bfloat16)
## (optional)
experts_per_token: 2
experts:
  - source_model: /mnt/bn/vgraph/biz_llm/biz_llm/arnold_output/Yi34B_sftv5_base/Yi34B_sftv5_base_epoch2
    positive_prompts:
      - '请你提取口播文本的营销卖点。营销卖点主要包括以下几种类型:[产品卖点、用户痛点、适用场景、适用人群、优惠活动]。最终卖点结果要求如下:1)回答结果以dict格式给出,即结果必须是字典,比如:{"卖点类型1":["卖点1"], "卖点类型2":["卖点1","卖点2"]},不要输出其他任何解释性语句。2)对于总结出的卖点结果,要求简单、清晰、精炼、通顺。3)所有卖点之间都要没有语义重复;4)总结卖点结果时,对于成份/材料的罗列,或者多个场景/人群的适用总结成一个最终卖点。但对于产品核心特点和感受不需要总结。对于用户痛点,要将其转换成正向表达,比如广告文本中的产品解决了用户担心糖分高的痛点,则卖点总结成“健康低糖”。5)总结的卖点值一定不要有重复的文本,比如多次出现同一个词,这是绝对要避免的,适用人群和场景只能有一个结果。不要去猜测广告文本中没有体现的功能功效。如果口播文本中没有卖点信息,则输出"无卖点"。\n口播文本:还添加了绞股蓝,科学配比,协同作用,促进体内血红蛋白的合成。。回答:'
      - '你是一个专业的商业化营销人员,你能够结合你的知识完成各种有挑战的任务。\n\n请根据下面的商品信息,生成推销话术,用于电商直播间主播的口播文案\n1. 发散出合适的常识和场外信息,保证话术中有超出商品信息本身的新增信息量\n2. 根据商品信息、常识、场外信息,编写合适的推销公式(文案大纲)\n3. 根据商品信息、常识、场外信息、推销公式,生成一段推销话术\n4. 根据推销公式和推销话术,将话术中的句子按照公式子模块分类\n\n商品信息:\n{\n    "商品名": "抓夹",\n    "商品卖点": "爱心拼色;亚克力树脂材质;两层防滑齿梳",\n    "优惠信息": "",\n    "参与的直播间玩法": "",\n    "参与的营销节点": ""\n}\n'
      - '请用340字的广告文案,介绍下面这款商品的特点:\n\n商品名称:【七夕礼物】Colorkey珂拉琪水雾唇露唇釉显白持久口红不易沾杯#。\n商品信息:{"产品卖点": {"通用卖点": "低饱和度颜色,好看;有300、307、308色号可选;308涂上后显得肤色和牙齿白,显得有女人味;307色号是淡淡磨砂粉,涂上显嫩,可以打造素颜、淡妆、运动随性的妆容;300色号柔美,有让人想要保护的感觉;不沾杯,持妆效果好", "福利优惠": ""}}。\n该条文案具体的生成要求如下:\n商品行业类型:电商\n一级行业名称:美妆\n营销节点:""\n语言风格:使用抖音短视频风格,要求口语化\n   请你为字数较多的完整商品名,在文案中进行合理地简略,使其口语化。'
      - "You are an assistant designed to extract product attribute from text in the e-commerce scenario. Each attribute result should not exceed 5 words, and the result should be in English. Return the result in markdown table format, and the attribute should only belong to ['product type','selling point','pain point','styles/design','pattern','suitable crowd','applicable occasion','function','material/ingredient','size','color','scent']. If a certain attribute cannot be extracted from the given text, then ignore the key value in the result dictionary. Please extract the attribute from the following text:\n```\nMaterial: Oxford cloth. Main Component Of Fabric: 100 % polyester. Main Component Content Of Fabric: 600. Fabric Sub-component: 600 d oxford cloth. Style: Modern simple. Origin: Yiwu. Outdoor Fabric: 600 d cattle jue cloth. Color: Rainbow stripe. Size: 148*100. Whether Cross-border Export Of Special Supply Sources: No. Specifications (l*w): 148*100,148*200,200*200. Weight: 1000 gram. Types Of Moisture-proof Pads: Outdoor. Category: Picnic pad / moisture pad. Outdoors Waterproof Fold Sandy Beach, Blue Cherry, 148*200\n```\n"
      - "你需要结合下述指南和输入的`商品信息`生成小红书风格的短文案\n\n## 规则\nA、先仔细理解输入的`商品信息`,并理解其特点和优势,然后生成小红书风格的爆款文案。\nB、文案要语义通顺完整,要有连贯性,要有逻辑,内容上要贴合输入商品。\nC、合理使用下面的**爆款关键词**和emoji表情,使文案更有煽动性和丰富有趣:\n    - 永远可以相信, 被夸爆, 手残党必备, 超超超, 决绝子, 怒推一波, 天花板级别, 大数据, 集美们, 搞钱必看, 神仙, 沉浸式, 天花板, 敲击喜欢, YYDS, 手把手, 宝藏, 建议收藏, 吐血整理, 推荐, 隐藏, 美哭了, 家人们, 教科书般, 拿捏, 挑战全网, 治愈系, 神仙单品, 行走的种草机, 终于于于于于于, 泰裤辣, 正确姿势, 提升幸福感, 好用到哭, 日常小幸福, 高级感, 变美秘籍, 绝绝子, 爱了爱了, 小众, 狠狠搞钱, 停止摆烂, 巨巨巨, 谁用谁知道, 无痛当妈, 压箱底, 原地封神, 人间值得, 变美神器, 氛围感, 宝藏好物, 爆款, 闭眼入, 安利, 敲击实用, 破防了, 种草/拔草;, 干货, 秘方, 都给我冲,划重点, 怒赞, 普通女生, 我不允许, 爱不释手, 必备, 高颜值好物, 好用哭了, 太可了, 绝绝子神器, 逆袭, 笑不活了, 上天在提醒你, 封神, 氛围感拉满, 谁懂啊, 治愈, 好看到尖叫, 万万没想到, 打工人, 揭秘, 逆袭必备, 惊艳, 高颜值, 暴风吸入, 小众宝藏, 无限回购, 一口封神, 安利达人, 感谢网友, 真的管用, 拿捏了, 安利一波, 有手就能做吹爆, 小众高级感, 干货满满, 小白必看, 怒推, 美到哭, 心动瞬间\nD、每段短文案的字符数不低于5字,不超过13字。\nE、输出格式为json字典格式,多段短文案放在一个列表中,即:{'文案': ['文案1', '文案2',...]},无需任何解释。\n\n## 现在我们正式开始,`商品信息`如下,请输出6个小红书风格的文案:\n```\n商品信息:顶爆内爆膨胀螺丝平爆壁虎内膨胀螺栓M12整箱固定水钻支架专用。。商品:['膨胀螺丝'];\n```\n注意,务必输出6个短文案,且每段短文案的字符数不低于5字,不超过13字!回答:"
      - '应用你的专业知识,我希望你能分析话术中使用的推销方式或推销步骤,识别其中使用的每一个步骤,将话术拆解成可执行的步骤信息,输出最终的步骤流程及适用范围,并将原文中的每一句话对应到步骤流程中,json格式如下:\n    {"公式": "步骤名1 + 步骤名2 + 步骤名3 + ...","适用范围": "","子模块定义和原文对应关系": {"步骤名1": {"定义":"步骤1含义解释","原文语句":"步骤1对应的原文语句"}, "步骤名2": {"定义":"步骤2含义解释","原文语句":"步骤2对应的原文语句"}, "步骤名3": {"定义":"步骤3含义解释","原文语句":"步骤3对应的原文语句"}, ... }}\n    注意步骤名要简明扼要的表达‘含义解释’,不要包含实际的商品和卖点信息。含义解释也不要包含实际的商品和卖点信息,应该只是解释步骤名的具体含义。原文语句一定是原文中出现过的,返回的语句在原文中是连续的,不要分开截断。不要分析过程,直接返回最终结果。\n输入:\n需要分析的“话术”是:\n\n屁股不会露不会透!全部做到100-103公分,小个子145穿的高个17580斤穿到160斤,在这个身高体重内都能拍!最后4单送上衣的名额不再送,抓紧时间11:55准时接单!下不下这项链接?没有福利不加单!1号链接记住了吗?要回来跟我说一声,全部安排好来啊!抓紧时间,所有女生听清楚,还在纠结的还在犹豫了?那真的错过了,加不了单啊,来返不了场,来拍了是吧?好的,谢谢我春。要链接!还有哪些女生加入购物车!不要再加入购物车!来!最后3单听清楚,不再送了!送不了这个上衣了!所有宝贝听清楚,抓紧时间回去试,回去穿!好看满意咱们再留下!不好看不满意可以退可以换啊!看清楚吗?有运费险,有7天无理由!对准尺码表拍标准尺码,标准拍卡边。 \n    \n'
      - '请简要介绍一下什么是ByteStore?'
      - 'JOD/USD是什么?能简单介绍一下吗。'
      -  '你擅长分析时间类问题, 能够从一句话中找到句子中涉及的时间段, 如果句子中未提及时间则输出 `无,无`\n你的任务是根据`当前时间`信息,和下面`文本`中的内容提取出其中蕴含的时间段信息,输出的时间范围格式要求如下:\n```\nyyyy-MM-dd,yyyy-MM-dd\n```\n示例输出:\n```\n2000-01-01,2000-01-15\n```\n\n## 现在请帮我完成下面的任务:\n已知`当前时间`是 2023-09-06, 需要提取的`文本`如下:\n```\n近一周在产品一级分类和广告类型维度下,搜索广告和通投广告的APP内访问详情页和计费六日内付费金额是多少?\n```\n请输出其中蕴含的时间段信息\n【注意】\n你可以一步一步的思考起止时间, 结果仅需要输出 ```yyyy-MM-dd,yyyy-MM-dd``` 即可, 无需输出思考过程\n回答:\n'
      - '请你阅读下方的用户对话记录,根据历史对话,用一句话总结用户最终想问的问题,请直接返回结果,不要做额外的解释。\n\n历史对话记录如下:\n```\nUSER:本月抖音号ID为5877且广告类型为搜索广告的直接下单金额\nASSISTANT:结果如下\nUSER:分下单平台的呢\n```'
      - '参考下面的文本,改写成行业相关的文案\n玛莎拉蒂是法拉利的副牌。维护成本要比主牌低,而且绝大部分部件还是从法拉利那边传承下来的,所以性能也不会差太多。'
      - '根据以下广告文案改写一个吸引人的标题:"伊利柠檬味酸奶,清爽的口感让你一夏畅快无比!"\n伊利柠檬味酸奶,清爽的口感让你一夏畅快无比!想象一下,在炎炎夏日里,桌上放着一杯冰镇的柠檬味酸奶,让你感受到清爽无比的口感,排解你一整天的疲惫。\n'
      - '写一篇满足以下条件的营销文章,内容为某家自行车品牌:\n本品牌是一家以研发、生产和销售自行车为主的公司,拥有强大的设计和研发团队,生产各种样式的自行车,包括山地自行车,公路自行车,普通自行车等等。品牌一直致力于通过提高品质、持续创新和客户至上的服务来提升用户的骑行体验。品牌自诞生以来,一直保持着领先的地位,并得到了广大用户的信赖和支持。'
      - '你是谁?'
      - '谁开发了你?'

    ## (optional)
    # negative_prompts:
    #   - "This is a prompt expert_model_1 should not be used for"
  - source_model: /mnt/bn/vgraph/llm_common/hf_models/Yi34B_for_MOE/SUS-Chat-34B
    positive_prompts:
      - '1200 / 45的精确值是多少?'
      - '张伟计划在暑假的60天内,每天阅读30页的书。如果张伟在假期的前15天每天阅读40页的书,那么他在剩下的假期天数里,每天应阅读多少页的书才能达到他的阅读计划?'
      - '请根据这篇小说的剧情,生成一段悬疑推理文案,并挖掘出主人公解决问题的关键因素。\n小说名:《迷雾山庄》\n简介:发生在1900年代英国某庄园里的一宗离奇命案,老管家突然死亡,而庄园主人也在事件中失踪。'
      - "請你接下來都表現得像高傲嬌貴還有點蠻橫的中國古代貴族千金"
      - "父亲和母亲能结婚吗?"
      - "如何与其他人分享我们的chatGPT聊天记录?"
      - "母弱出商贾,父强做侍郎,族望留原籍,家贫走他乡 这几句话啥意思?"
      - '请帮我完成信息抽取的任务,要求组织成表格的形式输出,需要做抽取的对话本文如下:\n客服:很高兴为您服务,您好。 \n用户:唉,你好,我想查询一下我这附近有没有5G信号覆盖。 \n客服:嗯,您要查询5G信号的话,可以登录手机营业厅,里边有一个服务查询附近5G。可以在这儿查看。 \n用户:然后就是我怎么看到我。我我你听我说,我查看的时候,我这是在那个边缘附近,我没去,我也确定不了我这有没有覆盖。 \n客服:嗯,那这边的话是只能通过手机营业厅自助查看的,咱们这边目前查不到。 \n用户:嗯。是吗?那那没事了,挂了吧, \n客服:嗯,好勒,那祝您生活愉快,再见。'
      - '杰克有两张20美元的钞票和一张10美元的钞票。他买了一个玩具车花了25美元。他还剩下多少钱?'
      - "一个长方形的长是10厘米,宽是4厘米。长方形的周长是多少?"
      - "有哪些避免使用SebeiSch的方法?"
      - "你会建议在4月22日至24日去瑞士塞尔马特需要穿什么类型的外套?请具体说明。"
      - "帮我出一份调研报告的框架大纲,调研报告题目是《关于隆阳区属国有企业品牌水“大玺山”营销的市场环境分析及策略的调研》,该品牌水现有产品有瓶装水、桶装水。目前销售受众群体为保山市隆阳区各小初中学校及部分机关单位。"
      - '목포의 특산품 시발낙지에 대해 설명해줘. 이름이 욕에서 유래했다는데 그 이유가 궁금하고, 맛, 어디에서 주로 잡히는지도 알려줘.'
      - '爲何有些酒店的牀是圓型的?尤其是情侶套房?'
      - '以下是一段两人对话,请选择正确的三句话填在空格处,答案顺序不能出错。正确的选项是:\n\n你看最近特别火的那部韩剧的最新一集了吗?\n我没有。那是关于什么的?\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n这部剧有啥亮点?\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\\_\n优酷上就有。\n好的,我去看看。\n\nA、这是一部关于一名侦探试图解决小镇上的一系列谋杀案的悬疑系列剧,非常精彩。\nB、在哪儿可以看啊。\nC、它的情节设定非常吸引人,演员演得也很好。\nD、今天天气不错啊。\nE、你看这部电影了吗?\n'
      - '一支牙膏是3元7角,一个牙刷是2元2角,他们一共多少钱?牙膏比牙刷贵多少钱?'

  - source_model: /mnt/bn/vgraph/llm_common/hf_models/Yi34B_for_MOE/Nous-Hermes-2-Yi-34B
    positive_prompts:
      - 'What are the physical laws and formulas relatedd to gravity?'
      - 'Create a Flask based FTP Server'
      - "So there he was, sitting on the queen's bed, wearing a soiled T-shirt and jeans, holding a broken ashtray and bleeding from a cut on his hand:. A man named Michael Fagan, who was 31 at the time, had climbed a wall of Buckingham Palace, crawled through an open window and made his way to the bedroom of Queen Elizabeth II, who was sleeping. When she awakened to find the guy sitting there staring at her, it is fair to surmise that she was not pleased. Especially when Fagan asked her if she had a spare cigarette. This happened on July 9, 1982, generating international headlines, and only because the queen survived unharmed does it read like a comedy of errors. The errors were plentiful, including the queen's repeated attempts to summon help, at first to no avail. But now, 30 years later, during the celebration of the Queen's Diamond Jubilee in London this week, it is instructive to recall the monarch's rude awakening by the bleeding intruder who was in search of a smoke. We assume that the handful of people at the absolute pinnacle of life -- the queen being the lead example -- have the means and the staffs and the security teams to insulate themselves from indignities that mere mortals have to put up with. Analysis: Why Queen Elizabeth's jubilee celebrations matter to Brits. And, most of the time, they do. But when there are breaches of presumed privacy, boy, does it cause a commotion. Follow CNN's live jubilee blog. We don't have to go back in history to look for instances of this. Just consider what Pope Benedict XVI is thinking right now in the wake of the news that his butler, Paolo Gabriele, has been arrested, accused of having the pope's confidential documents in his home. 
The investigation is still playing out in Rome, but if the allegations prove true and it turns out that Gabriele, one of the few people who had access to the pope's living quarters, including the pope's desk, lifted the information, then the pope will know with certainty that even with his rarified position, he can't count on personal privacy behind the guarded walls of the Vatican. In the United States, President Ronald Reagan, after an assassination attempt on a Washington sidewalk in the second month of his first term, was assured that efforts would be redoubled to make certain no one who wished to hurt him would be able to get that close again. Nancy Reagan was said to be especially adamant that her husband be kept out of harm's way. So it was almost beyond belief one afternoon in April 1992 when Reagan, out of office but still being protected by the Secret Service, was in Las Vegas to receive an award and was accosted right onstage. Reagan had just been given the award -- a 2-foot-high crystal statue of an eagle -- when a man named Richard Paul Springer, 41, walked through the ring of security around the former president and onto the stage, grabbed the crystal eagle, smashed it forcefully to the floor (with Reagan being struck by the flying glass), and commandeered the microphone. Reagan was hustled offstage by the Secret Service, and Springer, too, was hauled away. But there was widespread incredulity that someone could get past all the federal, local and private security that is always on hand for an appearance by a president or former president. Some breaches are more comical than frightening. Elvis Presley was famously kept away from the gazes and grasping hands of his admirers when he was not performing. His privacy at home was especially important to him. But his old friends and employees sometimes tell the story of what happened when a crate arrived at his home with ventilation holes punched into it. 
The delivery service said that fans had sent him, as a gift, a top-pedigree dog. So, according to the story, the crate was carted into the house, and was opened. Out climbed two young women. They were escorted off the property. The persistence of those who infringe upon the privacy of the exalted can be astonishing. It turned out that Michael Fagan, the queen's uninvited visitor, had managed to sneak into Buckingham Palace a month earlier and had helped himself to a bottle of wine. For that he was charged with theft -- but was not charged criminally for the subsequent trespass that took him to her sleeping quarters, because at the time it was considered a civil violation. And Squeaky Fromme, sent to prison for a failed assassination attempt on President Gerald Ford in 1975, wrote to him while she was incarcerated -- a letter described as `strange` by Ford. She was able to reach the president once he had left office -- the letter made it to his home, and he read it -- even while she was locked away from society. The oddity of these encounters that happen when the personal space of the most protected people on the planet is violated -- the bizarreness of the moments when they are reminded that nothing in life is guaranteed -- can be both mesmerizing and haunting. Just ask Queen Elizabeth, in the unlikely event you ever get close enough. But you probably shouldn't ask her for a cigarette. \n\nHere is a summary of the highlights for this article:"
      - 'Generate a descriptive sentence about a restaurant using the following words:\n\nname = The Rice Boat, eatType = restaurant, food = Italian, area = riverside, familyFriendly = yes\n'
      - 'Q:Title: Shallow, bland and unscholarly Review: Why do people write books when they have nothing profound or original to say, and nothing in the way of serious scholarship to share? This book content even falls short of effective popularization. Sadly, `works` like this reinforce the case for relaxing academic pressure to publish. Is the review positive or negative?\nA:'
      - 'I have a JNI function in my Java program that needs to build and return a HashMap. The key for the map is a String, and the respective value is either a boolean or a Boolean object. However, when I try to access the value in Java, it always shows up as null. Here is the code I currently have:\n\n```cpp\njclass mapclass = env->FindClass(```'  
      - 'Are these paraphrases?\nThey come close to succeeding , with unintentional help from Jim and Tim Possible , but Kim is saved by Ron at the last minute .\nThey come close to being successful , with unintentional help from Kim and Tim possible , but Jim is saved at the last minute by Ron .'
      - 'Give the step-by-step reasoning process and then the final answer. Bronson decides to collect Oak Leaves from around his Neighborhood. He collects 12 on Thursday and 13 on Friday. 20% are Brown and 20% are Green. The rest are yellow. How many yellow leaves does he collect?\n'
      - 'I have a code snippet which is throwing a runtime exception. Can you explain and correct the issue in the code? Additionally, I would like to understand the concept of exception handling in the Java programming language, including the differentiation between checked and unchecked exceptions, and how to handle them using try-catch blocks and the throws keyword.\n\n```java\npublic class Main {\n\n    public static void main(String[] args) {\n        try{\n            System.out.println(10 / 0);\n        }\n        catch(ArithmeticException ex){\n            System.out.println(ex);\n        }\n    }\n}\n```'
      - 'Mitchell is trying to chew as many pieces of gum at once as he can. He has 8 packets of gum, There are 7 pieces in each. If he chews all the gum except for 2 pieces, how many pieces does he chew at once?\nThoughts? Step-by-step reasoning:'
      - 'What is the best way to create a CSS stylesheet to style a web page with fonts, colors, and basic layout?\n'
      - 'What surface modification technique can be utilized to enhance the biocompatibility of titanium implants for bone tissue engineering applications?\n'
      - 'Design a custom validation rule for a Symfony console command that checks if a given Elasticsearch index exists and contains at least one document. If the index exists and has no documents, the validation rule should throw an error message indicating that the index is empty. If the index does not exist, the validation rule should prompt the user to create the index using a specific template that includes fields such as "id", "name", "age", "gender", "city", "state", "country", "latitude", and "longitude". Additionally, the template should include a mapping for the "created_at", "updated_at", and "last_active_at" fields, which should have the "type": "date" and "format": "yyyy-MM-dd HH:mm:ss" properties. The validation rule should also require the presence of unique indexes for the "id" and "name" fields.\n'
      - 'Triple: The Eagle eatType coffee shop; The Eagle food Italian; The Eagle customer rating low; The Eagle area riverside; The Eagle familyFriendly no; The Eagle near Burger King\n\nWhat is a sentence that describes this triple?\n'
      - "Find the right ending to this passage.\n\nBy Daily Mail Reporter A schoolgirl who suffers from a potentially deadly heart condition believes her beloved pet cat is responsible for saving her life. Maria Gillon, 13, suffers dozens of crippling chest pain attacks which stop her moving or talking due to ventricular tachycardia, which more than doubles her heart rate. She is often at risk of her heart stopping and is particularly vulnerable while sleeping in her room alone at night. Scroll down for video Maria Gillon, 13, from Gorebridge, Midlothian suffers potentially fatal heart seizures. She says she owes her life to her lucky black cat Perla who raises the alarm\n\n'While\n\nOPTIONS:\n- Daily Mail was staying with us she became a real favourite with the staff, despite her injuries she remained bright and won people over with her personality.\n- Gorebridge was staying with us she became a real favourite with the staff, despite her injuries she remained bright and won people over with her personality.\n- Maria Gillon was staying with us she became a real favourite with the staff, despite her injuries she remained bright and won people over with her personality.\n- Midlothian was staying with us she became a real favourite with the staff, despite her injuries she remained bright and won people over with her personality.\n- Perla was staying with us she became a real favourite with the staff, despite her injuries she remained bright and won people over with her personality.\n"
      - 'Attributes: name = Loch Fyne, eatType = restaurant, food = Fast food, priceRange = less than £20, familyFriendly = yes. Produce a detailed sentence about this restaurant.\n'
      - 'Please provide a detailed set of instructions for baking a cake, including all ingredients, measurements, specific steps required, and environmental factors that may affect the baking process. Additionally, please provide a visual representation of the process, such as a diagram or flowchart, demonstrating the necessary order of operations, potential decision points, or alternate paths based on changes in environment. The final product should be a comprehensive guide that could be easily followed by someone with little to no baking experience, while also accounting for potential environmental variables that may impact the baking outcome.\n'
      - '{ { plot } } In 1964 , in the peak of Beatlemania , a reluctant John Lennon is persuaded by manager Brian Epstein to meet Freddie Lennon , the father who abandoned him seventeen years earlier , with the press in attendance .  When they meet , John accuses his father of abandoning him , but his father says that `` he left it up to John .  John and Brian quickly leave the meeting .  The movie then jumps to 1967 , after Brian Epstein has died .  The Beatles are giving a press conference about their new film, Magical Mystery Tour .  John is skeptical about the film , but Paul ( ( ( Andrew Scott convinces him to go through with the idea .  John then invites his father to his mansion to live with him .  Freddie Lennon arrives and meets his grandson , Julian .  Sitting with his wife , John reads the criticism of Magical Mystery Tour , while comparing his wife to Brigitte Bardot , whom he says he will meet after he returns from India .  John finds a letter addressed to him , with the word `` Breathe  written on it .  Later , after finding his father in a neighbor s house , Freddie reveals that he has a 19 year old girlfriend named Pauline , with whom he wants to live .  Lennon accuses his father of leaving him again , and then leaves , after telling his father that he will not live with him anymore .  After meeting Maharishi Mahesh Yogi , the Beatles quickly return to London , and in a press conference they say they made a mistake when they trusted Maharishi .  The journalists are curious about the Beatles new business -- Apple Records . \n\nQuestion: "How many times in the story did Freddie Lennon leave or abandon John Lennon?"\n\nResponse: "6 times"\n\nDoes the response correctly answer the question?'
      - 'Is there a way to programmatically obtain the execution plan of a LINQ to SQL or ADO.NET Query in order to display it as debug information?\n'
      - "Incorporating effective utilization of verbal cues and visually demonstrating poses where camera visibility is limited, as well as implementing the use of dynamic verbs, can greatly enhance students' yoga practice, as recommended by the trainees.\n\nHow can the trainees suggest innovative methods to elevate students' yoga practice, beyond relying on visual cues and dynamic verbs?"
      - 'What is the chemical composition of a Barbie ?\n\nWhat kind of thing would answer this question?'
      - 'A coffee shop brews 10 coffee cups per hour on a weekday and x coffee cups in total over the weekend. If the coffee shop is open 5 hours a day every single day, 370 coffee cups are brewed in 1 week. What is the value of unknown variable x?'
      - 'Jedna věc už dneska vyšla, pomyslel si Morris.\n\nTranslate this to English?'
      - 'Instructions: A text is given in Gujarati. Translate it from the Gujarati language to the Marathi language. The translation must not omit or add information to the original sentence.\nInput: નવી દિલ્હી, 6 જુલાઈ, 2018 રાષ્ટ્રપતિ શ્રી રામનાથ કોવિંદ  અને 8 જુલાઈ, 2018ના રોજ ગોવાની યાત્રા પર જશે.\nOutput:'

  - source_model: /mnt/bn/vgraph/llm_common/hf_models/Yi34B_for_MOE/bagel-34b-v0.4
    positive_prompts:
      - "George wants to warm his hands quickly by rubbing them. Which skin surface will produce the most heat?"
      - 'An astronomer is studying two stars that are the same distance from Earth. Star X appears brighter than star Y. Which statement best explains this observation?'
      - 'BEGININPUT\nBEGINCONTEXT\nurl: https://www.biography.com/scientist/marie-curie\nENDCONTEXT\nMarie Curie was born on November 7, 1867, in Warsaw, Poland. She was a physicist and chemist who conducted pioneering research on radioactivity and won two Nobel Prizes. \n\nENDINPUT\nBEGININSTRUCTION\nWhat field did Marie Curie contribute to and when was she born?\nCite your source.\nENDINSTRUCTION'
      - "Compose an indie pop song that captures the essence of summer love. It should be light-hearted, fun, and full of references to beach trips, late-night parties, and fleeting romances." 
      - "What popular social media platform, launched in 2010, allows users to share photos and videos with followers and apply various filters to their content?"
      - 'أي جائزة ترشحت إلها أيما ستون؟'
      - 'شكد عدد القتلى بسبب الهجوم من غير ضباط الشرطة؟'
      - "What are the common techniques used in identifying a new species, and how can scientists accurately categorize it within the existing taxonomy system?"
      - "What physical characteristics are used to classify organisms in the kingdom Animalia, and how are they used to differentiate between different animal groups?"
      - "What distinguishing characteristics can be used to differentiate between two subspecies of the same species? Provide examples from the animal kingdom. "
      - "Identify the subspecies of the common garden snail (Helix aspersa) found in your local area and describe the physical characteristics that distinguish it from other subspecies of Helix aspersa."
      - "What is the time-independent Schrödinger equation, and how can it be solved for a particle in an infinite square well potential?"
      - "What is the quantum mechanical solution to the one-dimensional infinite square well problem, given a potential wall of length `L`?"
      - "A potential barrier of height 5 eV and width 1 nm is given. Find the probability that a particle with energy 6 eV is able to penetrate the barrier using the principles of quantum tunneling."
      - 'Write an article based on this "A man has been charged with murder and attempted murder after a woman and the man she was on a date with were stabbed at a restaurant in Sydney, Australia.'
      - 'Solve this riddle: "I am seen both in art and nature, appearing in shells and flowers alike. I grow by a specific sequence, where each number is the sum of the two before. What am I?"'
      - 'Calculate the probability of drawing two red cards in a row from a standard deck of playing cards without replacement.\t\n'
      - 'What is the name of the famous Greek philosopher who was a student of Socrates and teacher to Aristotle?'
      - 'If a farmer has 3 hens and each hen lays 2 eggs per day, how many eggs will the farmer have after one week?'
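
As a rough illustration of how `gate_mode: hidden` can turn the positive prompts above into router weights, here is a toy sketch. It assumes each expert's gate row is the normalized mean of hidden-state vectors taken from its positive prompts, and that routing picks the top `experts_per_token` dot products. This is not mergekit's actual code; the helper names, expert labels, and 3-dimensional vectors are all hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def init_gate_rows(prompt_hidden_states):
    """Map each expert to one gate row built from its prompts' hidden states.

    prompt_hidden_states: {expert_name: [hidden vector per positive prompt]}
    """
    return {
        name: normalize(mean_vector(vecs))
        for name, vecs in prompt_hidden_states.items()
    }

def route(hidden, gate_rows, experts_per_token=2):
    """Return the experts whose gate rows score highest for this hidden state."""
    scores = {
        name: sum(h * w for h, w in zip(hidden, row))
        for name, row in gate_rows.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:experts_per_token]

# Hypothetical hidden states standing in for real model activations.
toy_hidden_states = {
    "marketing": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "math": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
    "code": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
}
gate_rows = init_gate_rows(toy_hidden_states)
chosen = route([0.8, 0.3, 0.0], gate_rows)
```

In this toy setup a hidden state that resembles the marketing prompts is sent to the "marketing" expert first and the "math" expert second, which is why the choice of positive prompts matters for how the merged model routes.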

xiaojiangzhang commented 7 months ago

Hello, have you identified the reason for the poor performance of the MoE model after SFT? I have run into the same problem.

oymzysmwe224 commented 7 months ago

@xiaojiangzhang I have not found the reason yet. I am currently continuing the pretraining of the merged MoE model. I suspect that, right after merging, the model's gates follow a prompt-wise strategy, while training uses a token-wise gating strategy. I am therefore adding more data to help the model adapt to the token-wise strategy.
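
The distinction between the two strategies can be sketched as follows. This toy example only illustrates the hypothesis above; it does not show that the merged model actually behaves this way, and the gate rows, token vectors, and function names are made up for the illustration.

```python
def top_k_experts(hidden, gate_rows, k=2):
    """Indices of the k experts with the highest gate score for one vector."""
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in gate_rows]
    return sorted(range(len(gate_rows)), key=lambda i: scores[i], reverse=True)[:k]

def token_wise_routing(token_hiddens, gate_rows, k=2):
    """Each token picks its own experts from its own hidden state."""
    return [top_k_experts(h, gate_rows, k) for h in token_hiddens]

def prompt_wise_routing(token_hiddens, gate_rows, k=2):
    """All tokens share the experts chosen from the prompt's mean hidden state."""
    dim = len(token_hiddens[0])
    pooled = [sum(h[d] for h in token_hiddens) / len(token_hiddens) for d in range(dim)]
    choice = top_k_experts(pooled, gate_rows, k)
    return [list(choice) for _ in token_hiddens]

# Hypothetical gate rows for 4 experts and hidden states for 3 tokens.
gate_rows = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, -1.0, -1.0],
]
tokens = [
    [0.9, 0.2, 0.1],
    [0.1, 0.8, 0.3],
    [0.2, 0.1, 0.9],
]

per_token = token_wise_routing(tokens, gate_rows)
per_prompt = prompt_wise_routing(tokens, gate_rows)
```

With token-wise routing the three tokens end up on different expert pairs, whereas prompt-wise routing sends every token in the prompt to the same pair. If the merged gates start out close to the second behavior, finetuning with token-wise gating has to move them a long way.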