RUC-NLPIR / FlashRAG

⚡FlashRAG: A Python Toolkit for Efficient RAG Research
https://arxiv.org/abs/2405.13576
MIT License
891 stars 69 forks source link

可以提供examples下simple_pipeline.py的结果吗 #9

Closed cbxgss closed 1 month ago

cbxgss commented 1 month ago

我运行examples的检索结果不佳,不知道是什么问题

下面是使用e5的结果

    {
        "id": "test_0",
        "question": "who got the first nobel prize in physics",
        "golden_answers": [
            "Wilhelm Conrad R\u00f6ntgen"
        ],
        "output": {
            "retrieval_result": [
                {
                    "id": "629",
                    "contents": "\"Vern Oliver Knudsen\"\nand during which time the UCLA Graduate Division increased from 287 to 5160. He served as Chancellor of UCLA from 1959-1960, where there is a building named in his honor. Vern Oliver Knudsen Vern Oliver Knudsen (December 27, 1893 \u2013 May 13, 1974) was an American acoustical physicist. Knudsen received his bachelor's degree from Brigham Young University (BYU) with an A.B. in 1915. Following his graduation from BYU Knudsen served a mission for The Church of Jesus Christ of Latter-day Saints from 1915-1918 in the Northern States Mission, which was headquartered in Chicago. Knudsen then joined the staff of Bell",
                    "score": 0.7241127490997314
                },
                {
                    "id": "688",
                    "contents": "\"Thomas Shaw Brandreth\"\nThomas Shaw Brandreth Thomas Shaw Brandreth, FRS (24 July 1788 \u2013 27 May 1873) was an English mathematician, inventor and classicist. Brandreth was the son of a Cheshire physician, Joseph Brandreth. He studied at Eton and received a BA from Trinity College, Cambridge in 1810 as Second Wrangler, second Smith's Prizeman, and chancellor's medalist, attesting to his keen intelligence. He received his MA in 1813, and was subsequently elected to a fellowship at Trinity and called to the bar. He entered legal practice at Liverpool, but was much diverted from advancement by his interest in inventions. Elected a Fellow of",
                    "score": 0.7126446962356567
                },
                {
                    "id": "313",
                    "contents": "\"Henry Tizard\"\nmathematics and chemistry, doing work on indicators and the motions of ions in gases. Tizard graduated in 1908 and at his tutor's suggestion he spent time in Berlin, where he met and formed a close friendship with Frederick Alexander Lindemann, later an influential scientific advisor of Winston Churchill. In 1909, he became a researcher in the Davy\u2013Faraday Laboratory of the Royal Institution, working on colour change indicators. In 1911, Tizard returned to Oxford as a tutorial fellow at Oriel College and to work as a demonstrator in the electrical laboratory. Tizard was married on 24 April 1915 to Kathleen Eleanor",
                    "score": 0.7001247406005859
                },
                {
                    "id": "853",
                    "contents": "\"Keigo Abe\"\nKeigo Abe Abe was born on October 28, 1938, in the town of Iyoshi, Ehime Prefecture (on the island of Shikoku), Japan. Abe's ancestors included samurai; he has told of an incident from the past where two thieves had entered his family home, saying, \"\"My family killed them; the two graves are still standing near my house.\"\" In 1953, aged 15 years, he began training in the martial arts of karate and judo. He initially began studying the Shito-ryu style of karate. In 1956, Abe entered the Nihon University in Tokyo, studying engineering, and graduated four years later. He began",
                    "score": 0.6999887228012085
                },
                {
                    "id": "79",
                    "contents": "\"Ernst G. Bauer\"\npioneering contributions to the most aspects of surface science since its inception. Considerable understanding of surfaces and thin films has been achieved by laterally averaging measurement techniques, but it has become evident that many problems can be solved only by laterally resolving methods (surface microscopy). Ernst Bauer invented Low Energy Electron Microscopy (LEEM) already in 1962 but he had to overcome intense skepticism of scientists and also many scientific and funding obstacles before finally LEEM came to fruition in 1985. His work was brought to the attention of a much wider general scientific community in the nineteen eighties, when the",
                    "score": 0.6998158693313599
                }
            ],
            "prompt": "<|system|>\nAnswer the question based on the given document.                     Only give me the answer and do not output any other words.                     \nThe following are given documents.\n\nDoc 1(Title: \"Vern Oliver Knudsen\") and during which time the UCLA Graduate Division increased from 287 to 5160. He served as Chancellor of UCLA from 1959-1960, where there is a building named in his honor. Vern Oliver Knudsen Vern Oliver Knudsen (December 27, 1893 \u2013 May 13, 1974) was an American acoustical physicist. Knudsen received his bachelor's degree from Brigham Young University (BYU) with an A.B. in 1915. Following his graduation from BYU Knudsen served a mission for The Church of Jesus Christ of Latter-day Saints from 1915-1918 in the Northern States Mission, which was headquartered in Chicago. Knudsen then joined the staff of Bell\nDoc 2(Title: \"Thomas Shaw Brandreth\") Thomas Shaw Brandreth Thomas Shaw Brandreth, FRS (24 July 1788 \u2013 27 May 1873) was an English mathematician, inventor and classicist. Brandreth was the son of a Cheshire physician, Joseph Brandreth. He studied at Eton and received a BA from Trinity College, Cambridge in 1810 as Second Wrangler, second Smith's Prizeman, and chancellor's medalist, attesting to his keen intelligence. He received his MA in 1813, and was subsequently elected to a fellowship at Trinity and called to the bar. He entered legal practice at Liverpool, but was much diverted from advancement by his interest in inventions. Elected a Fellow of\nDoc 3(Title: \"Henry Tizard\") mathematics and chemistry, doing work on indicators and the motions of ions in gases. Tizard graduated in 1908 and at his tutor's suggestion he spent time in Berlin, where he met and formed a close friendship with Frederick Alexander Lindemann, later an influential scientific advisor of Winston Churchill. In 1909, he became a researcher in the Davy\u2013Faraday Laboratory of the Royal Institution, working on colour change indicators. In 1911, Tizard returned to Oxford as a tutorial fellow at Oriel College and to work as a demonstrator in the electrical laboratory. Tizard was married on 24 April 1915 to Kathleen Eleanor\nDoc 4(Title: \"Keigo Abe\") Keigo Abe Abe was born on October 28, 1938, in the town of Iyoshi, Ehime Prefecture (on the island of Shikoku), Japan. Abe's ancestors included samurai; he has told of an incident from the past where two thieves had entered his family home, saying, \"\"My family killed them; the two graves are still standing near my house.\"\" In 1953, aged 15 years, he began training in the martial arts of karate and judo. He initially began studying the Shito-ryu style of karate. In 1956, Abe entered the Nihon University in Tokyo, studying engineering, and graduated four years later. He began\nDoc 5(Title: \"Ernst G. Bauer\") pioneering contributions to the most aspects of surface science since its inception. Considerable understanding of surfaces and thin films has been achieved by laterally averaging measurement techniques, but it has become evident that many problems can be solved only by laterally resolving methods (surface microscopy). Ernst Bauer invented Low Energy Electron Microscopy (LEEM) already in 1962 but he had to overcome intense skepticism of scientists and also many scientific and funding obstacles before finally LEEM came to fruition in 1985. His work was brought to the attention of a much wider general scientific community in the nineteen eighties, when the\n</s>\n<|user|>\nQuestion: who got the first nobel prize in physics\nAnswer:</s>\n<|assistant|>\n",
            "pred": "The first Nobel Prize in Physics was awarded to three scientists in 1903: William Ramsay, Morris Travers, and Frederick Soddy. Ramsay won the prize for his work on the structure of the atom, Travers for his work on the chemistry of the elements, and",
            "metric_score": {
                "em": 0.0,
                "f1": 0,
                "sub_em": 0.0,
                "precision": 0,
                "recall": 0
            }
        }
    },

以及使用bm25的结果

    {
        "id": "test_0",
        "question": "who got the first nobel prize in physics",
        "golden_answers": [
            "Wilhelm Conrad R\u00f6ntgen"
        ],
        "output": {
            "retrieval_result": [
                {
                    "id": "228",
                    "contents": "\"Elizabeth Sprague Coolidge\"\nwas a chemical physicist, political activist, and civil libertarian. In 1932, Coolidge established the Elizabeth Sprague Coolidge Medal for \"\"eminent services to chamber music.\"\" The medals were initially awarded by the Library of Congress. But, in 1949 \u2014 after objections by U.S. Congressmen over the appropriateness of a government body awarding prizes in fine arts and literature to individuals who might harbor dissident views towards the U.S. (re: Ezra Pound and the Bollingen Prize) \u2014 the Library of Congress discontinued awarding medals of any kind, including (i) the Bollingen Prize, the Elizabeth Sprague Coolidge Medal for \"\"eminent services to chamber",
                    "score": 4.6407999992370605
                },
                {
                    "id": "636",
                    "contents": "\"Karnatak Science College\"\nof Karnatak Science college, Dharwad. Presently, mulimani, Professor of forensics science, is the Principal. G.S. Paramasivaiah was a student of the Nobel-laureate Sir CV Raman was the first Principal of the Karnatak Science College.. He was also the Secretary of KLE Society Belgaum. He played an important role in founding of Karnatak University. In 1947, the Government of Bombay Province made attempts to establish a university in Bombay Karnataka. As per the resolution No.7914 of Education and Industries Department, the Government of Bombay, a committee was constituted on 17 April 1947 to make recommendations regarding form, scope, constitution and jurisdiction",
                    "score": 4.308499813079834
                },
                {
                    "id": "216",
                    "contents": "\"Jonathan Larson\"\nthe result of a three-year-long collaborative and editing process between Larson and the producers and director, was not publicly performed before Larson's death. The show premiered Off Broadway on schedule. Larson's parents (who were flying in for the show anyway) gave their blessing to open the show. Due to Larson's death the day before the first preview performance, the cast agreed that they would premiere the show by simply singing it through, all the while sitting at three prop tables lined up on stage. But by the time the show got to its high energy \"\"La Vie Boheme\"\", the cast",
                    "score": 4.280799865722656
                },
                {
                    "id": "89",
                    "contents": "\"Got My Mind Set on You\"\nGot My Mind Set on You \"\"Got My Mind Set on You\"\" (also written as \"\"(Got My Mind) Set on You\"\") is a song written and composed by Rudy Clark and originally recorded by James Ray in 1962, under the title \"\"I've Got My Mind Set on You\"\". An edited version of the song was released later in the year as a single on the Dynamic Sound label. In 1987, George Harrison released a cover version of the song as a single, and released it on his album \"\"Cloud Nine,\"\" which he had recorded on his own Dark Horse Records",
                    "score": 3.9467999935150146
                },
                {
                    "id": "97",
                    "contents": "\"Lido Vieri\"\nUEFA European Football Championship campaign on home soil under manager Ferruccio Valcareggi, and also at the 1970 FIFA World Cup, where Italy reached the final. A quick, physically strong, consistent, and dominant keeper, regarded as one of the best Italian goalkeepers of his generation, Vieri won the \"\"Premio Combi\"\" during the 1962\u201363 season, which was awarded to the best goalkeeper in Serie A. Throughout his career he made a name for himself as an aggressive, vocal, and commanding goalkeeper. Vieri was an elegant, courageous, and acrobatic shot-stopper, who was known in particular for his athleticism, spectacular saves, bravery, and ability",
                    "score": 3.6823999881744385
                }
            ],
            "prompt": "<|system|>\nAnswer the question based on the given document.                     Only give me the answer and do not output any other words.                     \nThe following are given documents.\n\nDoc 1(Title: \"Elizabeth Sprague Coolidge\") was a chemical physicist, political activist, and civil libertarian. In 1932, Coolidge established the Elizabeth Sprague Coolidge Medal for \"\"eminent services to chamber music.\"\" The medals were initially awarded by the Library of Congress. But, in 1949 \u2014 after objections by U.S. Congressmen over the appropriateness of a government body awarding prizes in fine arts and literature to individuals who might harbor dissident views towards the U.S. (re: Ezra Pound and the Bollingen Prize) \u2014 the Library of Congress discontinued awarding medals of any kind, including (i) the Bollingen Prize, the Elizabeth Sprague Coolidge Medal for \"\"eminent services to chamber\nDoc 2(Title: \"Karnatak Science College\") of Karnatak Science college, Dharwad. Presently, mulimani, Professor of forensics science, is the Principal. G.S. Paramasivaiah was a student of the Nobel-laureate Sir CV Raman was the first Principal of the Karnatak Science College.. He was also the Secretary of KLE Society Belgaum. He played an important role in founding of Karnatak University. In 1947, the Government of Bombay Province made attempts to establish a university in Bombay Karnataka. As per the resolution No.7914 of Education and Industries Department, the Government of Bombay, a committee was constituted on 17 April 1947 to make recommendations regarding form, scope, constitution and jurisdiction\nDoc 3(Title: \"Jonathan Larson\") the result of a three-year-long collaborative and editing process between Larson and the producers and director, was not publicly performed before Larson's death. The show premiered Off Broadway on schedule. Larson's parents (who were flying in for the show anyway) gave their blessing to open the show. Due to Larson's death the day before the first preview performance, the cast agreed that they would premiere the show by simply singing it through, all the while sitting at three prop tables lined up on stage. But by the time the show got to its high energy \"\"La Vie Boheme\"\", the cast\nDoc 4(Title: \"Got My Mind Set on You\") Got My Mind Set on You \"\"Got My Mind Set on You\"\" (also written as \"\"(Got My Mind) Set on You\"\") is a song written and composed by Rudy Clark and originally recorded by James Ray in 1962, under the title \"\"I've Got My Mind Set on You\"\". An edited version of the song was released later in the year as a single on the Dynamic Sound label. In 1987, George Harrison released a cover version of the song as a single, and released it on his album \"\"Cloud Nine,\"\" which he had recorded on his own Dark Horse Records\nDoc 5(Title: \"Lido Vieri\") UEFA European Football Championship campaign on home soil under manager Ferruccio Valcareggi, and also at the 1970 FIFA World Cup, where Italy reached the final. A quick, physically strong, consistent, and dominant keeper, regarded as one of the best Italian goalkeepers of his generation, Vieri won the \"\"Premio Combi\"\" during the 1962\u201363 season, which was awarded to the best goalkeeper in Serie A. Throughout his career he made a name for himself as an aggressive, vocal, and commanding goalkeeper. Vieri was an elegant, courageous, and acrobatic shot-stopper, who was known in particular for his athleticism, spectacular saves, bravery, and ability\n</s>\n<|user|>\nQuestion: who got the first nobel prize in physics\nAnswer:</s>\n<|assistant|>\n",
            "pred": "The first Nobel Prize in Physics was awarded to three scientists for their contributions to the development of the atomic bomb: James Chadwick, who discovered the neutron, Werner Heisenberg, who developed the uncertainty principle, and Max Planck, who proposed the concept of quantum mechanics.",
            "metric_score": {
                "em": 0.0,
                "f1": 0,
                "sub_em": 0.0,
                "precision": 0,
                "recall": 0
            }
        }
    },
ignorejjj commented 1 month ago

你好,我们提供的simple_pipeline只是为了便于验证整体流程是否跑通。里面提供的检索文档和index仅包含1000个文档,因此其实是无法返回需要的结果的。

如果想运行出正常的结果,需要使用完整wikipedia corpus以及对应的index。

cbxgss commented 1 month ago

你好,我们提供的simple_pipeline只是为了便于验证整体流程是否跑通。里面提供的检索文档和index仅包含1000个文档,因此其实是无法返回需要的结果的。

如果想运行出正常的结果,需要使用完整wikipedia corpus以及对应的index。

好的,十分感谢!

ZhexuanZhou commented 1 month ago

@cbxgss 想请教一下,这个bm25要怎么才能跑起来?我在config里面,将retrieval_method设置成bm25后,出现如下错误:nius.JavaException: JVM exception occurred: indexes/e5_flat_sample.index does not exist or is not a directory. java.lang.IllegalArgumentException

ignorejjj commented 1 month ago

@ZhexuanZhou 你好,目前flashrag的bm25基于pyserini实现。需要完成三个步骤:

  1. 安装pyserini (linux环境下可以直接使用pip安装)
  2. 下载构建好的bm25 索引或者使用我们提供的脚本来构建自己的bm25索引
  3. 将retrieval_method设置为bm25, 并将index路径设置为bm25索引的文件夹

如果遇到了其他问题,请随时在此issue中留言

cbxgss commented 1 month ago

@cbxgss 想请教一下,这个bm25要怎么才能跑起来?我在config里面,将retrieval_method设置成bm25后,出现如下错误:nius.JavaException: JVM exception occurred: indexes/e5_flat_sample.index does not exist or is not a directory. java.lang.IllegalArgumentException

@ZhexuanZhou 需要先build index,参考docs/下的文档

python -m flashrag.retriever.index_builder \
    --retrieval_method bm25 \
    --corpus_path examples/quick_start/indexes/sample_data.jsonl \
    --save_dir examples/quick_start/indexes-my/