AutoSurveys / AutoSurvey

Possible bug in `chunking` function in `src/agents/outline_writer.py` #17

Open Fuyujia799 opened 3 days ago

Fuyujia799 commented 3 days ago

Hi, I noticed a potential bug in the `chunking` function inside `src/agents/outline_writer.py`.

Here is the relevant part of the code:

```python
def chunking(self, papers, titles, chunk_size=14000):
    paper_chunks, title_chunks = [], []
    total_length = self.token_counter.num_tokens_from_list_string(papers)
    num_of_chunks = int(total_length / chunk_size) + 1
    avg_len = int(total_length / num_of_chunks) + 1
    split_points = []
    l = 0
    for j in range(len(papers)):
        l += self.token_counter.num_tokens_from_string(papers[j])
        if l > avg_len:
            l = 0
            split_points.append(j)
            continue
    start = 0
    for point in split_points:
        paper_chunks.append(papers[start:point])
        title_chunks.append(titles[start:point])
        start = point
    paper_chunks.append(papers[start:])
    title_chunks.append(papers[start:])
    return paper_chunks, title_chunks
```

In the second-to-last line:

```python
title_chunks.append(papers[start:])
```

I think it should be:

```python
title_chunks.append(titles[start:])
```

Otherwise, the last entry of `title_chunks` is a slice of `papers` rather than `titles`, so the final title chunk holds paper text instead of the matching titles.
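To make the effect concrete, here is a minimal sketch of the function with that one-line change applied. It is not the repository's code as-is: I rewrote it as a standalone function that takes the token counter as an argument, and `WhitespaceTokenCounter` is a hypothetical stand-in that only mimics the two counter methods used above by counting whitespace-separated words. The tiny `chunk_size` is chosen just so the toy input splits into several chunks.

```python
class WhitespaceTokenCounter:
    """Hypothetical stand-in for the project's token counter (assumption):
    counts whitespace-separated words instead of real model tokens."""

    def num_tokens_from_string(self, text):
        return len(text.split())

    def num_tokens_from_list_string(self, texts):
        return sum(self.num_tokens_from_string(t) for t in texts)


def chunking(papers, titles, token_counter, chunk_size=14000):
    """Split papers and titles into parallel chunks (same logic as the issue's code)."""
    paper_chunks, title_chunks = [], []
    total_length = token_counter.num_tokens_from_list_string(papers)
    num_of_chunks = int(total_length / chunk_size) + 1
    avg_len = int(total_length / num_of_chunks) + 1

    # Record the index at which the running token count exceeds avg_len.
    split_points = []
    running = 0
    for j, paper in enumerate(papers):
        running += token_counter.num_tokens_from_string(paper)
        if running > avg_len:
            running = 0
            split_points.append(j)

    start = 0
    for point in split_points:
        paper_chunks.append(papers[start:point])
        title_chunks.append(titles[start:point])
        start = point
    paper_chunks.append(papers[start:])
    title_chunks.append(titles[start:])  # proposed fix: slice titles, not papers
    return paper_chunks, title_chunks


papers = ["a b c", "d e f", "g h i", "j k l"]
titles = ["T1", "T2", "T3", "T4"]
paper_chunks, title_chunks = chunking(
    papers, titles, WhitespaceTokenCounter(), chunk_size=5
)

# Every title chunk now lines up with its paper chunk, including the last one.
assert [len(c) for c in paper_chunks] == [len(c) for c in title_chunks]
print(title_chunks)  # [['T1'], ['T2', 'T3'], ['T4']]
```

With the original line, the last element of `title_chunks` in this toy run would be `['j k l']` (paper text) instead of `['T4']`.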

Please verify this and let me know if it needs to be fixed.

Thanks!

GuoQi2000 commented 1 day ago

Thanks for pointing it out! We've fixed the bug now :)