CampusGPT-ai / GAI_Academic_Advising_Assistant_MVP

Open Source Public Repo for collaborative academic advising assistant project
MIT License

Search Relevancy #40

Status: Open. Zhongzhou opened this issue 4 months ago.

Zhongzhou commented 4 months ago

(screenshot) The Chemistry department webpage should come up first, but the three pages referenced (not sure why the 2nd and 3rd have no names) were far from relevant.

Zhongzhou commented 4 months ago

(screenshot) Same thing for physics: it doesn't retrieve the departmental webpages.

Zhongzhou commented 4 months ago

(two screenshots)

It really has a lot of difficulty finding specific information about individual departments, which makes me suspect that the departmental webpages are weighted too low for some reason.
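One way to test the hypothesis that departmental pages are under-weighted is to re-score retrieval hits by source before they reach the model. This is a hypothetical sketch, not the project's retriever: it assumes the retriever returns `(url, score)` pairs with non-negative scores where higher is better, and the `/chemistry/` prefix and the 1.2 boost factor are placeholder assumptions, not values taken from this repo.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Hit:
    url: str
    score: float  # assumed: non-negative similarity score, higher is better


# Hypothetical URL path prefixes treated as departmental sites.
# "/physics/" matches the physics URL in this thread; "/chemistry/" is assumed.
DEPARTMENT_PREFIXES = ("/physics/", "/chemistry/")


def boost_departmental(hits: list[Hit], boost: float = 1.2) -> list[Hit]:
    """Multiply scores of departmental pages by `boost`, then re-sort descending."""
    rescored = []
    for h in hits:
        path = urlparse(h.url).path
        factor = boost if path.startswith(DEPARTMENT_PREFIXES) else 1.0
        rescored.append(Hit(h.url, h.score * factor))
    return sorted(rescored, key=lambda h: h.score, reverse=True)
```

If the physics homepage still ranks below college-level pages even with a generous boost, that points more toward missing or poorly chunked data than toward weighting.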

marycampus commented 4 months ago

This is an interesting one. I cannot get the retriever to return specific information about the chemistry department office, and I think it's because the information is listed on the front page without any context indicating that it is an office location. It just sits in a sidebar, and users are expected to intuit that the location is the office, so semantically it isn't linking. I will think about ways to handle this in the search.
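A common workaround for context-free sidebar text is to prepend page-level context to each chunk before it is embedded, so a bare room number is indexed together with the department name and a description of where on the page it came from. This is a minimal sketch under assumptions: it assumes the crawler records which page region each text block came from and that the department template puts contact details in the sidebar; `REGION_LABELS`, `enrich_chunk`, and the sample address are hypothetical.

```python
# Hypothetical labels for page regions; assumes the crawler records
# which part of the page template each text block came from.
REGION_LABELS = {
    "sidebar": "Department office location and contact information",
    "main": "Main page content",
}


def enrich_chunk(page_title: str, region: str, text: str) -> str:
    """Prefix a chunk with its page title and region label before embedding."""
    label = REGION_LABELS.get(region, region)  # fall back to the raw region name
    return f"{page_title} | {label}\n{text}"


if __name__ == "__main__":
    # Placeholder address, not the real office location.
    print(enrich_chunk("Department of Chemistry", "sidebar", "Room 123, Example Building"))
```

The enriched string is what gets embedded, so a query like "where is the chemistry office" has the words "Chemistry", "office", and "location" to match against.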

marycampus commented 4 months ago

@martilar, this might be a good one for you to noodle on as well. If you go into the search index and query for the office location, nothing is returned. If you go to the website directly, you can see the office listed on the main chemistry department site.

Zhongzhou commented 4 months ago

(screenshot)

It is actually more than the front office. For example, this query should clearly be answered from the faculty page or the faculty research pages of the physics department, but the chatbot returned three pages from the college. I've tried a number of questions and never got any page from the physics department website to show up.

marycampus commented 4 months ago

Thank you for the additional context, Chen.

marycampus commented 4 months ago

PS -- I also created a bug for the duplicate citations; that should not be happening.

Zhongzhou commented 4 months ago

(screenshot) The correct answer to this question is directly on the physics department homepage, https://sciences.ucf.edu/physics/, but the chatbot made one up and pulled up three unrelated pages. I seriously doubt that the physics department pages are in the data dump.
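A quick way to confirm or rule out the missing-data theory is to list which indexed URLs fall under the physics site. This sketch assumes the source URLs of the indexed documents can be exported as a list of strings; `pages_under` is a hypothetical helper, not part of the project.

```python
from urllib.parse import urlparse


def pages_under(urls: list[str], prefix_url: str) -> list[str]:
    """Return URLs on the same host whose path is prefix_url's path or below it."""
    base = urlparse(prefix_url)
    prefix = base.path.rstrip("/")  # "/physics/" -> "/physics"
    matches = []
    for u in urls:
        p = urlparse(u)
        # Match "/physics" and "/physics/...", but not "/physicsx/".
        if p.netloc == base.netloc and (p.path == prefix or p.path.startswith(prefix + "/")):
            matches.append(u)
    return matches
```

For example, `pages_under(indexed_urls, "https://sciences.ucf.edu/physics/")` returning an empty list would suggest the pages never made it into the data dump, rather than being in the index but ranked too low.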

Zhongzhou commented 4 months ago

Quoting marycampus: "PS -- I also created a bug for the duplicate citations; that should not be happening."

Those are actually three different pages: two are about funding, and one is the entrance page. They just have the same name.
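Since several distinct pages can share a title, the citation list could deduplicate on URL instead of title and disambiguate colliding titles; the same step could also cover the untitled references mentioned at the top of this thread. A sketch, assuming each citation is available as a `(title, url)` pair; `citation_labels` is hypothetical and not the project's citation code.

```python
from collections import Counter
from urllib.parse import urlparse


def citation_labels(citations: list[tuple[str, str]]) -> list[str]:
    """Build display labels for (title, url) citations.

    - Exact URL duplicates are dropped (first occurrence wins).
    - Empty titles fall back to the URL.
    - Titles shared by different URLs get the URL path appended.
    """
    seen: set[str] = set()
    unique: list[tuple[str, str]] = []
    for title, url in citations:
        if url in seen:
            continue
        seen.add(url)
        unique.append((title.strip() or url, url))

    counts = Counter(title for title, _ in unique)
    return [
        f"{title} ({urlparse(url).path})" if counts[title] > 1 else title
        for title, url in unique
    ]
```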

marycampus commented 4 months ago

Noted.

marycampus commented 4 months ago

Renaming this issue to: Search Relevancy.

Zhongzhou commented 3 months ago

(screenshot)

It generates the right response, but the first round of retrieval didn't return the relevant webpage. The second round did return it, but the relevant page is info2.txt, while info1.txt is not relevant.